[ch23 work John Goerzen **20080304060236] { hunk ./en/ch23-webclient.xml 123 - + + The Parser + + Now that we have the database component, we need to have code to + parse the podcast feeds. These are XML files that contain + various information. Here's an example XML file to show you + what they look like: + + + + + + Haskell Radio + http://www.example.com/radio/ + Description of this podcast + + Episode 2: Lambdas + http://www.example.com/radio/lambdas + + + + Episode 1: Parsec + http://www.example.com/radio/parsec + + + + +]]> + + + Out of these files, we are mainly interested in two things: the + podcast title and the enclosure URLs. We use the HaXml toolkit + to parse the XML file. Here's the source code for this component: + + &PodParser.hs:all; + + Let's look at this code. First, we declare two types: + Item and Feed. We will be + transforming the XML document into a Feed, + which then contains items. We also provide a function to + convert an Item into an + Episode as defined in + PodTypes.hs. + + + Next, it goes on to parsing. The parse + function takes a &String; representing the XML content as well as + a name, and returns a Feed. + + + HaXml is designed as a "filter" converting data of one type to + another. It can be a simple straightforward conversion of XML + to XML, or of XML to Haskell data, or of Haskell data to XML. + HaXml has a data type called CFilter, which + is defined like this: + + +type CFilter = Content -> [Content] + + + That is, a CFilter takes a fragment of an XML + document and returns 0 or more fragments. A + CFilter might be asked to find all children + of a specified tag, all tags with a certain name, the literal + text contained within a part of an XML document, or any of a + number of other things. There is also an operator + (/>) that chains CFilter + functions together. All of the data that we're interested in + occurs within the <channel> tag, so + first we want to get at that. 
We define a simple + CFilter: + + +channel = tag "rss" /> tag "channel" + + + When we pass a document to channel, it will + search the top level for the tag named rss. + Then, within that, it will look for the + channel tag. + + + The rest of the program follows this basic approach. + txt extracts the literal text from a tag, and + by using CFilter functions, we can get at any + part of the document. + + addfile ./examples/ch23/PodDownload.hs { hunk ./examples/ch23/PodDownload.hs 1 +{-- snippet all --} +-- ch23/PodDownload.hs hunk ./examples/ch23/PodDownload.hs 4 +module PodDownload where + +{-- /snippet all --} } }