[Fold ch18a back into ch18 Bryan O'Sullivan **20080219011606] { move ./examples/ch18a/ApplicativeParsec.hs ./examples/ch18/ApplicativeParsec.hs move ./examples/ch18a/FormApp.hs ./examples/ch18/FormApp.hs move ./examples/ch18a/FormParse.hs ./examples/ch18/FormParse.hs move ./examples/ch18a/JSONParsec.hs ./examples/ch18/JSONParsec.hs move ./examples/ch18a/formApp.ghci ./examples/ch18/formApp.ghci move ./examples/ch18a/formParse.ghci ./examples/ch18/formParse.ghci hunk ./en/ch18a-applicative.xml 1 - - - - Further adventures in parsing - - In this chapter, we're going to spend a little more time on - parsing. Our goal is to develop a few useful parsers. We'll - write one parser for HTTP requests, which will help us to develop - a web server in later chapters. We'll complete our treatment of - JSON by writing a parser for JSON data, which again we'll be using - later, too. - - Along the way, we'll dwell on a few easy topics in - mathematics. As usual, we have a practical goal in mind: the - concepts we'll introduce will help us to write cleaner, more - regular code. - - - Parsing an URL-encoded query string - - When we introduced - application/x-www-form-urlencoded text in , we mentioned that we'd write a - parser for these strings. We can quickly and easily do this - using Parsec. - - Each key-value pair is separated by the - & character. - - &FormParse.hs:p_query; - - Notice that in the type signature, we're using - Maybe to represent a value: the HTTP specification - is unclear about whether a key must have an - associated value, and we'd like to be able to distinguish - between no value and empty - value. - - &FormParse.hs:p_pair; - - The many1 function is similar to - many: it applies its parser repeatedly, - returning a list of their results. While - many will succeed and return an empty list - if its parser never succeeds, many1 will - fail if its parser never succeeds, and will otherwise return a - list of at least one element. - - The optionMaybe function modifies the - behaviour of a parser. If the parser fails, - optionMaybe doesn't fail: it returns - Nothing. Otherwise, it wraps the parser's - successful result with Just. This gives us the - ability to distinguish between no value and - empty value, as we mentioned above. - - Individual characters can be encoded in one of several - ways. - - &FormParse.hs:p_char; - - Some characters can be represented literally. Spaces are - treated specially, using a + character. Other - characters must be encoded as a % character - followed by two hexadecimal digits. The Numeric - module's readHex parses a hex string as - a number. - - &formParse.ghci:test; - - As appealing and readable as this parser is, we can profit - from stepping back and taking another look at some of our - building blocks. - - - - Supplanting regular expressions for casual parsing - - In many popular languages, people tend to put regular - expressions to work for casual parsing. They're - notoriously tricky for this purpose: hard to write, difficult to - debug, nearly incomprehensible after a few months of neglect, - and no error messages on failure. - - If we can write compact Parsec parsers, we'll gain in - readability, expressiveness, and error reporting. Our parsers - won't be as short as regular expressions, but they'll be close - enough to negate much of the temptation of regexps. - - - - Parsing without variables - - A few of our parsers above use &do; notation and bind the - result of an intermediate parse to a variable, for later use. - One such function is p_pair. - - &FormParse.hs:p_pair.noid; - - We can get rid of the need for explicit variables by using - the liftM2 combinator from - Control.Monad. - - &FormParse.hs:p_pair_app1; - - This parser has exactly the same type and behaviour as - p_pair, but it's one line long. Instead of - writing our parser in a procedural style, we've - simply switched to a programming style that emphasises that - we're applying parsers and - combining their results. - - We can take this applicative style of writing a parser much - further. In most cases, the extra compactness that we will gain - will not come at any cost in readability, - beyond the initial effort of coming to grips with the - idea. - - - - Applicative functors for parsing - - The standard Haskell libraries include a module named - Control.Applicative, which we already encountered - in . This module defines a - typeclass named Applicative, which represents an - applicative functor. This is a little bit - more structured than a functor, but a little bit less than a - monad. It also defines Alternative, which is - similar to MonadPlus - - As usual, we think that the best way to introduce - applicative functors is by putting them to work. In theory, - every monad is an applicative functor, but not every applicative - functor is a monad. Because applicative functors were added to - the standard Haskell libraries long after monads, we often don't - get an Applicative instance for free: frequently, - we have to declare the monad we're using to be - Applicative or Alternative. - - To do this for Parsec, we'll write a small - module that we can import instead of the normal - Parsec module. - - &ApplicativeParsec.hs:ApplicativeParsec; - - For convenience, our module's export section exports all the - names we imported from both the Applicative and - Parsec modules. - - - - Applicative parsing by example - - We'll begin by rewriting our existing form parser from the - bottom up, beginning with p_hex, which - parses a hexadecimal escape sequence. - - &FormApp.hs:p_hex; - - Here's our applicative version. - - &FormApp.hs:a_hex; - - Although the individual parsers are mostly untouched, the - combinators that we're gluing them together with have changed. - The only familiar one is (<$>), which - we already know is a synonym for - fmap. - - From our definition of Applicative, we know - that (<*>) is - ap. - - The remaining unfamiliar combinator is - (*>), which applies its first argument, - throws away its result, then applies the second and returns its - result. In other words, it's similar to - (>>). - - - A handy tip about angle brackets - - Before we continue, here's a useful aid for remembering - what all the angle brackets are for in the combinators from - Control.Applicative: if there's an angle bracket - pointing to some side, the result from that side should be - used. - - For example, (*>) returns the - result on its right; (<*>) returns - results from both sides; and (<*), - which we have not yet seen, returns the result on its - left. - - - Although the concepts here should mostly be familiar from - our earlier coverage of functors and monads, we'll walk through - this function to explain what's happening. First, to get a grip - on our types, we'll hoist hexify to the top - level and give it a signature. - - &FormApp.hs:hexify; - - Parsec's hexDigit parser parses a - single hexadecimal digit. - - &formApp.ghci:hexDigit; - - Therefore, char '%' *> hexDigit has the same - type, since (*>) returns the result on - its right. (Remember that CharParser is nothing - more than a synonym for GenParser Char.) - - &formApp.ghci:char; - - The expression hexify <%$> (char '%' *> - hexDigit) is a parser that matches a % - character followed by hex digit, and whose result is a - function. - - &formApp.ghci:func; - - Finally, (<*>) applies the parser - on its left, then the parser on its right, and applies the - function that's the result of the left parse to the value that's - the result of the right. - - If you've been able to follow this, then you understand the - (<*>) and ap - combinators: (<*>) is plain old - ($) lifted to applicative functors, and - ap the same thing lifted to monads. - - &formApp.ghci:ap; - - Next, we'll consider the p_char - parser. - - &FormApp.hs:p_char; - - This remains almost the same in an applicative style, save - for one piece of convenient notation. - - &FormApp.hs:a_char; - - Here, the (<$) combinator uses the - value on the left if the parser on the right succeeds. - - Finally, the equivalent of p_pair_app1 is - almost identical. - - &FormParse.hs:p_pair_app1.noid; - - All we've changed is the combinator we use for lifting: the - liftA functions act in the same ways as - their liftM cousins. - - &FormApp.hs:a_pair; - - - - Parsing JSON data - - To give ourselves a better feel for parsing with applicative - functors, and to explore a few more corners of Parsec, we'll - write a JSON parser that follows the definition in RFC - 4627. - - Insert a backreference to the introduction to JSON, - whenever that shows up. - - At the top level, a JSON value must be either an object or - an array. - - &JSONParsec.hs:p_text; - - These are structurally similar, with an opening character, - following by one or more items separated by commas, followed by - a closing characters. We capture this similarity by writing a - small helper function. - - &JSONParsec.hs:p_series; - - Here, we finally have a use for the - (<*) combinator that we introduced - earlier. We use it to skip over any white space that might - follow certain tokens. With this p_series - function, parsing an array is simple. - - &JSONParsec.hs:p_array; - - Dealing with a JSON object is hardly more complicated, - requiring just a little more effort to product a name/value pair - for each of the object's fields. - - &JSONParsec.hs:p_object; - - Parsing an individual value is a matter of calling an - existing parser, then wrapping its result with the appropriate - JValue constructor. - - &JSONParsec.hs:p_value; - - This leads us to the two most interesting parsers, for - numbers and strings. We'll deal with numbers first, since - they're simpler. - - &JSONParsec.hs:p_number; - - Our trick here is to take advantage of Haskell's standard - number parsing library functions, which are defined in the - Numeric module. The readFloat - function reads an unsigned floating point number, and - readSigned takes a parser for an unsigned - number and turns it into a parser for possibly signed - numbers. - - Since these functions know nothing about Parsec, we have to - work with them specially. Parsec's - getInput function gives us direct access to - Parsec's unconsumed input stream. If readSigned - readFloat succeeds, it returns both the parsed number - and the rest of the unparsed input. We then use - setInput to give this back to Parsec as its - new unconsumed input stream. - - Parsing a string isn't difficult, merely detailed. - - &JSONParsec.hs:p_string; - - We can parse and unescape an escape sequence with the help - of a generic function for combining - Alternatives. - - &JSONParsec.hs:p_escape; - - (We're basing the name asum on the - msum function in - Control.Monad, which serves the same purpose for - MonadPlus instances. It's a little surprising that - asum isn't already defined for us.) - - Finally, JSON lets us encode a Unicode character in - a string as \u followed by four - hexadecimal digits. Because the number of values that can be - represented in 32 bits is much larger than the number of valid - Unicode code points, we must check to ensure that we're decoding - a value within the valid Unicode range, or else we could cause - ourselves a runtime error. - - &JSONParsec.hs:p_unicode; - - The only piece of functionality that applicative - functors are missing compared to monads is the ability to bind a - value to a variable, which we need here in order to be able to - validate the value we're trying to decode. - - This is the one place in our parser that we've needed to - use a monadic function. This pattern extends to more - complicated parsers, too: only infrequently do we need the extra - bit of power that monads offer. - - As we write this book, applicative functors are still quite - new to Haskell, and people are only beginning to explore the - possible uses for them beyond the realm of parsing. - - - - rmfile ./en/ch18a-applicative.xml hunk ./en/00book.xml 29 - hunk ./en/00book.xml 136 - &ch18a; hunk ./en/ch18-parsec.xml 485 + + + + Parsing an URL-encoded query string + + When we introduced + application/x-www-form-urlencoded text in , we mentioned that we'd write a + parser for these strings. We can quickly and easily do this + using Parsec. + + Each key-value pair is separated by the + & character. + + &FormParse.hs:p_query; + + Notice that in the type signature, we're using + Maybe to represent a value: the HTTP specification + is unclear about whether a key must have an + associated value, and we'd like to be able to distinguish + between no value and empty + value. + + &FormParse.hs:p_pair; + + The many1 function is similar to + many: it applies its parser repeatedly, + returning a list of their results. While + many will succeed and return an empty list + if its parser never succeeds, many1 will + fail if its parser never succeeds, and will otherwise return a + list of at least one element. + + The optionMaybe function modifies the + behaviour of a parser. If the parser fails, + optionMaybe doesn't fail: it returns + Nothing. Otherwise, it wraps the parser's + successful result with Just. This gives us the + ability to distinguish between no value and + empty value, as we mentioned above. + + Individual characters can be encoded in one of several + ways. + + &FormParse.hs:p_char; + + Some characters can be represented literally. Spaces are + treated specially, using a + character. Other + characters must be encoded as a % character + followed by two hexadecimal digits. The Numeric + module's readHex parses a hex string as + a number. + + &formParse.ghci:test; + + As appealing and readable as this parser is, we can profit + from stepping back and taking another look at some of our + building blocks. + + + + Supplanting regular expressions for casual parsing + + In many popular languages, people tend to put regular + expressions to work for casual parsing. They're + notoriously tricky for this purpose: hard to write, difficult to + debug, nearly incomprehensible after a few months of neglect, + and no error messages on failure. + + If we can write compact Parsec parsers, we'll gain in + readability, expressiveness, and error reporting. Our parsers + won't be as short as regular expressions, but they'll be close + enough to negate much of the temptation of regexps. + + + + Parsing without variables + + A few of our parsers above use &do; notation and bind the + result of an intermediate parse to a variable, for later use. + One such function is p_pair. + + &FormParse.hs:p_pair.noid; + + We can get rid of the need for explicit variables by using + the liftM2 combinator from + Control.Monad. + + &FormParse.hs:p_pair_app1; + + This parser has exactly the same type and behaviour as + p_pair, but it's one line long. Instead of + writing our parser in a procedural style, we've + simply switched to a programming style that emphasises that + we're applying parsers and + combining their results. + + We can take this applicative style of writing a parser much + further. In most cases, the extra compactness that we will gain + will not come at any cost in readability, + beyond the initial effort of coming to grips with the + idea. + + + + Applicative functors for parsing + + The standard Haskell libraries include a module named + Control.Applicative, which we already encountered + in . This module defines a + typeclass named Applicative, which represents an + applicative functor. This is a little bit + more structured than a functor, but a little bit less than a + monad. It also defines Alternative, which is + similar to MonadPlus + + As usual, we think that the best way to introduce + applicative functors is by putting them to work. In theory, + every monad is an applicative functor, but not every applicative + functor is a monad. Because applicative functors were added to + the standard Haskell libraries long after monads, we often don't + get an Applicative instance for free: frequently, + we have to declare the monad we're using to be + Applicative or Alternative. + + To do this for Parsec, we'll write a small + module that we can import instead of the normal + Parsec module. + + &ApplicativeParsec.hs:ApplicativeParsec; + + For convenience, our module's export section exports all the + names we imported from both the Applicative and + Parsec modules. + + + + Applicative parsing by example + + We'll begin by rewriting our existing form parser from the + bottom up, beginning with p_hex, which + parses a hexadecimal escape sequence. + + &FormApp.hs:p_hex; + + Here's our applicative version. + + &FormApp.hs:a_hex; + + Although the individual parsers are mostly untouched, the + combinators that we're gluing them together with have changed. + The only familiar one is (<$>), which + we already know is a synonym for + fmap. + + From our definition of Applicative, we know + that (<*>) is + ap. + + The remaining unfamiliar combinator is + (*>), which applies its first argument, + throws away its result, then applies the second and returns its + result. In other words, it's similar to + (>>). + + + A handy tip about angle brackets + + Before we continue, here's a useful aid for remembering + what all the angle brackets are for in the combinators from + Control.Applicative: if there's an angle bracket + pointing to some side, the result from that side should be + used. + + For example, (*>) returns the + result on its right; (<*>) returns + results from both sides; and (<*), + which we have not yet seen, returns the result on its + left. + + + Although the concepts here should mostly be familiar from + our earlier coverage of functors and monads, we'll walk through + this function to explain what's happening. First, to get a grip + on our types, we'll hoist hexify to the top + level and give it a signature. + + &FormApp.hs:hexify; + + Parsec's hexDigit parser parses a + single hexadecimal digit. + + &formApp.ghci:hexDigit; + + Therefore, char '%' *> hexDigit has the same + type, since (*>) returns the result on + its right. (Remember that CharParser is nothing + more than a synonym for GenParser Char.) + + &formApp.ghci:char; + + The expression hexify <%$> (char '%' *> + hexDigit) is a parser that matches a % + character followed by hex digit, and whose result is a + function. + + &formApp.ghci:func; + + Finally, (<*>) applies the parser + on its left, then the parser on its right, and applies the + function that's the result of the left parse to the value that's + the result of the right. + + If you've been able to follow this, then you understand the + (<*>) and ap + combinators: (<*>) is plain old + ($) lifted to applicative functors, and + ap the same thing lifted to monads. + + &formApp.ghci:ap; + + Next, we'll consider the p_char + parser. + + &FormApp.hs:p_char; + + This remains almost the same in an applicative style, save + for one piece of convenient notation. + + &FormApp.hs:a_char; + + Here, the (<$) combinator uses the + value on the left if the parser on the right succeeds. + + Finally, the equivalent of p_pair_app1 is + almost identical. + + &FormParse.hs:p_pair_app1.noid; + + All we've changed is the combinator we use for lifting: the + liftA functions act in the same ways as + their liftM cousins. + + &FormApp.hs:a_pair; + + + + Parsing JSON data + + To give ourselves a better feel for parsing with applicative + functors, and to explore a few more corners of Parsec, we'll + write a JSON parser that follows the definition in RFC + 4627. + + Insert a backreference to the introduction to JSON, + whenever that shows up. + + At the top level, a JSON value must be either an object or + an array. + + &JSONParsec.hs:p_text; + + These are structurally similar, with an opening character, + following by one or more items separated by commas, followed by + a closing characters. We capture this similarity by writing a + small helper function. + + &JSONParsec.hs:p_series; + + Here, we finally have a use for the + (<*) combinator that we introduced + earlier. We use it to skip over any white space that might + follow certain tokens. With this p_series + function, parsing an array is simple. + + &JSONParsec.hs:p_array; + + Dealing with a JSON object is hardly more complicated, + requiring just a little more effort to product a name/value pair + for each of the object's fields. + + &JSONParsec.hs:p_object; + + Parsing an individual value is a matter of calling an + existing parser, then wrapping its result with the appropriate + JValue constructor. + + &JSONParsec.hs:p_value; + + This leads us to the two most interesting parsers, for + numbers and strings. We'll deal with numbers first, since + they're simpler. + + &JSONParsec.hs:p_number; + + Our trick here is to take advantage of Haskell's standard + number parsing library functions, which are defined in the + Numeric module. The readFloat + function reads an unsigned floating point number, and + readSigned takes a parser for an unsigned + number and turns it into a parser for possibly signed + numbers. + + Since these functions know nothing about Parsec, we have to + work with them specially. Parsec's + getInput function gives us direct access to + Parsec's unconsumed input stream. If readSigned + readFloat succeeds, it returns both the parsed number + and the rest of the unparsed input. We then use + setInput to give this back to Parsec as its + new unconsumed input stream. + + Parsing a string isn't difficult, merely detailed. + + &JSONParsec.hs:p_string; + + We can parse and unescape an escape sequence with the help + of a generic function for combining + Alternatives. + + &JSONParsec.hs:p_escape; + + (We're basing the name asum on the + msum function in + Control.Monad, which serves the same purpose for + MonadPlus instances. It's a little surprising that + asum isn't already defined for us.) + + Finally, JSON lets us encode a Unicode character in + a string as \u followed by four + hexadecimal digits. Because the number of values that can be + represented in 32 bits is much larger than the number of valid + Unicode code points, we must check to ensure that we're decoding + a value within the valid Unicode range, or else we could cause + ourselves a runtime error. + + &JSONParsec.hs:p_unicode; + + The only piece of functionality that applicative + functors are missing compared to monads is the ability to bind a + value to a variable, which we need here in order to be able to + validate the value we're trying to decode. + + This is the one place in our parser that we've needed to + use a monadic function. This pattern extends to more + complicated parsers, too: only infrequently do we need the extra + bit of power that monads offer. + + As we write this book, applicative functors are still quite + new to Haskell, and people are only beginning to explore the + possible uses for them beyond the realm of parsing. hunk ./web/Index.hs 54 - , ch "applicative" "Further adventures in parsing" Unpublished }