[Finish writing about applicative parsing Bryan O'Sullivan **20080219010413] { move ./examples/ch32/ApplicativeParsec.hs ./examples/ch18a/ApplicativeParsec.hs move ./examples/ch32/JSONParsec.hs ./examples/ch18a/JSONParsec.hs addfile ./examples/ch16/MonadPlus.hs addfile ./examples/ch18a/formParse.ghci hunk ./en/book-shortcuts.xml 87 +error"> hunk ./en/ch11-binary.xml 550 - + hunk ./en/ch15-monads.xml 188 - + hunk ./en/ch15-monads.xml 247 - + hunk ./en/ch16-monad-case.xml 128 - + hunk ./en/ch16-monad-case.xml 164 - while mplus combines two results. Here are - the standard definitions of mzero and + while mplus combines two results into one. + Here are the standard definitions of mzero and hunk ./en/ch16-monad-case.xml 200 + + + The name mplus does not imply addition + + Even though the mplus function + contains the text plus, you should not think of + it as necessarily implying that we're trying to add two + values. + + Depending on the monad we're working in, + mplus may implement an + operation that looks like addition. For example, + mplus in the list monad is implemented as + the (++) operator. + + &monadPlus.ghci:list.mplus; + + However, if we switch to another monad, the obvious + similarity to addition falls away. + + &monadPlus.ghci:maybe.mplus; + + + + + Rules for working with MonadPlus + + Instances of the MonadPlus typeclass must + follow a few simple rules, in addition to the usual monad + rules. + + An instance must short circuit if + mzero appears on the left of a bind expression. + In other words, an expression mzero >>= f + must evaluate to the same result as mzero + alone. + + &MonadPlus.hs:shortcircuitLeft; + + An instance must short circuit if mzero + appears on the right of a sequence + expression. + + &MonadPlus.hs:shortcircuitRight; + + + + + Failing safely with MonadPlus + + When we introduced the &fail; function in , we took pains to warn against its + use: in many monads, it's implemented as a call to &error;, + which has disastrous consequences. + + The MonadPlus typeclass gives us a gentler + way to fail a computation, without &fail; or &error; blowing + up in our faces. The rules that we introduced above allow us + to introduce a mzero into our code wherever we + need to, and computation will short circuit at that + point. + + In the Control.Monad module, the standard + function guard packages up this idea in a + convenient form. + + &MonadPlus.hs:guard; + + As a simple example, here's a function that takes a number + x and computes its value modulo some other + number n. If the result is zero, it + returns x, otherwise the current monad's + mzero. + + &MonadPlus.hs:zeroMod; + hunk ./en/ch18-parsec.xml 474 + + + + Parsec and MonadPlus + + Parsec's GenParser monad is an instance of the + MonadPlus typeclass that we introduced in + . The value + mzero represents a parse failure, while + mplus combines two alternative parses into + one, using (<|>). hunk ./en/ch18a-applicative.xml 10 - JSON by writing a parser for JSON data, and again we'll be using - this later, too. + JSON by writing a parser for JSON data, which again we'll be using + later, too. hunk ./en/ch18a-applicative.xml 69 + &formParse.ghci:test; + hunk ./en/ch18a-applicative.xml 75 + + + Supplanting regular expressions for casual parsing + + In many popular languages, people tend to put regular + expressions to work for casual parsing. They're + notoriously tricky for this purpose: hard to write, difficult to + debug, nearly incomprehensible after a few months of neglect, + and no error messages on failure. + + If we can write compact Parsec parsers, we'll gain in + readability, expressiveness, and error reporting. Our parsers + won't be as short as regular expressions, but they'll be close + enough to negate much of the temptation of regexps. + + + + Parsing without variables + + A few of our parsers above use &do; notation and bind the + result of an intermediate parse to a variable, for later use. + One such function is p_pair. + + &FormParse.hs:p_pair.noid; + + We can get rid of the need for explicit variables by using + the liftM2 combinator from + Control.Monad. + + &FormParse.hs:p_pair_app1; + + This parser has exactly the same type and behaviour as + p_pair, but it's one line long. Instead of + writing our parser in a procedural style, we've + simply switched to a programming style that emphasises that + we're applying parsers and + combining their results. + + We can take this applicative style of writing a parser much + further. In most cases, the extra compactness that we will gain + will not come at any cost in readability, + beyond the initial effort of coming to grips with the + idea. + + + + Applicative functors for parsing + + The standard Haskell libraries include a module named + Control.Applicative, which we already encountered + in . This module defines a + typeclass named Applicative, which represents an + applicative functor. This is a little bit + more structured than a functor, but a little bit less than a + monad. It also defines Alternative, which is + similar to MonadPlus + + As usual, we think that the best way to introduce + applicative functors is by putting them to work. In theory, + every monad is an applicative functor, but not every applicative + functor is a monad. Because applicative functors were added to + the standard Haskell libraries long after monads, we often don't + get an Applicative instance for free: frequently, + we have to declare the monad we're using to be + Applicative or Alternative. + + To do this for Parsec, we'll write a small + module that we can import instead of the normal + Parsec module. + + &ApplicativeParsec.hs:ApplicativeParsec; + + For convenience, our module's export section exports all the + names we imported from both the Applicative and + Parsec modules. + + + + Applicative parsing by example + + We'll begin by rewriting our existing form parser from the + bottom up, beginning with p_hex, which + parses a hexadecimal escape sequence. + + &FormApp.hs:p_hex; + + Here's our applicative version. + + &FormApp.hs:a_hex; + + Although the individual parsers are mostly untouched, the + combinators that we're gluing them together with have changed. + The only familiar one is (<$>), which + we already know is a synonym for + fmap. + + From our definition of Applicative, we know + that (<*>) is + ap. + + The remaining unfamiliar combinator is + (*>), which applies its first argument, + throws away its result, then applies the second and returns its + result. In other words, it's similar to + (>>). + + + A handy tip about angle brackets + + Before we continue, here's a useful aid for remembering + what all the angle brackets are for in the combinators from + Control.Applicative: if there's an angle bracket + pointing to some side, the result from that side should be + used. + + For example, (*>) returns the + result on its right; (<*>) returns + results from both sides; and (<*), + which we have not yet seen, returns the result on its + left. + + + Although the concepts here should mostly be familiar from + our earlier coverage of functors and monads, we'll walk through + this function to explain what's happening. First, to get a grip + on our types, we'll hoist hexify to the top + level and give it a signature. + + &FormApp.hs:hexify; + + Parsec's hexDigit parser parses a + single hexadecimal digit. + + &formApp.ghci:hexDigit; + + Therefore, char '%' *> hexDigit has the same + type, since (*>) returns the result on + its right. (Remember that CharParser is nothing + more than a synonym for GenParser Char.) + + &formApp.ghci:char; + + The expression hexify <%$> (char '%' *> + hexDigit) is a parser that matches a % + character followed by hex digit, and whose result is a + function. + + &formApp.ghci:func; + + Finally, (<*>) applies the parser + on its left, then the parser on its right, and applies the + function that's the result of the left parse to the value that's + the result of the right. + + If you've been able to follow this, then you understand the + (<*>) and ap + combinators: (<*>) is plain old + ($) lifted to applicative functors, and + ap the same thing lifted to monads. + + &formApp.ghci:ap; + + Next, we'll consider the p_char + parser. + + &FormApp.hs:p_char; + + This remains almost the same in an applicative style, save + for one piece of convenient notation. + + &FormApp.hs:a_char; + + Here, the (<$) combinator uses the + value on the left if the parser on the right succeeds. + + Finally, the equivalent of p_pair_app1 is + almost identical. + + &FormParse.hs:p_pair_app1.noid; + + All we've changed is the combinator we use for lifting: the + liftA functions act in the same ways as + their liftM cousins. + + &FormApp.hs:a_pair; + + + + Parsing JSON data + + To give ourselves a better feel for parsing with applicative + functors, and to explore a few more corners of Parsec, we'll + write a JSON parser that follows the definition in RFC + 4627. + + Insert a backreference to the introduction to JSON, + whenever that shows up. + + At the top level, a JSON value must be either an object or + an array. + + &JSONParsec.hs:p_text; + + These are structurally similar, with an opening character, + following by one or more items separated by commas, followed by + a closing characters. We capture this similarity by writing a + small helper function. + + &JSONParsec.hs:p_series; + + Here, we finally have a use for the + (<*) combinator that we introduced + earlier. We use it to skip over any white space that might + follow certain tokens. With this p_series + function, parsing an array is simple. + + &JSONParsec.hs:p_array; + + Dealing with a JSON object is hardly more complicated, + requiring just a little more effort to product a name/value pair + for each of the object's fields. + + &JSONParsec.hs:p_object; + + Parsing an individual value is a matter of calling an + existing parser, then wrapping its result with the appropriate + JValue constructor. + + &JSONParsec.hs:p_value; + + This leads us to the two most interesting parsers, for + numbers and strings. We'll deal with numbers first, since + they're simpler. + + &JSONParsec.hs:p_number; + + Our trick here is to take advantage of Haskell's standard + number parsing library functions, which are defined in the + Numeric module. The readFloat + function reads an unsigned floating point number, and + readSigned takes a parser for an unsigned + number and turns it into a parser for possibly signed + numbers. + + Since these functions know nothing about Parsec, we have to + work with them specially. Parsec's + getInput function gives us direct access to + Parsec's unconsumed input stream. If readSigned + readFloat succeeds, it returns both the parsed number + and the rest of the unparsed input. We then use + setInput to give this back to Parsec as its + new unconsumed input stream. + + Parsing a string isn't difficult, merely detailed. + + &JSONParsec.hs:p_string; + + We can parse and unescape an escape sequence with the help + of a generic function for combining + Alternatives. + + &JSONParsec.hs:p_escape; + + (We're basing the name asum on the + msum function in + Control.Monad, which serves the same purpose for + MonadPlus instances. It's a little surprising that + asum isn't already defined for us.) + + Finally, JSON lets us encode a Unicode character in + a string as \u followed by four + hexadecimal digits. Because the number of values that can be + represented in 32 bits is much larger than the number of valid + Unicode code points, we must check to ensure that we're decoding + a value within the valid Unicode range, or else we could cause + ourselves a runtime error. + + &JSONParsec.hs:p_unicode; + + The only piece of functionality that applicative + functors are missing compared to monads is the ability to bind a + value to a variable, which we need here in order to be able to + validate the value we're trying to decode. + + This is the one place in our parser that we've needed to + use a monadic function. This pattern extends to more + complicated parsers, too: only infrequently do we need the extra + bit of power that monads offer. + + As we write this book, applicative functors are still quite + new to Haskell, and people are only beginning to explore the + possible uses for them beyond the realm of parsing. + hunk ./examples/ch16/MonadPlus.hs 1 +import Control.Monad (MonadPlus(..)) + +shortcircuitLeft f = +{-- snippet shortcircuitLeft --} + mzero >>= f == mzero +{-- /snippet shortcircuitLeft --} + +shortcircuitRight v = +{-- snippet shortcircuitRight --} + v >> mzero == mzero +{-- /snippet shortcircuitRight --} + +{-- snippet guard --} +guard :: (MonadPlus m) => Bool -> m () +guard True = return () +guard False = mzero +{-- /snippet guard --} + +{-- snippet zeroMod --} +x `zeroMod` n = guard ((x `mod` n) == 0) >> return x +{-- /snippet zeroMod --} hunk ./examples/ch18a/ApplicativeParsec.hs 1 +{-- snippet ApplicativeParsec --} hunk ./examples/ch18a/ApplicativeParsec.hs 10 +-- Hide a few names that are provided by Applicative. hunk ./examples/ch18a/ApplicativeParsec.hs 13 +-- The Applicative instance for every Monad looks like this. hunk ./examples/ch18a/ApplicativeParsec.hs 15 - pure = return + pure = return hunk ./examples/ch18a/ApplicativeParsec.hs 18 +-- The Alternative instance for every MonadPlus looks like this. hunk ./examples/ch18a/ApplicativeParsec.hs 22 +{-- /snippet ApplicativeParsec --} hunk ./examples/ch18a/FormParse.hs 1 +import Control.Monad (ap, liftM2) hunk ./examples/ch18a/FormParse.hs 18 +{-- snippet p_pair_app1 --} +p_pair_app1 = + liftM2 (,) (many1 p_char) (optionMaybe (char '=' >> many p_char)) +{-- /snippet p_pair_app1 --} + +{-- snippet p_pair_app2 --} +p_pair_app2 = + (,) `fmap` many1 p_char `ap` optionMaybe (char '=' >> many p_char) +{-- /snippet p_pair_app2 --} + hunk ./examples/ch18a/JSONParsec.hs 7 -import Data.Char (isHexDigit) hunk ./examples/ch18a/JSONParsec.hs 11 -p_text :: GenParser Char () JValue +{-- snippet p_text --} +p_text :: CharParser () JValue hunk ./examples/ch18a/JSONParsec.hs 14 - "JSON text" + "JSON text" hunk ./examples/ch18a/JSONParsec.hs 17 +{-- /snippet p_text --} hunk ./examples/ch18a/JSONParsec.hs 19 +{-- snippet p_value --} hunk ./examples/ch18a/JSONParsec.hs 26 - <|> JBool <$> (True <$ string "true" <|> False <$ string "false") + <|> JBool <$> p_bool hunk ./examples/ch18a/JSONParsec.hs 30 +p_bool :: CharParser () Bool +p_bool = True <$ string "true" + <|> False <$ string "false" +{-- /snippet p_value --} + +{-- snippet p_string --} hunk ./examples/ch18a/JSONParsec.hs 38 - where jchar = char '\\' *> special <|> satisfy (`notElem` "\"\\") - special = foldl1 (<|>) escapes <|> unicode - "escape character" - escapes = zipWith ch "bnfrt\\\"/" "\b\n\f\r\t\\\"/" - ch c r = r <$ char c - unicode = char 'u' *> count 4 (satisfy isHexDigit) >>= check - check x | code <= maxChar = pure (toEnum code) - | otherwise = empty + where jchar = char '\\' *> (p_escape <|> p_unicode) + <|> satisfy (`notElem` "\"\\") +{-- /snippet p_string --} + +{-- snippet p_escape --} +p_escape = asum (zipWith decode "bnfrt\\\"/" "\b\n\f\r\t\\\"/") + where decode c r = r <$ char c + +asum :: Alternative f => [f a] -> f a +asum = foldr (<|>) empty +{-- /snippet p_escape --} + +{-- snippet p_unicode --} +p_unicode = char 'u' *> count 4 hexDigit >>= validate + where validate x | code <= maxChar = pure (toEnum code) + | otherwise = empty hunk ./examples/ch18a/JSONParsec.hs 56 +{-- /snippet p_unicode --} hunk ./examples/ch18a/JSONParsec.hs 58 +{-- snippet p_number --} hunk ./examples/ch18a/JSONParsec.hs 64 +{-- /snippet p_number --} hunk ./examples/ch18a/JSONParsec.hs 66 -p_series :: Char -> GenParser Char () a -> Char -> GenParser Char () [a] -p_series l p r = between (char l <* spaces) (char r) $ - (p <* spaces) `sepBy` (char ',' <* spaces) +{-- snippet p_series --} +p_series :: Char -> CharParser () a -> Char -> CharParser () [a] +p_series left parser right = + between (char left <* spaces) (char right) $ + (parser <* spaces) `sepBy` (char ',' <* spaces) +{-- /snippet p_series --} hunk ./examples/ch18a/JSONParsec.hs 73 -p_object :: GenParser Char () (JObject JValue) +{-- snippet p_object --} +p_object :: CharParser () (JObject JValue) hunk ./examples/ch18a/JSONParsec.hs 77 +{-- /snippet p_object --} hunk ./examples/ch18a/JSONParsec.hs 79 -p_array :: GenParser Char () (JArray JValue) -p_array = jarray <$> p_series '[' p_value ']' +{-- snippet p_array --} +p_array :: CharParser () (JArray JValue) +p_array = jarray <$> p_series '[' p_value ']' +{-- /snippet p_array --} hunk ./examples/ch18a/formParse.ghci 1 +:load FormParse + +--# test +parseTest p_query "foo=bar&a%21=b+c" }