[Added HTTP request parser Bryan O'Sullivan **20080408065126] { move ./examples/ch29/ServerParse.hs ./examples/ch18/HttpRequestParser.hs addfile ./examples/ch29/HandleT.hs addfile ./examples/ch29/MaybeT.hs hunk ./en/ch18-parsec.xml 836 + + + + Parsing a HTTP request + + As another example of applicative parsing, we will develop a + basic HTTP request parser. We'll use this later on to create a + small web server. + + &HttpRequestParser.hs:module; + + An HTTP request consists of a method, an identifier, a + series of headers, and an optional body. For simplicity, we'll + focus on just two of the six method types specified by the HTTP + 1.1 standard. A POST method has a body; a + GET has none. + + &HttpRequestParser.hs:HttpRequest; + + Because we're writing in an applicative style, our parser + can be both brief and readable. Readable, that is, if you're + becoming used to the applicative parsing notation. + + &HttpRequestParser.hs:p_request; + + Briefly, the q helper function accepts + a method name, the type constructor to apply to it, and a parser + for a request's optional body. The url + helper does not attempt to validate a URL, because the HTTP + specification does not specify what characters; it just consumes + input until either the line ends or it reaches a HTTP version + identifier. + + + Backtracking and its discontents + + The major point worth calling attention to is our use of + the try combinator. This attempts to + execute a parser. Normally, if a parser fails partway through + parsing, whatever input it has consumed stays consumed, and is + no longer available to whatever parser comes next. When the + parser it takes as argument fails, the + try combinator restores the state of the + input as it was when that parser began. This allows us to + backtrack to an earlier state of the + input, and use another parser. + + What does this mean, and why is it important? Let's + consider a very simple parser. + + &try.ghci:simple; + + As we expect, if the parser fails to match its input, we + get an error. Let's make the parser slightly more + complex. + + &try.ghci:broken; + + Once again, the parser behaves as we expect in one case. + However, let's see if we can handle the other string we expect + to be able to match. + + &try.ghci:fail; + + We get the same error message as before! The reason is + that the first parser has already consumed the first two + characters of the input when it fails. The second parser thus + doesn't have access to them. + + If we wrap the first parser in a try, + it will reset the input state if it fails. We now get + sensible results in each case. + + &try.ghci:try; + + There is a problem, however: because it has to hold onto + input in case it needs to restore it, try + is expensive to use. Sprinkling a parser with unnecessary + uses of try is a very effective way to + slow it down, sometimes to the point of unacceptable + performance. + + The standard way to avoid the need for backtracking is to + tidy up a parser so that we can decide whether it will succeed + or fail using only a single token of input. In this case, the + two parsers consume the same initial tokens, so we turn them + into a single parser. + + &try.ghci:factor; + + Even better, Parsec gives us an improved error message if + we feed it non-matching input. + + &try.ghci:error; + + + + + Parsing headers + + Following the first line of a HTTP request is a series of + zero or more headers. A header begins with a field name, + followed by a colon, followed by the content. If the lines + that follow begin with spaces, they are treated as + continuations of the current + content. + + &HttpRequestParser.hs:p_headers; + + + + + Exercises + + Our HTTP request parser is too simple to be useful in real + deployments. It is missing vital functionality, and is not + resistant to even the most basic denial of service + attacks. + + + + + Make the parser honour the + Content-Length field properly, if it is + present. + + + + + + A popular denial of service attack against naive web + servers is simply to send unreasonably long headers. A + single header might contain tens or hundreds of + megabytes of garbage text, causing a server to run out + of memory. + + Restructure the header parser so that it will fail + if any line is longer than 4096 characters. It must + fail immediately when this occurs; it cannot wait until + the end of a line eventually shows up. + + + + + + Add the ability to honour the + Transfer-Encoding: chunked header if it is + present. See section + 3.6.1 of RFC 2616 for details. + + + + + + Another popular attack is to open a connection and + either leave it idle or send data extremely slowly. Use + the System.Timeout module to write a + wrapper for the parser that will close the connection if + the parser has not completed within 30 seconds. + + + + hunk ./examples/ch18/HttpRequestParser.hs 1 -module ServerParse +{-- snippet module --} +module HttpRequestParser hunk ./examples/ch18/HttpRequestParser.hs 6 - , p_query hunk ./examples/ch18/HttpRequestParser.hs 7 - , parse + , p_query hunk ./examples/ch18/HttpRequestParser.hs 14 - -urlBaseChars :: [Char] -urlBaseChars = ['a'..'z']++['A'..'Z']++['0'..'9']++"$-_.!*'()," - -p_query :: CharParser () [(String, Maybe String)] -p_query = pair `sepBy` char '&' - where pair = (,) <$> many1 safe <*> optional (char '=' *> many safe) - safe = oneOf urlBaseChars - <|> char '%' *> liftA2 diddle hexDigit hexDigit - <|> ' ' <$ char '+' - "safe" - diddle a b = toEnum . fst . head . readHex $ [a,b] - -crlf :: CharParser st () -crlf = (() <$ string "\r\n") <|> (() <$ newline) - -notEOL :: CharParser st Char -notEOL = noneOf "\r\n" +{-- /snippet module --} hunk ./examples/ch18/HttpRequestParser.hs 16 +{-- snippet p_headers --} hunk ./examples/ch18/HttpRequestParser.hs 20 - fieldName = (:) <$> letter <*> many fieldChar - fieldChar = letter <|> digit <|> oneOf "-_" hunk ./examples/ch18/HttpRequestParser.hs 23 + fieldName = (:) <$> letter <*> many fieldChar + fieldChar = letter <|> digit <|> oneOf "-_" + +crlf :: CharParser st () +crlf = (() <$ string "\r\n") <|> (() <$ newline) hunk ./examples/ch18/HttpRequestParser.hs 29 +notEOL :: CharParser st Char +notEOL = noneOf "\r\n" +{-- /snippet p_headers --} + +{-- snippet HttpRequest --} hunk ./examples/ch18/HttpRequestParser.hs 43 +{-- /snippet HttpRequest --} hunk ./examples/ch18/HttpRequestParser.hs 45 +{-- snippet p_request --} hunk ./examples/ch18/HttpRequestParser.hs 49 - where q s c p = liftM4 HttpRequest (c <$ string s <* char ' ') - url p_headers p + where q name ctor body = liftM4 HttpRequest req url p_headers body + where req = ctor <$ string name <* char ' ' hunk ./examples/ch18/HttpRequestParser.hs 54 +{-- /snippet p_request --} + +urlBaseChars :: [Char] +urlBaseChars = ['a'..'z']++['A'..'Z']++['0'..'9']++"$-_.!*'()," + +p_query :: CharParser () [(String, Maybe String)] +p_query = pair `sepBy` char '&' + where pair = (,) <$> many1 safe <*> optional (char '=' *> many safe) + safe = oneOf urlBaseChars + <|> char '%' *> liftA2 diddle hexDigit hexDigit + <|> ' ' <$ char '+' + "safe" + diddle a b = toEnum . fst . head . readHex $ [a,b] hunk ./examples/ch29/Comment.hs 22 -import System.IO (Handle, hClose, hGetContents, hPutStr) +import qualified System.IO +import System.IO (Handle, hClose) hunk ./examples/ch29/Comment.hs 25 +import HandleT hunk ./examples/ch29/Comment.hs 97 -block :: IO () -block = forever $ threadDelay maxBound - hunk ./examples/ch29/Comment.hs 104 - block + forever $ threadDelay maxBound hunk ./examples/ch29/Comment.hs 192 +joinLookup :: (Eq a) => a -> [(a, Maybe b)] -> Maybe b hunk ./examples/ch29/Comment.hs 224 - if any ((cmtComment cmt ==) . cmtComment) cmts + if any (on (==) cmtComment cmt) cmts hunk ./examples/ch29/Comment.hs 244 +instance MonadHandle H where + hPutStrLn h s = liftIO (System.IO.hPutStr h s >> + System.IO.hPutStr h "\r\n") + hunk ./examples/ch29/Comment.hs 251 - input <- liftIO $ hGetContents h + input <- hGetContents h hunk ./examples/ch29/Comment.hs 255 - let putLine s = liftIO (hPutStr h s >> hPutStr h "\r\n") - put = liftIO . hPutStr h - putLine $ "HTTP/1.1 " ++ respStatus resp - mapM_ putLine (respHeaders resp) - putLine "" + hPutStrLn h $ "HTTP/1.1 " ++ respStatus resp + mapM_ (hPutStrLn h) (respHeaders resp) + hPutStrLn h "" hunk ./examples/ch29/Comment.hs 261 - put body + hPutStr h body hunk ./examples/ch29/Comment.hs 263 - then putLine "" + then hPutStrLn h "" hunk ./examples/ch29/HandleT.hs 1 +module HandleT + ( + MonadHandle(..) + , HandleT + , runHandleT + ) where + +import Control.Monad.Trans +import qualified System.IO +import System.IO (Handle, hClose) + +class MonadIO m => MonadHandle m where + hGetContents :: Handle -> m String + hGetContents = liftIO . System.IO.hGetContents + + hPutStr :: Handle -> String -> m () + hPutStr h = liftIO . System.IO.hPutStr h + + hPutStrLn :: Handle -> String -> m () + hPutStrLn h = liftIO . System.IO.hPutStrLn h + +instance MonadHandle IO + +newtype HandleT m a = HandleT { runHandleT :: m a } + +instance MonadTrans HandleT where + lift = HandleT + +instance Monad m => Monad (HandleT m) where + return = HandleT . return + m >>= k = HandleT $ runHandleT m >>= runHandleT . k + fail = lift . fail + +instance MonadIO m => MonadIO (HandleT m) where + liftIO = lift . liftIO hunk ./examples/ch29/MaybeT.hs 1 +import Control.Monad +import Control.Monad.Trans +import Control.Monad.State +import Control.Monad.Reader +import Control.Monad.Writer + +newtype MaybeT m a = MaybeT {runMaybeT :: m (Maybe a)} + +instance Functor m => Functor (MaybeT m) where + fmap f x = MaybeT $ fmap (fmap f) $ runMaybeT x + +instance Monad m => Monad (MaybeT m) where + return = MaybeT . return . Just + x >>= f = MaybeT $ runMaybeT x >>= maybe (return Nothing) (runMaybeT . f) + fail _ = MaybeT $ return Nothing }