Chapter 10. Using Parsec

Table of Contents

First Steps with Parsec: Simple CSV Parsing
The sepBy and endBy Combinators
Choices and Errors
Lookahead
Error Handling
Extended Example: Full CSV Parser

The task of parsing a file, or data of various types, is a common one for programmers. You already learned about Haskell's support for regular expressions back in the section called “Regular expressions in Haskell”. Regular expressions are nice for many tasks. But they can become quite complex and difficult when dealing with a data format that is complex. For instance, you probably wouldn't want to use regular expressions to parse C source code.

Haskell has a very useful and unique library called Parsec. Parsec is a parser combinator library. With Parsec, you will combine small parsing functions together to build up more complex parsers. Parsec provides some simple parsing functions, as well as functions to tie them all together. It should come as no surprise that this parser library for Haskell is built around the notion of functions.

Those of you that are familiar with parser tools in other languages may be interested to know that Parsec can take the place of both parser tools such as Yacc and lexers such as flex. Parsec can also be used to implement a single-stange parser.

In this chapter, we will use Parsec to build up a parser for CSV files. We will also use Parsec to build a more complex parser for sectioned configuration files.

First Steps with Parsec: Simple CSV Parsing

Let's jump right in by writing some code for parsing a CSV file. CSV files are often used as a plain text representation of spreadsheets or databases. Each line is a record, and each field in the record is separated from the next by a comma. There are ways of dealing with fields that contain commas, but to start with, we won't worry about it.

This first example is much longer than it really needs to be. We will introduce more Parsec features in a little bit that will shrink the parser down to only four lines!

import Text.ParserCombinators.Parsec

{- A CSV file contains 0 or more lines, each of which is terminated
   by the end-of-line character (eol). -}
csvFile :: GenParser Char st [[String]]
csvFile = 
    do result <- many line
       eof
       return result

-- Each line contains 1 or more cells, separated by a comma
line :: GenParser Char st [String]
line = 
    do result <- cells
       eol                       -- end of line
       return result
       
-- Build up a list of cells.  Try to parse the first cell, then figure out 
-- what ends the cell.
cells :: GenParser Char st [String]
cells = 
    do first <- cellContent
       next <- remainingCells
       return (first : next)

-- The cell either ends with a comma, indicating that 1 or more cells follow,
-- or it doesn't, indicating that we're at the end of the cells for this line
remainingCells :: GenParser Char st [String]
remainingCells =
    (char ',' >> cells)            -- Found comma?  More cells coming
    <|> (return [])                -- No comma?  Return [], no more cells

-- Each cell contains 0 or more characters, which must not be a comma or
-- EOL
cellContent :: GenParser Char st String
cellContent = 
    many (noneOf ",\n")
       

-- The end of line character is \n
eol :: GenParser Char st Char
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" inputimport Text.ParserCombinators.Parsec

{- A CSV file contains 0 or more lines, each of which is terminated
   by the end-of-line character (eol). -}
csvFile :: GenParser Char st [[String]]
csvFile = 
    do result <- many line
       eof
       return result

-- Each line contains 1 or more cells, separated by a comma
line :: GenParser Char st [String]
line = 
    do result <- cells
       eol                       -- end of line
       return result
       
-- Build up a list of cells.  Try to parse the first cell, then figure out 
-- what ends the cell.
cells :: GenParser Char st [String]
cells = 
    do first <- cellContent
       next <- remainingCells
       return (first : next)

-- The cell either ends with a comma, indicating that 1 or more cells follow,
-- or it doesn't, indicating that we're at the end of the cells for this line
remainingCells :: GenParser Char st [String]
remainingCells =
    (char ',' >> cells)            -- Found comma?  More cells coming
    <|> (return [])                -- No comma?  Return [], no more cells

-- Each cell contains 0 or more characters, which must not be a comma or
-- EOL
cellContent :: GenParser Char st String
cellContent = 
    many (noneOf ",\n")
       

-- The end of line character is \n
eol :: GenParser Char st Char
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

Before looking at all this code in detail, let's play with it a bit and see what it does.

ghci> :l csv1.hs
[1 of 1] Compiling Main             ( csv1.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV ""
Loading package parsec-2.0 ... linking ... done.
Right []
ghci> :l csv1.hs
[1 of 1] Compiling Main             ( csv1.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV ""
Loading package parsec-2.0 ... linking ... done.
Right []

That makes sense: parsing the empty string returns an empty list. Let's try parsing a single cell.

ghci> parseCSV "hi"
Left "(unknown)" (line 1, column 3):
unexpected end of input
expecting "," or "\n"
ghci> parseCSV "hi"
Left "(unknown)" (line 1, column 3):
unexpected end of input
expecting "," or "\n"

Look at that. Recall how we defined that each line must end with the end-of-line character, and we didn't give it. Parsec's error message helpfully indicated the line number and column number of the problem, and even told us what it was expecting! Let's give it an end-of-line character and continue experimenting.

ghci> parseCSV "hi\n"
Right [["hi"]]
ghci> parseCSV "line1\nline2\nline3\n"
Right [["line1"],["line2"],["line3"]]
ghci> parseCSV "cell1,cell2,cell3\n"
Right [["cell1","cell2","cell3"]]
ghci> parseCSV "l1c1,l1c2\nl2c1,l2c2\n"
Right [["l1c1","l1c2"],["l2c1","l2c2"]]
ghci> parseCSV "Hi,\n\n,Hello\n"
Right [["Hi",""],[""],["","Hello"]]
ghci> parseCSV "hi\n"
Right [["hi"]]
ghci> parseCSV "line1\nline2\nline3\n"
Right [["line1"],["line2"],["line3"]]
ghci> parseCSV "cell1,cell2,cell3\n"
Right [["cell1","cell2","cell3"]]
ghci> parseCSV "l1c1,l1c2\nl2c1,l2c2\n"
Right [["l1c1","l1c2"],["l2c1","l2c2"]]
ghci> parseCSV "Hi,\n\n,Hello\n"
Right [["Hi",""],[""],["","Hello"]]

You can see that parseCSV is doing exactly what we wanted it to do. It's even handling empty cells and empty lines properly.

Let's take a look at the code for this example. We didn't use many shortcuts here, so remember that this will get shorter and simpler!

We've built it from the top down, so our first function is csvFile. The type of this function is GenParser Char st [[String]]. This means that the type of the input is a sequence of characters, which is exactly what a Haskell string is, since String is the same as [Char]. It also means that we will return a value of type [[String]]: a list of a list of strings. The st can be ignored for now.

Parsec programmers usually omit type declarations, since we write so many small functions. Haskell's type inference can figure it out. We've listed the types for the first example here so you can get a better idea of what's going on. You can always use :t in ghci to inspect types as well.

The csvFile uses a do block. Parsec is a monadic parser combinator, but we are not using the IO monad here. As we discussed in fixme: insert ref, not all monads are IO monads.

We start by running many line. many is a function that takes a function as an argument. It tries to repeatedly parse the input using the function passed to it. It gathers up the results from all that repeated parsing and returns a list of them. So, here, we are storing the results of parsing all lines in result. Then we look for the end-of-file indicator, called eof. Finally, we return the result. So, a CSV file is made of of many lines, then the end of file. You can often read out Parsec functions in plain English just like that.

Now we must answer the question: what is a line? We define the line function to do just that. Reading the function, we can see that a line consists of cells followed by the end of line character.

So what are cells? We defined them in the cells function. The cells of a line start with the content of the first cell, then continue with the content of the remaining cells, if any. The result is simply the first cell and the remaining cells assembled into a list.

Let's skip over remainingCells for a minute and look at cellContent. A cell contains any number of characters, but each character must not be a comma or end of line character. The noneOf function matches one item, so long as it isn't in the list of items that we pass. So, saying many (noneOf ",\n") defines a cell the way we want it.

Back in remainingCells, we have the first example of a choice in Parsec. The choice operator is <|>. This operator behaves like this: it will first try the parser on the left. If it consumed no input, it will try the parser on the right.

So, in remainingCells, our task is to come up with all the cells after the first. Recall that cellContent uses noneOf ",\n". So it will not consume the comma or end-of-line character from the input. If we see a comma after parsing a cell, it means that at least one more cell follows. Otherwise, we're done. So, our first choice in remainingCells is char ','. This parser simply matches the passed character in the input. If we found a comma, we want this function to return the remaining cells on the line. At this point, the "remaining cells" looks exactly like the start of the line, so we call cells recursively to parse them. If we didn't find a comma, we return the empty list, signifying no remaining cells on the line.

Finally, we must define what the end-of-line indicator is. We set it to char '\n', which will suit our purposes fine for now.

At the very end of the program, we define a function parseCSV that takes a String and parses it as a CSV file. This function is just a shortcut that calls Parsec's parse function, filling in a few parameters. parse returns Either ParseError [[String]] for the CSV file. If there was an error, the return vaule will be Left with the error; otherwise, it will be Right with the result.

The sepBy and endBy Combinators

We promised you earlier that we could simply our CSV parser significantly by using a few Parsec helper functions. There are two that will dramatically simplify this code. The first is endBy, which takes two functions. It applies the first function, notes its result, then applies the second. It will do this over and over, and return a list of the results from the first function.

The second tool for us is sepBy. It's like endBy, but expects the very last item to not end with the separator.

So, we can use endBy to parse lines, since every line must end with the end-of-line character. We can use sepBy to parse cells, since the last cell will not end with a comma. Take a look at how much simpler our parser is now:

import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" inputimport Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

This program behaves exactly the same as the first one. You can verify that by using ghci to re-run our examples from the earlier example. You'll get the same result from every one. Yet the program is much shorter and more readable. It won't be long before you can translate Parsec code like this into a file format definition in plain English. As you read over this code, you can see that:

  • A CSV file contains 0 or more lines, each of which is terminated by the end-of-line character.

  • A line contains 0 or more cells, separated by a comma.

  • A cell contains 0 or more characters, which must be neither the comma nor the end-of-line character.

  • The end-of-line character is the newline, \n.

Choices and Errors

Different operating systems use different characters to mark the end-of-line. Unix/Linux systems, plus Windows in text mode, use simply "\n". FIXME: verify this on Windows DOS and Windows systems use "\n\r", and Macs traditionally use "\r\n". We could also support a bare "\r" in case anybody uses that.

We could easily adapt our example to be able to handle all these types of line endings in a single file. We would need to make two modifications: adjust eol to recognize the different endings, and adjust the noneOf pattern in cell to ignore \r.

This must be done carefully. Recall that our earlier definition of eol was simply char '\n'. There is a parser called string that we can use to match the multi-character patterns. Let's start by thinking of how we would add support for \n\r.

Our first attempt might look like this:

-- This function is not correct!
eol = string "\n" <|> string "\n\r"-- This function is not correct!
eol = string "\n" <|> string "\n\r"

This isn't quite right. Recall that the <|> operator always tries the left alternative first. Looking for the single character \n will match both types of line endings, so it will look to the system that the following line begins with \r. Not what we want. Try it in ghci:

ghci> :m Text.ParserCombinators.Parsec
ghci> let eol = string "\n" <|> string "\n\r"
Loading package parsec-2.0 ... linking ... done.
ghci> parse eol "" "\n"
Right "\n"
ghci> parse eol "" "\n\r"
Right "\n"
ghci> :m Text.ParserCombinators.Parsec
ghci> let eol = string "\n" <|> string "\n\r"
Loading package parsec-2.0 ... linking ... done.
ghci> parse eol "" "\n"
Right "\n"
ghci> parse eol "" "\n\r"
Right "\n"

It may look like it worked for both endings, but actually looking at it this way, we can't tell. If it left something un-parsed, we don't know, because we're not looking for anything else. So let's look for the end-of-file after our end of line:

ghci> parse (eol >> eof) "" "\n\r"
Left (line 2, column 1):
unexpected "\r"
expecting end of input
ghci> parse (eol >> eof) "" "\n"
Right ()
ghci> parse (eol >> eof) "" "\n\r"
Left (line 2, column 1):
unexpected "\r"
expecting end of input
ghci> parse (eol >> eof) "" "\n"
Right ()

As expected, we got an error from the \n\r ending. So the next temptation may be to try it this way:

-- This function is not correct!
eol = string "\n\r" <|> string "\n"-- This function is not correct!
eol = string "\n\r" <|> string "\n"

This also isn't right. Recall that <|> only attempts the option on the right of the option on the left consumed no input. But by the time we are able to see if there is a \r after the \n, we've already consumed the \n. This time, we fail on the other case in ghci:

ghci> :m Text.ParserCombinators.Parsec
ghci> let eol = string "\n\r" <|> string "\n"
Loading package parsec-2.0 ... linking ... done.
ghci> parse (eol >> eof) "" "\n\r"
Right ()
ghci> parse (eol >> eof) "" "\n"
Left (line 1, column 1):
unexpected end of input
expecting "\n\r"
ghci> :m Text.ParserCombinators.Parsec
ghci> let eol = string "\n\r" <|> string "\n"
Loading package parsec-2.0 ... linking ... done.
ghci> parse (eol >> eof) "" "\n\r"
Right ()
ghci> parse (eol >> eof) "" "\n"
Left (line 1, column 1):
unexpected end of input
expecting "\n\r"

We've stumbled upon the lookahead problem. It turns out that, when writing parsers, it's often very convenient to be able to "look ahead" at the data that's coming in. Parsec supports this, but before showing you how to use it, let's see how you would have to write this to get along without it. You'd have to manually expand all the options after the \n like this:

eol = 
    do char '\n'
       char '\r' <|> return '\n'eol = 
    do char '\n'
       char '\r' <|> return '\n'

This function first looks for \n. If it is found, then it will look for \r, consuming it if possible. Since the return type of char '\r' is a Char, the alternative action is to simply return a Char without attempting to parse anything. Parsec has a function option that can also express this idiom as option '\n' (char '\r'). Let's test this with ghci.

ghci> :l csv5.hs
[1 of 1] Compiling Main             ( csv5.hs, interpreted )
Ok, modules loaded: Main.
ghci> parse (eol >> eof) "" "\n\r"
Loading package parsec-2.0 ... linking ... done.
Right ()
ghci> parse (eol >> eof) "" "\n"
Right ()
ghci> :l csv5.hs
[1 of 1] Compiling Main             ( csv5.hs, interpreted )
Ok, modules loaded: Main.
ghci> parse (eol >> eof) "" "\n\r"
Loading package parsec-2.0 ... linking ... done.
Right ()
ghci> parse (eol >> eof) "" "\n"
Right ()

This time, we got the right result! But we could have done it easier with Parsec's lookahead support.

Lookahead

Parsec has a function called try that is used to express lookaheads. try takes one function, a parser. It applies that parser. If the parser doesn't succeed, try behaves as if it hadn't consumed any input at all. So, when you use try on the left side of <|>, Parsec will try the option on the right even if the lift side failed after consuming some input. try only has an effect if it is on the left of a <|>. Keep in mind, though, that many functions use <|> interally. Sometimes you may want to add it to your code just in case it will be used on the left of a <|>, and that works fine. Here's a way to add expanded end-of-line support to our CSV parser using try:

import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n\r")

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" inputimport Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n\r")

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

Here we put both of the two-character endings first, and run both tests under try. Both of them occur to the left of a <|>, so they will do the right thing. We could have put string "\n" within a try, but it wouldn't have altered any behavior since they look at only one character anyway. We can load this up and test the eol function in ghci.

ghci> :l csv6.hs
[1 of 1] Compiling Main             ( csv6.hs, interpreted )
Ok, modules loaded: Main.
ghci> parse (eol >> eof) "" "\n\r"
Loading package parsec-2.0 ... linking ... done.
Right ()
ghci> parse (eol >> eof) "" "\n"
Right ()
ghci> parse (eol >> eof) "" "\r\n"
Right ()
ghci> parse (eol >> eof) "" "\r"
Right ()
ghci> :l csv6.hs
[1 of 1] Compiling Main             ( csv6.hs, interpreted )
Ok, modules loaded: Main.
ghci> parse (eol >> eof) "" "\n\r"
Loading package parsec-2.0 ... linking ... done.
Right ()
ghci> parse (eol >> eof) "" "\n"
Right ()
ghci> parse (eol >> eof) "" "\r\n"
Right ()
ghci> parse (eol >> eof) "" "\r"
Right ()

All four endings were handled properly. You can also test the full CSV parser with some different endings like this:

ghci> parseCSV "line1\r\nline2\nline3\n\rline4\rline5\n"
Right [["line1"],["line2"],["line3"],["line4"],["line5"]]
ghci> parseCSV "line1\r\nline2\nline3\n\rline4\rline5\n"
Right [["line1"],["line2"],["line3"],["line4"],["line5"]]

As you can see, this program even supports different line endings within a single file.

Error Handling

At the beginning of this chapter, you saw how Parsec could generate error messages that list the location where the error occured as well as what was expected. As parsers get more complex, the list of what was expected can become cumbersome. Parsec provides a way for you to specify custom error messages in the event of parse failures.

Let's look at what happens on an error when our current CSV parser:

ghci> parseCSV "line1"
Left "(unknown)" (line 1, column 6):
unexpected end of input
expecting ",", "\n\r", "\r\n", "\n" or "\r"
ghci> parseCSV "line1"
Left "(unknown)" (line 1, column 6):
unexpected end of input
expecting ",", "\n\r", "\r\n", "\n" or "\r"

That's a pretty long, and technical, error message. We could make an attempt to resolve this by using the monad fail function like so:

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <|> fail "Couldn't find EOL"eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <|> fail "Couldn't find EOL"

Under ghci, we can see the result:

ghci> :l csv7.hs
[1 of 1] Compiling Main             ( csv7.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV "line1"
Loading package parsec-2.0 ... linking ... done.
Left "(unknown)" (line 1, column 6):
unexpected end of input
expecting ",", "\n\r", "\r\n", "\n" or "\r"
Couldn't find EOL
ghci> :l csv7.hs
[1 of 1] Compiling Main             ( csv7.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV "line1"
Loading package parsec-2.0 ... linking ... done.
Left "(unknown)" (line 1, column 6):
unexpected end of input
expecting ",", "\n\r", "\r\n", "\n" or "\r"
Couldn't find EOL

We added to the error result, but didn't really help clean up the output. Parsec has an <?> operator that is designed for just these situations. It is similar to <|> in that it first tries the parser on its left. Instead of trying another parser in the event of a failure, it presents an error message. Here's how we'd use it:

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"

Now, when you generate an error, you'll get more helpful output:

ghci> :l csv8.hs
[1 of 1] Compiling Main             ( csv8.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV "line1"
Loading package parsec-2.0 ... linking ... done.
Left "(unknown)" (line 1, column 6):
unexpected end of input
expecting "," or end of line
ghci> :l csv8.hs
[1 of 1] Compiling Main             ( csv8.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV "line1"
Loading package parsec-2.0 ... linking ... done.
Left "(unknown)" (line 1, column 6):
unexpected end of input
expecting "," or end of line

That's pretty helpful! The general rule of thumb is that you put a human description of what you're looking for to the right of <?>.

Extended Example: Full CSV Parser

Our earlier CSV examples have had an important flaw: they weren't able to handle cells that contain a comma. CSV generating programs typically put quotation marks around such data. But then you have another problem: what to do if a cell contains a quotation mark and a comma. In these cases, the embedded quotation marks are doubled up.

Here is a full CSV parser. You can use this from ghci, or if you compile it to a standalone program, it will parse a CSV file on standard input and convert it a different format on output.

import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = quotedCell <|> many (noneOf ",\n\r")

quotedCell = 
    do char '"'
       content <- many quotedChar
       char '"' <?> "quote at end of cell"
       return content

quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

main =
    do c <- getContents
       case parse csvFile "(stdin)" c of
            Left e -> do putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print rimport Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = quotedCell <|> many (noneOf ",\n\r")

quotedCell = 
    do char '"'
       content <- many quotedChar
       char '"' <?> "quote at end of cell"
       return content

quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

main =
    do c <- getContents
       case parse csvFile "(stdin)" c of
            Left e -> do putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r

That's a full-featured CSV parser in just 21 lines of code, plus an additional 10 lines for the parseCSV and main utility functions.

Let's look at the changes in this program from the previous versions. First, a cell may now be either a bare cell or a "quoted" cell. We give the quotedCell option first, because we want to follow that path if the first character in a cell is the quote mark.

The quotedCell begins and ends with a quote mark, and contains zero or more characters. These characters can't be copied directly, though, because they may contain embedded, doubled-up, quote marks themselves. So we define a custom quotedChar to process them.

When we're processing characters inside a quoted cell, we first say noneOf "\"". This will match and return any single character as long as it's not the quote mark. Otherwise, if it is the quote mark, we see if we have two of them in a row. If so, we return a single quote mark to go on our result string.

Notice that try in quotedChar on the right side of <|>. Recall that I said that try only has an effect if it is on the left side of <|>. This try does occur on the left side of a <|>, but on the left of one that must be within the implementation of many.

This try is actually quite important. Let's say you are parsing a quoted cell and are getting towards the end of it. There will be another cell following. So you will expect to see a quote to end the current cell, followed by a comma. When you hit quotedChar, you will fail the noneOf test and proceed to the test that looks for two quotes in a row. You'll also fail that one because you'll have a quote, then a comma. If you hadn't used try, you'd crash with an error at this point, saying that it was expecting the second quote, because the first quote was already consumed. Since we use try, this is properly recognized as not a character that's part of the cell, so it terminates the many quotedChar expression as expected. Lookahead has once again proven very useful, and the fact that it is so easy to add makes it a remarkable tool in Parsec.

You can test this program with ghci over some quoted cells:

ghci> :l csv9.hs
[1 of 1] Compiling Main             ( csv9.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV "\"This, is, one, big, cell\"\n"
Loading package parsec-2.0 ... linking ... done.
Right [["This, is, one, big, cell"]]
ghci> parseCSV "\"Cell without an end\n"
Left "(unknown)" (line 2, column 1):
unexpected end of input
expecting "\"\"" or quote at end of cell
ghci> :l csv9.hs
[1 of 1] Compiling Main             ( csv9.hs, interpreted )
Ok, modules loaded: Main.
ghci> parseCSV "\"This, is, one, big, cell\"\n"
Loading package parsec-2.0 ... linking ... done.
Right [["This, is, one, big, cell"]]
ghci> parseCSV "\"Cell without an end\n"
Left "(unknown)" (line 2, column 1):
unexpected end of input
expecting "\"\"" or quote at end of cell

Let's run it over a real CSV file. Here's a CSV file generated by a spreadsheet program:

"Product","Price"
"O'Reilly Socks",10
"Shirt with ""Haskell"" text",20
"Shirt, ""O'Reilly"" version",20
"Haskell Caps",15
    
"Product","Price"
"O'Reilly Socks",10
"Shirt with ""Haskell"" text",20
"Shirt, ""O'Reilly"" version",20
"Haskell Caps",15
    

Now, we can run this under our test program and watch:

$ runhaskell csv9.hs < test.csv
["Product","Price"]
["O'Reilly Socks","10"]
["Shirt with \"Haskell\" text","20"]
["Shirt, \"O'Reilly\" version","20"]
["Haskell Caps","15"]
    
$ runhaskell csv9.hs < test.csv
["Product","Price"]
["O'Reilly Socks","10"]
["Shirt with \"Haskell\" text","20"]
["Shirt, \"O'Reilly\" version","20"]
["Haskell Caps","15"]
    

Want to stay up to date? Subscribe to the comment feed for this chapter, or the entire book.

Copyright 2007 Bryan O'Sullivan, Don Stewart, and John Goerzen. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 License. Icons by Paul Davey aka Mattahan.