Table of Contents
The task of parsing a file, or data of various types, is a common one for programmers. You already learned about Haskell's support for regular expressions back in the section called “Regular expressions in Haskell”. Regular expressions are nice for many tasks. But they can become quite complex and difficult when dealing with a data format that is complex. For instance, you probably wouldn't want to use regular expressions to parse C source code.
Haskell has a very useful and unique library called Parsec. Parsec is a parser combinator library. With Parsec, you will combine small parsing functions together to build up more complex parsers. Parsec provides some simple parsing functions, as well as functions to tie them all together. It should come as no surprise that this parser library for Haskell is built around the notion of functions.
Those of you that are familiar with parser tools in other languages may be interested to know that Parsec can take the place of both parser tools such as Yacc and lexers such as flex. Parsec can also be used to implement a single-stange parser.
In this chapter, we will use Parsec to build up a parser for CSV files. We will also use Parsec to build a more complex parser for sectioned configuration files.
Let's jump right in by writing some code for parsing a CSV file. CSV files are often used as a plain text representation of spreadsheets or databases. Each line is a record, and each field in the record is separated from the next by a comma. There are ways of dealing with fields that contain commas, but to start with, we won't worry about it.
This first example is much longer than it really needs to be. We will introduce more Parsec features in a little bit that will shrink the parser down to only four lines!
import Text.ParserCombinators.Parsec {- A CSV file contains 0 or more lines, each of which is terminated by the end-of-line character (eol). -} csvFile :: GenParser Char st [[String]] csvFile = do result <- many line eof return result -- Each line contains 1 or more cells, separated by a comma line :: GenParser Char st [String] line = do result <- cells eol -- end of line return result -- Build up a list of cells. Try to parse the first cell, then figure out -- what ends the cell. cells :: GenParser Char st [String] cells = do first <- cellContent next <- remainingCells return (first : next) -- The cell either ends with a comma, indicating that 1 or more cells follow, -- or it doesn't, indicating that we're at the end of the cells for this line remainingCells :: GenParser Char st [String] remainingCells = (char ',' >> cells) -- Found comma? More cells coming <|> (return []) -- No comma? Return [], no more cells -- Each cell contains 0 or more characters, which must not be a comma or -- EOL cellContent :: GenParser Char st String cellContent = many (noneOf ",\n") -- The end of line character is \n eol :: GenParser Char st Char eol = char '\n' parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" inputimport Text.ParserCombinators.Parsec {- A CSV file contains 0 or more lines, each of which is terminated by the end-of-line character (eol). -} csvFile :: GenParser Char st [[String]] csvFile = do result <- many line eof return result -- Each line contains 1 or more cells, separated by a comma line :: GenParser Char st [String] line = do result <- cells eol -- end of line return result -- Build up a list of cells. Try to parse the first cell, then figure out -- what ends the cell. cells :: GenParser Char st [String] cells = do first <- cellContent next <- remainingCells return (first : next) -- The cell either ends with a comma, indicating that 1 or more cells follow, -- or it doesn't, indicating that we're at the end of the cells for this line remainingCells :: GenParser Char st [String] remainingCells = (char ',' >> cells) -- Found comma? More cells coming <|> (return []) -- No comma? Return [], no more cells -- Each cell contains 0 or more characters, which must not be a comma or -- EOL cellContent :: GenParser Char st String cellContent = many (noneOf ",\n") -- The end of line character is \n eol :: GenParser Char st Char eol = char '\n' parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" input
Before looking at all this code in detail, let's play with it a bit and see what it does.
ghci> :l csv1.hs [1 of 1] Compiling Main ( csv1.hs, interpreted ) Ok, modules loaded: Main. ghci> parseCSV "" Loading package parsec-2.0 ... linking ... done. Right []ghci>
:l csv1.hs
[1 of 1] Compiling Main ( csv1.hs, interpreted ) Ok, modules loaded: Main.ghci>
parseCSV ""
Loading package parsec-2.0 ... linking ... done. Right []
That makes sense: parsing the empty string returns an empty list. Let's try parsing a single cell.
ghci> parseCSV "hi" Left "(unknown)" (line 1, column 3): unexpected end of input expecting "," or "\n"ghci>
parseCSV "hi"
Left "(unknown)" (line 1, column 3): unexpected end of input expecting "," or "\n"
Look at that. Recall how we defined that each line must end with the end-of-line character, and we didn't give it. Parsec's error message helpfully indicated the line number and column number of the problem, and even told us what it was expecting! Let's give it an end-of-line character and continue experimenting.
ghci> parseCSV "hi\n" Right [["hi"]] ghci> parseCSV "line1\nline2\nline3\n" Right [["line1"],["line2"],["line3"]] ghci> parseCSV "cell1,cell2,cell3\n" Right [["cell1","cell2","cell3"]] ghci> parseCSV "l1c1,l1c2\nl2c1,l2c2\n" Right [["l1c1","l1c2"],["l2c1","l2c2"]] ghci> parseCSV "Hi,\n\n,Hello\n" Right [["Hi",""],[""],["","Hello"]]ghci>
parseCSV "hi\n"
Right [["hi"]]ghci>
parseCSV "line1\nline2\nline3\n"
Right [["line1"],["line2"],["line3"]]ghci>
parseCSV "cell1,cell2,cell3\n"
Right [["cell1","cell2","cell3"]]ghci>
parseCSV "l1c1,l1c2\nl2c1,l2c2\n"
Right [["l1c1","l1c2"],["l2c1","l2c2"]]ghci>
parseCSV "Hi,\n\n,Hello\n"
Right [["Hi",""],[""],["","Hello"]]
You can see that parseCSV
is doing exactly what we
wanted it to do. It's even handling empty cells and empty lines
properly.
Let's take a look at the code for this example. We didn't use many shortcuts here, so remember that this will get shorter and simpler!
We've built it from the
top down,
so our first function is csvFile
. The type of this
function is GenParser Char st [[String]]
. This
means that the type of the input is a sequence of characters, which is
exactly what a Haskell string is, since String
is
the same as [Char]
. It also means that we will
return a value of type [[String]]
: a list of a list
of strings. The st
can be ignored for now.
Parsec programmers usually omit type declarations, since we write so
many small functions. Haskell's type inference can figure it out.
We've listed the types for the first example here so you can get a
better idea of what's going on. You can always use
:t
in ghci to inspect types as well.
The csvFile
uses a do
block. Parsec is a monadic
parser combinator, but we are not using the IO
monad here.
As we discussed in fixme: insert ref, not all monads
are IO
monads.
We start by running many line
.
many
is a function that takes a function as an
argument. It tries to repeatedly
parse the input using the function passed to it. It gathers up the
results from all that repeated parsing and returns a list of them. So,
here, we are storing the results of parsing all lines in
result
. Then we look for the end-of-file indicator,
called eof
. Finally, we return the
result
. So, a CSV file is made of of many lines,
then the end of file. You can often read out Parsec functions in plain
English just like that.
Now we must answer the question: what is a line? We define the
line
function to do just that. Reading the
function, we can see that a line consists of cells followed by the end
of line character.
So what are cells? We defined them in the cells
function. The cells of a line start with the content of the first
cell, then continue with the content of the remaining cells, if any.
The result is simply the first cell and the remaining cells assembled
into a list.
Let's skip over remainingCells
for a minute and
look at cellContent
. A cell contains any number of
characters, but each character must not be a comma or end of line
character. The noneOf
function matches one item, so
long as it isn't in the list of items that we pass. So, saying
many (noneOf ",\n")
defines a cell the way we want
it.
Back in remainingCells
, we have the first example of
a choice in Parsec. The choice operator is <|>
. This operator
behaves like this: it will first try the parser on the left. If it
consumed no input, it will try the parser on the right.
So, in remainingCells
, our task is to come up with
all the cells after the first. Recall that
cellContent
uses noneOf ",\n"
.
So it will not consume the comma or end-of-line character from the
input. If we see a comma after parsing a cell, it means that at least
one more cell follows. Otherwise, we're done. So, our first choice in
remainingCells
is char ','
. This
parser simply matches the passed character in the input. If we found a
comma, we want this function to return the remaining cells on the line.
At this point, the "remaining cells" looks exactly like the start of
the line, so we call cells
recursively to parse
them. If we didn't find a comma, we return the empty list, signifying
no remaining cells on the line.
Finally, we must define what the end-of-line indicator is. We set it
to char '\n'
, which will suit our purposes fine for
now.
At the very end of the program, we define a function
parseCSV
that takes a String
and parses it as a
CSV file. This function is just a shortcut that calls Parsec's
parse
function, filling in a few parameters.
parse
returns Either ParseError
[[String]]
for the CSV file. If there was an error, the
return vaule will be Left
with the error; otherwise, it will be
Right
with the result.
We promised you earlier that we could simply our CSV parser
significantly by using a few Parsec helper functions. There are two
that will dramatically simplify this code. The first is
endBy
, which takes two functions. It applies the
first function, notes its result, then applies the second. It will do
this over and over, and return a list of the results from the first
function.
The second tool for us is sepBy
. It's like
endBy
, but expects the very last item to not end
with the separator.
So, we can use endBy
to parse lines, since every
line must end with the end-of-line character. We can use
sepBy
to parse cells, since the last cell will not
end with a comma. Take a look at how much simpler our parser is now:
import Text.ParserCombinators.Parsec csvFile = endBy line eol line = sepBy cell (char ',') cell = many (noneOf ",\n") eol = char '\n' parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" inputimport Text.ParserCombinators.Parsec csvFile = endBy line eol line = sepBy cell (char ',') cell = many (noneOf ",\n") eol = char '\n' parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" input
This program behaves exactly the same as the first one. You can verify that by using ghci to re-run our examples from the earlier example. You'll get the same result from every one. Yet the program is much shorter and more readable. It won't be long before you can translate Parsec code like this into a file format definition in plain English. As you read over this code, you can see that:
Different operating systems use different characters to mark the
end-of-line. Unix/Linux systems, plus Windows in text mode, use
simply "\n"
.
FIXME: verify this on Windows
DOS and Windows systems use "\n\r"
, and Macs
traditionally use "\r\n"
. We could also support
a bare "\r"
in case anybody uses that.
We could easily adapt our example to be able to handle all these types
of line endings in a single file. We would need to make two
modifications: adjust eol
to recognize the different
endings, and adjust the noneOf
pattern in
cell
to ignore \r
.
This must be done carefully. Recall that our earlier definition of
eol
was simply char '\n'
. There
is a parser called string
that we can use to match
the multi-character patterns. Let's start by thinking of how we would
add support for \n\r
.
Our first attempt might look like this:
-- This function is not correct! eol = string "\n" <|> string "\n\r"-- This function is not correct! eol = string "\n" <|> string "\n\r"
This isn't quite right. Recall that the <|>
operator always
tries the left alternative first. Looking for the single character
\n
will match both types of line endings, so it will
look to the system that the following line begins with
\r
. Not what we want. Try it in ghci:
ghci> :m Text.ParserCombinators.Parsec ghci> let eol = string "\n" <|> string "\n\r" Loading package parsec-2.0 ... linking ... done. ghci> parse eol "" "\n" Right "\n" ghci> parse eol "" "\n\r" Right "\n"ghci>
:m Text.ParserCombinators.Parsec
ghci>
let eol = string "\n" <|> string "\n\r"
Loading package parsec-2.0 ... linking ... done.ghci>
parse eol "" "\n"
Right "\n"ghci>
parse eol "" "\n\r"
Right "\n"
It may look like it worked for both endings, but actually looking at it this way, we can't tell. If it left something un-parsed, we don't know, because we're not looking for anything else. So let's look for the end-of-file after our end of line:
ghci> parse (eol >> eof) "" "\n\r" Left (line 2, column 1): unexpected "\r" expecting end of input ghci> parse (eol >> eof) "" "\n" Right ()ghci>
parse (eol >> eof) "" "\n\r"
Left (line 2, column 1): unexpected "\r" expecting end of inputghci>
parse (eol >> eof) "" "\n"
Right ()
As expected, we got an error from the \n\r
ending.
So the next temptation may
be to try it this way:
-- This function is not correct! eol = string "\n\r" <|> string "\n"-- This function is not correct! eol = string "\n\r" <|> string "\n"
This also isn't right. Recall that <|>
only attempts the option
on the right of the option on the left consumed no input. But by the
time we are able to see if there is a \r
after the
\n
, we've already consumed the
\n
. This time, we fail on the other case in ghci:
ghci> :m Text.ParserCombinators.Parsec ghci> let eol = string "\n\r" <|> string "\n" Loading package parsec-2.0 ... linking ... done. ghci> parse (eol >> eof) "" "\n\r" Right () ghci> parse (eol >> eof) "" "\n" Left (line 1, column 1): unexpected end of input expecting "\n\r"ghci>
:m Text.ParserCombinators.Parsec
ghci>
let eol = string "\n\r" <|> string "\n"
Loading package parsec-2.0 ... linking ... done.ghci>
parse (eol >> eof) "" "\n\r"
Right ()ghci>
parse (eol >> eof) "" "\n"
Left (line 1, column 1): unexpected end of input expecting "\n\r"
We've stumbled upon the lookahead problem. It turns out that, when
writing parsers, it's often very convenient to be able to "look ahead"
at the data that's coming in. Parsec supports this, but before showing
you how to use it, let's see how you would have to write this to get
along without it. You'd have to manually expand all the options after
the \n
like this:
eol = do char '\n' char '\r' <|> return '\n'eol = do char '\n' char '\r' <|> return '\n'
This function first looks for \n
. If it is found,
then it will look for \r
, consuming it if possible.
Since the return type of char '\r'
is a
Char
, the alternative action is to simply return a
Char
without attempting to parse anything. Parsec
has a function option
that can also express this
idiom as option '\n' (char '\r')
. Let's test this
with ghci.
ghci> :l csv5.hs [1 of 1] Compiling Main ( csv5.hs, interpreted ) Ok, modules loaded: Main. ghci> parse (eol >> eof) "" "\n\r" Loading package parsec-2.0 ... linking ... done. Right () ghci> parse (eol >> eof) "" "\n" Right ()ghci>
:l csv5.hs
[1 of 1] Compiling Main ( csv5.hs, interpreted ) Ok, modules loaded: Main.ghci>
parse (eol >> eof) "" "\n\r"
Loading package parsec-2.0 ... linking ... done. Right ()ghci>
parse (eol >> eof) "" "\n"
Right ()
This time, we got the right result! But we could have done it easier with Parsec's lookahead support.
Parsec has a function called try
that is used to express
lookaheads. try
takes one function, a parser. It applies that
parser. If the parser doesn't succeed, try
behaves as if it hadn't
consumed any input at all. So, when you use try
on the left side
of <|>
, Parsec will try the option on the right even if the
lift side failed after consuming some input. try
only has an
effect if it is on the left of a <|>
. Keep in mind, though,
that many functions use
<|>
interally. Sometimes you may want
to add it to your code just in case it will be used on the left of a
<|>
, and that works fine. Here's a way to add
expanded end-of-line support to our CSV parser using try
:
import Text.ParserCombinators.Parsec csvFile = endBy line eol line = sepBy cell (char ',') cell = many (noneOf ",\n\r") eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" inputimport Text.ParserCombinators.Parsec csvFile = endBy line eol line = sepBy cell (char ',') cell = many (noneOf ",\n\r") eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" input
Here we put both of the two-character endings first, and run both
tests under try
. Both of them occur to the left of a <|>
, so
they will do the right thing. We could have put string
"\n"
within a try
, but it wouldn't have altered any
behavior since they look at only one character anyway. We can load
this up and test the eol
function in ghci.
ghci> :l csv6.hs [1 of 1] Compiling Main ( csv6.hs, interpreted ) Ok, modules loaded: Main. ghci> parse (eol >> eof) "" "\n\r" Loading package parsec-2.0 ... linking ... done. Right () ghci> parse (eol >> eof) "" "\n" Right () ghci> parse (eol >> eof) "" "\r\n" Right () ghci> parse (eol >> eof) "" "\r" Right ()ghci>
:l csv6.hs
[1 of 1] Compiling Main ( csv6.hs, interpreted ) Ok, modules loaded: Main.ghci>
parse (eol >> eof) "" "\n\r"
Loading package parsec-2.0 ... linking ... done. Right ()ghci>
parse (eol >> eof) "" "\n"
Right ()ghci>
parse (eol >> eof) "" "\r\n"
Right ()ghci>
parse (eol >> eof) "" "\r"
Right ()
All four endings were handled properly. You can also test the full CSV parser with some different endings like this:
ghci> parseCSV "line1\r\nline2\nline3\n\rline4\rline5\n" Right [["line1"],["line2"],["line3"],["line4"],["line5"]]ghci>
parseCSV "line1\r\nline2\nline3\n\rline4\rline5\n"
Right [["line1"],["line2"],["line3"],["line4"],["line5"]]
As you can see, this program even supports different line endings within a single file.
At the beginning of this chapter, you saw how Parsec could generate error messages that list the location where the error occured as well as what was expected. As parsers get more complex, the list of what was expected can become cumbersome. Parsec provides a way for you to specify custom error messages in the event of parse failures.
Let's look at what happens on an error when our current CSV parser:
ghci> parseCSV "line1" Left "(unknown)" (line 1, column 6): unexpected end of input expecting ",", "\n\r", "\r\n", "\n" or "\r"ghci>
parseCSV "line1"
Left "(unknown)" (line 1, column 6): unexpected end of input expecting ",", "\n\r", "\r\n", "\n" or "\r"
That's a pretty long, and technical, error message. We could make an
attempt to resolve this by using the monad fail
function like so:
eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" <|> fail "Couldn't find EOL"eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" <|> fail "Couldn't find EOL"
Under ghci, we can see the result:
ghci> :l csv7.hs [1 of 1] Compiling Main ( csv7.hs, interpreted ) Ok, modules loaded: Main. ghci> parseCSV "line1" Loading package parsec-2.0 ... linking ... done. Left "(unknown)" (line 1, column 6): unexpected end of input expecting ",", "\n\r", "\r\n", "\n" or "\r" Couldn't find EOLghci>
:l csv7.hs
[1 of 1] Compiling Main ( csv7.hs, interpreted ) Ok, modules loaded: Main.ghci>
parseCSV "line1"
Loading package parsec-2.0 ... linking ... done. Left "(unknown)" (line 1, column 6): unexpected end of input expecting ",", "\n\r", "\r\n", "\n" or "\r" Couldn't find EOL
We added to the error result, but didn't really help clean up the
output. Parsec has an <?>
operator that is designed for just these
situations. It is similar to <|>
in that it first tries the
parser on its left. Instead of trying another parser in the event of
a failure, it presents an error message. Here's how we'd use it:
eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" <?> "end of line"eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" <?> "end of line"
Now, when you generate an error, you'll get more helpful output:
ghci> :l csv8.hs [1 of 1] Compiling Main ( csv8.hs, interpreted ) Ok, modules loaded: Main. ghci> parseCSV "line1" Loading package parsec-2.0 ... linking ... done. Left "(unknown)" (line 1, column 6): unexpected end of input expecting "," or end of lineghci>
:l csv8.hs
[1 of 1] Compiling Main ( csv8.hs, interpreted ) Ok, modules loaded: Main.ghci>
parseCSV "line1"
Loading package parsec-2.0 ... linking ... done. Left "(unknown)" (line 1, column 6): unexpected end of input expecting "," or end of line
That's pretty helpful! The general rule of thumb is that you put a
human description of what you're looking for to the right of
<?>
.
Our earlier CSV examples have had an important flaw: they weren't able to handle cells that contain a comma. CSV generating programs typically put quotation marks around such data. But then you have another problem: what to do if a cell contains a quotation mark and a comma. In these cases, the embedded quotation marks are doubled up.
Here is a full CSV parser. You can use this from ghci, or if you compile it to a standalone program, it will parse a CSV file on standard input and convert it a different format on output.
import Text.ParserCombinators.Parsec csvFile = endBy line eol line = sepBy cell (char ',') cell = quotedCell <|> many (noneOf ",\n\r") quotedCell = do char '"' content <- many quotedChar char '"' <?> "quote at end of cell" return content quotedChar = noneOf "\"" <|> try (string "\"\"" >> return '"') eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" <?> "end of line" parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" input main = do c <- getContents case parse csvFile "(stdin)" c of Left e -> do putStrLn "Error parsing input:" print e Right r -> mapM_ print rimport Text.ParserCombinators.Parsec csvFile = endBy line eol line = sepBy cell (char ',') cell = quotedCell <|> many (noneOf ",\n\r") quotedCell = do char '"' content <- many quotedChar char '"' <?> "quote at end of cell" return content quotedChar = noneOf "\"" <|> try (string "\"\"" >> return '"') eol = try (string "\n\r") <|> try (string "\r\n") <|> string "\n" <|> string "\r" <?> "end of line" parseCSV :: String -> Either ParseError [[String]] parseCSV input = parse csvFile "(unknown)" input main = do c <- getContents case parse csvFile "(stdin)" c of Left e -> do putStrLn "Error parsing input:" print e Right r -> mapM_ print r
That's a full-featured CSV parser in just 21 lines of code, plus an
additional 10 lines for the parseCSV
and
main
utility functions.
Let's look at the changes in this program from the previous versions.
First, a cell may now be either a bare cell or a "quoted" cell. We
give the quotedCell
option first, because we want to
follow that path if the first character in a cell is the quote mark.
The quotedCell
begins and ends with a quote mark,
and contains zero or more characters. These characters can't be copied
directly, though, because they may contain embedded, doubled-up, quote
marks themselves. So we define a custom quotedChar
to process them.
When we're processing characters inside a quoted cell, we first say
noneOf "\""
. This will match and return any single
character as long as it's not the quote mark. Otherwise, if it is the
quote mark, we see if we have two of them in a row. If so, we return a
single quote mark to go on our result string.
Notice that try
in quotedChar
on the
right side of <|>
. Recall that I said that
try
only has an effect if it is on the left side of <|>
. This
try
does occur on the left side of a <|>
, but on the left of
one that must be within the implementation of many
.
This try
is actually quite important. Let's say you are parsing a
quoted cell and are getting towards the end of it. There will be
another cell following. So you will expect to see a quote to end the
current cell, followed by a comma. When you hit
quotedChar
, you will fail the
noneOf
test and proceed to the test that looks for
two quotes in a row. You'll also fail that one because you'll have a
quote, then a comma. If you hadn't used try
, you'd crash with an
error at this point, saying that it was expecting the second quote,
because the first quote was already consumed. Since we use try
, this
is properly recognized as not a character that's part of the cell, so
it terminates the many quotedChar
expression as
expected. Lookahead has once again proven very useful, and the fact
that it is so easy to add makes it a remarkable tool in Parsec.
You can test this program with ghci over some quoted cells:
ghci> :l csv9.hs [1 of 1] Compiling Main ( csv9.hs, interpreted ) Ok, modules loaded: Main. ghci> parseCSV "\"This, is, one, big, cell\"\n" Loading package parsec-2.0 ... linking ... done. Right [["This, is, one, big, cell"]] ghci> parseCSV "\"Cell without an end\n" Left "(unknown)" (line 2, column 1): unexpected end of input expecting "\"\"" or quote at end of cellghci>
:l csv9.hs
[1 of 1] Compiling Main ( csv9.hs, interpreted ) Ok, modules loaded: Main.ghci>
parseCSV "\"This, is, one, big, cell\"\n"
Loading package parsec-2.0 ... linking ... done. Right [["This, is, one, big, cell"]]ghci>
parseCSV "\"Cell without an end\n"
Left "(unknown)" (line 2, column 1): unexpected end of input expecting "\"\"" or quote at end of cell
Let's run it over a real CSV file. Here's a CSV file generated by a spreadsheet program:
"Product","Price" "O'Reilly Socks",10 "Shirt with ""Haskell"" text",20 "Shirt, ""O'Reilly"" version",20 "Haskell Caps",15 "Product","Price" "O'Reilly Socks",10 "Shirt with ""Haskell"" text",20 "Shirt, ""O'Reilly"" version",20 "Haskell Caps",15
Now, we can run this under our test program and watch:
$ runhaskell csv9.hs < test.csv
["Product","Price"]
["O'Reilly Socks","10"]
["Shirt with \"Haskell\" text","20"]
["Shirt, \"O'Reilly\" version","20"]
["Haskell Caps","15"]
$ runhaskell csv9.hs < test.csv
["Product","Price"]
["O'Reilly Socks","10"]
["Shirt with \"Haskell\" text","20"]
["Shirt, \"O'Reilly\" version","20"]
["Haskell Caps","15"]