**Table of Contents**

- Thinking in Haskell
- Warming up: portably splitting lines of text
- Working with lists
- How to think about loops
- Explicit recursion
- Transforming every piece of input
- Mapping over a list
- Selecting pieces of input
- Computing one answer over a collection
- The left fold
- Why use folds, maps, and filters?
- Avoiding multiple traversals of a list
- Folding from the right and primitive recursion
- A final note about foldl
- Exercises
- Further reading

- Example: Run-Length Encoding
- Type Inference
- Anonymous (lambda) functions
- Partial function application and currying
- As-patterns and function composition

There are two aspects to the early stages of programming effectively in Haskell. The first is coming to terms with the shift in mindset from imperative programming to functional: we have to step away from an entire toolbox of programming habits from other languages. We do this not because these techniques are bad, but because in a functional language they're either not applicable or they turn out downright ungainly. At the same time, we have to come up to speed with a replacement vocabulary, so that we can hack productively.

Our second challenge is learning our way around the standard Haskell libraries. As in any language, the libraries act as a lever, enabling us to multiply our problem solving power. However, Haskell libraries tend to be organised around a higher level of abstraction than those in many other languages. We'll probably have to put more effort into learning them, but in exchange they offer a tantalisingly greater magnification of our efforts.

In this chapter, we'll introduce a number of common functional programming techniques. We'll draw upon examples from imperative languages to highlight the shift in thinking that we'll need to make. As we do so, we'll walk through some of the fundamentals of Haskell's standard libraries. We'll also intermittently cover a few more language features along the way.

Haskell provides a built-in function,
`lines`

, that lets us split a text string on
line boundaries. It returns a list of strings with line
termination characters omitted.

ghci> :type lines lines :: String -> [String] ghci> lines "line 1\nline 2" ["line 1","line 2"] ghci> lines "foo\n\nbar\n" ["foo","","bar"]`ghci>`

lines :: String -> [String]`:type lines`

`ghci>`

["line 1","line 2"]`lines "line 1\nline 2"`

`ghci>`

["foo","","bar"]`lines "foo\n\nbar\n"`

While `lines`

looks useful, it relies on
us reading a file in “text mode” in order to work
(yes, we'll be talking about opening files soon); it doesn't
deal well with Windows line ending conventions.

ghci> lines "a\r\nb" ["a\r","b"]`ghci>`

["a\r","b"]`lines "a\r\nb"`

The function only splits on newline characters, leaving carriage returns dangling at the ends of lines. Ugh. We can't rely on opening a file in text mode to do the right thing on our behalf. For example, if we're reading a Windows-generated text file on a Linux or Unix box, we'll get trailing carriage returns at the end of each line.

Years of comfortable hacking with Python's “universal
newline” support, which transparently handles Unix and
Windows line ending conventions for us, left us wanting
something similar in Haskell. Although Python conveniently
provides a built-in `splitlines`

string
method, we'll rewrite it as a Python function, just to see what
a reasonable Python implementation might look like.

def splitlines(s): ret = [] while True: head, sep, tail = s.partition('\r\n') if not (sep or tail): head, sep, tail = s.partition('\r') if not (sep or tail): head, sep, tail = s.partition('\n') if not (sep or tail): break ret.append(head) s = tail if s: ret.append(s) return retdef splitlines(s): ret = [] while True: head, sep, tail = s.partition('\r\n') if not (sep or tail): head, sep, tail = s.partition('\r') if not (sep or tail): head, sep, tail = s.partition('\n') if not (sep or tail): break ret.append(head) s = tail if s: ret.append(s) return ret

The key to understanding this code lies with Python's
`partition`

string method: it searches for a
substring. If it finds it, it returns a triple of the substring
before the match, the match, and the substring after the match.
Otherwise, it returns a triple of the entire string and two
empty strings.

Because we're still early in our career of reading Haskell code, we'll discuss our Haskell implementation in quite some detail. This is almost the last instance where we'll walk through “basic” Haskell code at this level.

Haskell provides a similar function named
`break`

that we can use to the same end.
Unlike Python's `partition`

, it operates over
lists of any type, instead of strings. It does this by taking a
function as its first parameter: this function takes an element
of the list, and returns a Bool to indicate whether
to break the list there or not. What `break`

returns is a two-tuple, of the sublist before the predicate
returns `True`

(the prefix), and the rest of the list
(the suffix).

ghci> break odd [2,4,5,6,8] ([2,4],[5,6,8]) ghci> :module +Data.Char ghci> break isUpper "isUpper" ("is","Upper")`ghci>`

([2,4],[5,6,8])`break odd [2,4,5,6,8]`

`ghci>`

`:module +Data.Char`

`ghci>`

("is","Upper")`break isUpper "isUpper"`

This makes `break`

both more and less
powerful than `partition`

: we can use it on
any type of list, not just a string; but it can only examine a
single item at a time, not a sublist.

For our purposes, examining one character at a time will work perfectly well, since we only need to match a single carriage return or newline.

splitLines :: String -> [String] splitLines [] = [] splitLines cs = case break isLineSeparator cs of (pre, '\r':'\n':suf) -> pre : splitLines suf (pre, '\r':suf) -> pre : splitLines suf (pre, '\n':suf) -> pre : splitLines suf (pre, "") -> [pre] isLineSeparator :: Char -> Bool isLineSeparator c = c `elem` "\r\n"splitLines :: String -> [String] splitLines [] = [] splitLines cs = case break isLineSeparator cs of (pre, '\r':'\n':suf) -> pre : splitLines suf (pre, '\r':suf) -> pre : splitLines suf (pre, '\n':suf) -> pre : splitLines suf (pre, "") -> [pre] isLineSeparator :: Char -> Bool isLineSeparator c = c `elem` "\r\n"

Our function first calls `break`

on its
input string, to break it into the substring before a line
terminator, and the rest of the string (which will include the
line terminator, if any).

It then uses the pattern matching capabilities of
`case`

to inspect the return value of
`break`

. We know that
`break`

always returns a two-tuple, so each
pattern on the left matches a two-tuple. The first element of
the tuple pattern doesn't inspect its value; it just binds the
variable `pre`

to whatever value is in that
element of the tuple. The second element of the pattern
*does* inspect its value. The first pattern
matches any string containing at least two characters that
begins with a carriage return, followed by a newline, then binds
`suf`

to the remainder of the string. The
other patterns should be self-explanatory.

The right-hand side of each of the first three branches of
the `case`

expression constructs a new list, using
the prefix string `pre`

as the head, and as the
remainder, the list of strings resulting from applying
`splitLines`

to the suffix string
`suf`

without the leading line separator
characters.

Following a prose description of the behaviour of a Haskell
function isn't easy. We can get a better understanding by
stepping into **ghci**, and watching the function at work in
different circumstances.

We'll start by looking at a string that doesn't contain any line separators.

ghci> splitLines "foo" ["foo"]`ghci>`

["foo"]`splitLines "foo"`

Here, our call to `break`

never finds a
line separator, so we get an empty suffix.

ghci> break isLineSeparator "foo" ("foo","")`ghci>`

("foo","")`break isLineSeparator "foo"`

The `case`

expression thus hits a match on the
fourth branch, and we're done. What about a slightly more
interesting case?

ghci> splitLines "foo\r\nbar" ["foo","bar"]`ghci>`

["foo","bar"]`splitLines "foo\r\nbar"`

Our first call to `break`

gives us a
non-empty suffix.

ghci> break isLineSeparator "foo\r\nbar" ("foo","\r\nbar")`ghci>`

("foo","\r\nbar")`break isLineSeparator "foo\r\nbar"`

Because the suffix begins with a carriage return, followed
by a newline, we match on the first branch of the
`case`

expression. This gives us
`pre`

bound to `"foo"`

, and
`suf`

bound to `"bar"`

. We call
`splitLines`

again, this time on
`"bar"`

alone.

ghci> splitLines "bar" ["bar"]`ghci>`

["bar"]`splitLines "bar"`

The result is that we construct a list whose head is
`"foo"`

and whose tail is
`["bar"]`

.

ghci> "foo" : ["bar"] ["foo","bar"]`ghci>`

["foo","bar"]`"foo" : ["bar"]`

As the bread and butter of functional programming, lists deserve some serious attention. The standard prelude defines dozens of functions for dealing with lists. Many of these will be indispensable tools, so it's important that we learn them early on.

The `Data.List`

module is the “real”
logical home of all standard list functions. The prelude merely
re-exports a large subset of the functions exported by
`Data.List`

. Several invaluable functions in
`Data.List`

are *not* re-exported
by the standard prelude. As we walk through list functions in
the sections that follow, we'll explicitly mention those that
are in `Data.List`

.

ghci> :module +Data.List`ghci>`

`:module +Data.List`

Because none of these functions is complex or takes more than about three lines of Haskell to write, we'll be brief in our descriptions of each. In fact, a quick and useful learning exercise is to write a definition of each function after you've read about it.

The simplest function on a list is
`null`

, which merely tells us whether or
not the list is empty.

ghci> :type null null :: [a] -> Bool ghci> null [] True ghci> null "plugh" False`ghci>`

null :: [a] -> Bool`:type null`

`ghci>`

True`null []`

`ghci>`

False`null "plugh"`

The `length`

function tells us how many
elements are in a list.

ghci> :type length length :: [a] -> Int ghci> length [] 0 ghci> length [1,2,3] 3`ghci>`

length :: [a] -> Int`:type length`

`ghci>`

0`length []`

`ghci>`

3`length [1,2,3]`

To get the first element of a list, we use the
`head`

function.

ghci> :type head head :: [a] -> a ghci> head [1,2,3] 1`ghci>`

head :: [a] -> a`:type head`

`ghci>`

1`head [1,2,3]`

The converse, `tail`

, returns all
*but* the head of a list.

ghci> :type tail tail :: [a] -> [a] ghci> tail "foo" "oo"`ghci>`

tail :: [a] -> [a]`:type tail`

`ghci>`

"oo"`tail "foo"`

Another function, `last`

, returns the
very last element of a list.

ghci> :type last last :: [a] -> a ghci> last "bar" 'r'`ghci>`

last :: [a] -> a`:type last`

`ghci>`

'r'`last "bar"`

The converse of `last`

is
`init`

, which returns a list of all but the
last element of its input.

ghci> :type init init :: [a] -> [a] ghci> init "bar" "ba"`ghci>`

init :: [a] -> [a]`:type init`

`ghci>`

"ba"`init "bar"`

None of the above functions is well-behaved on empty lists, so be careful if you don't know whether or not a list is empty. What form does their misbehaviour take?

ghci> head [] *** Exception: Prelude.head: empty list`ghci>`

*** Exception: Prelude.head: empty list`head []`

When we want to use a function like
`safe`

, where we know that it might blow up
on us if we pass in an empty list, the temptation might
initially be strong to check the length of the list before we
call `safe`

. Let's construct a hideously
artificial example to illustrate our point.

myDumbExample xs = if length xs > 0 then head xs else 'Z'myDumbExample xs = if length xs > 0 then head xs else 'Z'

If we're coming from a language like Perl or python, this
might seem like a perfectly natural way to write this test.
Behind the scenes, Python lists are arrays; and Perl arrays
are, well, arrays. So they necessarily know how long they
are, and calling `len(foo)`

or `$#foo+1`

is a perfectly natural thing to do. But as with many other
things, it's not a good idea to blindly transplant such an
assumption into Haskell.

We've already seen the definition of the list algebraic
data type in the section called “A little more about lists”, and know
that a list doesn't encode its own length. Thus, the only way
that `length`

can operate is to walk the
entire list.

Therefore, when we only care whether or not a list is
empty, calling `length`

isn't a good
strategy. It can potentially do a lot more work than we want,
if the list we're working is finite. Worse, Haskell lets us
define infinitely long lists, on which an unsuspecting call to
`length`

will never return!

A more appropriate function to call here instead is
`null`

, which runs in constant time. Better
yet, using `null`

makes our code indicate
what property of the list we really care about.

Functions that only have return values defined for a
subset of valid inputs are called *partial*
functions (calling `error`

doesn't qualify
as returning a value!). We call functions that return valid
results over their entire input domains
*total* functions.

It's always a good idea to know whether a function you're using is partial or total. Calling a partial function with an input that it can't handle is probably the single biggest source of straightforward, avoidable bugs in Haskell programs.

Some Haskell programmers go so far as to give partial
functions names that begin with a prefix such as
`unsafe`

, so that they can't shoot themselves in
the foot accidentally.

It's arguably a deficiciency of the standard prelude that
it defines quite a few “unsafe” partial
functions, like `head`

, without also
providing “safe” total equivalents.

Haskell's name for the “append” function is
`(++)`

.

ghci> :type (++) (++) :: [a] -> [a] -> [a] ghci> "foo" ++ "bar" "foobar" ghci> [] ++ [1,2,3] [1,2,3] ghci> [True] ++ [] [True]`ghci>`

(++) :: [a] -> [a] -> [a]`:type (++)`

`ghci>`

"foobar"`"foo" ++ "bar"`

`ghci>`

[1,2,3]`[] ++ [1,2,3]`

`ghci>`

[True]`[True] ++ []`

The `concat`

function takes a list of
lists, all of the same type, and “flattens” them
into a single list.

ghci> :type concat concat :: [[a]] -> [a] ghci> concat [[1,2,3], [4,5,6]] [1,2,3,4,5,6]`ghci>`

concat :: [[a]] -> [a]`:type concat`

`ghci>`

[1,2,3,4,5,6]`concat [[1,2,3], [4,5,6]]`

It only flattens one level of nesting.

ghci> concat [[[1,2],[3]], [[4],[5],[6]]] [[1,2],[3],[4],[5],[6]] ghci> concat (concat [[[1,2],[3]], [[4],[5],[6]]]) [1,2,3,4,5,6]`ghci>`

[[1,2],[3],[4],[5],[6]]`concat [[[1,2],[3]], [[4],[5],[6]]]`

`ghci>`

[1,2,3,4,5,6]`concat (concat [[[1,2],[3]], [[4],[5],[6]]])`

The `reverse`

function returns the
elements of a list in reverse order.

ghci> :type reverse reverse :: [a] -> [a] ghci> reverse "foo" "oof"`ghci>`

reverse :: [a] -> [a]`:type reverse`

`ghci>`

"oof"`reverse "foo"`

For lists of Bool, the
`and`

and `or`

functions
generalise their two-argument
cousins,`(&&)`

and
`(||)`

.

ghci> :type and and :: [Bool] -> Bool ghci> and [True,False,True] False ghci> and [] True ghci> :type or or :: [Bool] -> Bool ghci> or [False,False,False,True,False] True ghci> or [] False`ghci>`

and :: [Bool] -> Bool`:type and`

`ghci>`

False`and [True,False,True]`

`ghci>`

True`and []`

`ghci>`

or :: [Bool] -> Bool`:type or`

`ghci>`

True`or [False,False,False,True,False]`

`ghci>`

False`or []`

They have more useful cousins, `all`

and `any`

, which operate on lists of any
type. Each one takes a predicate as its first argument;
`all`

returns `True`

if that
predicate succeeds on every element of the list, while
`any`

returns `True`

if the
predicate succeeds on any element of the list.

ghci> :type all all :: (a -> Bool) -> [a] -> Bool ghci> all odd [1,3,5] True ghci> all odd [3,1,4,1,5,9,2,6,5] False ghci> all odd [] True ghci> :type any any :: (a -> Bool) -> [a] -> Bool ghci> any even [3,1,4,1,5,9,2,6,5] True ghci> any even [] False`ghci>`

all :: (a -> Bool) -> [a] -> Bool`:type all`

`ghci>`

True`all odd [1,3,5]`

`ghci>`

False`all odd [3,1,4,1,5,9,2,6,5]`

`ghci>`

True`all odd []`

`ghci>`

any :: (a -> Bool) -> [a] -> Bool`:type any`

`ghci>`

True`any even [3,1,4,1,5,9,2,6,5]`

`ghci>`

False`any even []`

The `take`

function, which we already
met in the section called “Calling functions”, returns a
sublist consisting of the first several elements from a list.
Its converse, `drop`

, drops several
elements from the head of the list.

ghci> :type take take :: Int -> [a] -> [a] ghci> take 3 "foobar" "foo" ghci> take 2 [1] [1] ghci> :type drop drop :: Int -> [a] -> [a] ghci> drop 3 "xyzzy" "zy" ghci> drop 1 [] []`ghci>`

take :: Int -> [a] -> [a]`:type take`

`ghci>`

"foo"`take 3 "foobar"`

`ghci>`

[1]`take 2 [1]`

`ghci>`

drop :: Int -> [a] -> [a]`:type drop`

`ghci>`

"zy"`drop 3 "xyzzy"`

`ghci>`

[]`drop 1 []`

The `splitAt`

function combines the
functions of `take`

and
`drop`

, returning a two-tuple of the input
list, split at the given index.

ghci> :type splitAt splitAt :: Int -> [a] -> ([a], [a]) ghci> splitAt 3 "foobar" ("foo","bar")`ghci>`

splitAt :: Int -> [a] -> ([a], [a])`:type splitAt`

`ghci>`

("foo","bar")`splitAt 3 "foobar"`

The `takeWhile`

and
`dropWhile`

functions take predicates:
`takeWhile`

constructs a list as long as
the predicate returns `True`

, while
`dropWhile`

drops elements from the list as
long as the predicate returns `True`

.

ghci> :type takeWhile takeWhile :: (a -> Bool) -> [a] -> [a] ghci> takeWhile odd [1,3,5,6,8] [1,3,5] ghci> :type dropWhile dropWhile :: (a -> Bool) -> [a] -> [a] ghci> dropWhile even [2,4,6,7,9] [7,9]`ghci>`

takeWhile :: (a -> Bool) -> [a] -> [a]`:type takeWhile`

`ghci>`

[1,3,5]`takeWhile odd [1,3,5,6,8]`

`ghci>`

dropWhile :: (a -> Bool) -> [a] -> [a]`:type dropWhile`

`ghci>`

[7,9]`dropWhile even [2,4,6,7,9]`

Just as `splitAt`

“tuples
up” the results of `take`

and
`drop`

, the functions
`break`

(which we already saw in the section called “Warming up: portably splitting lines of text”) and `span`

tuple up the results of `takeWhile`

and
`dropWhile`

.

Each function takes a predicate;
`break`

consumes its input while its
predicate fails, while `span`

consumes
until its predicate succeeds.

ghci> :type break break :: (a -> Bool) -> [a] -> ([a], [a]) ghci> break even [1,3,5,6,8] ([1,3,5],[6,8]) ghci> :type span span :: (a -> Bool) -> [a] -> ([a], [a]) ghci> span even [2,4,6,7,9] ([2,4,6],[7,9])`ghci>`

break :: (a -> Bool) -> [a] -> ([a], [a])`:type break`

`ghci>`

([1,3,5],[6,8])`break even [1,3,5,6,8]`

`ghci>`

span :: (a -> Bool) -> [a] -> ([a], [a])`:type span`

`ghci>`

([2,4,6],[7,9])`span even [2,4,6,7,9]`

The `elem`

function indicates whether a
value is present in a list. It has a companion function,
`notElem`

.

ghci> :type elem elem :: (Eq a) => a -> [a] -> Bool ghci> 2 `elem` [5,3,2,1,1] True ghci> 2 `notElem` [5,3,2,1,1] False`ghci>`

elem :: (Eq a) => a -> [a] -> Bool`:type elem`

`ghci>`

True`2 `elem` [5,3,2,1,1]`

`ghci>`

False`2 `notElem` [5,3,2,1,1]`

For a more general search, `filter`

takes a predicate, and returns every element of the list on
which the predicate succeeds.

ghci> :type filter filter :: (a -> Bool) -> [a] -> [a] ghci> filter odd [2,4,1,3,6,8,5,7] [1,3,5,7]`ghci>`

filter :: (a -> Bool) -> [a] -> [a]`:type filter`

`ghci>`

[1,3,5,7]`filter odd [2,4,1,3,6,8,5,7]`

In `Data.List`

, three predicates,
`isPrefixOf`

,
`isInfixOf`

, and
`isSuffixOf`

, let us test for the presence
of sublists within a bigger list. The easiest way to use them
is as infix functions, where they read quite naturally.

The `isPrefixOf`

function tells us
whether its left argument matches the beginning of its right
argument.

ghci> :module +Data.List ghci> :type isPrefixOf isPrefixOf :: (Eq a) => [a] -> [a] -> Bool ghci> "foo" `isPrefixOf` "foobar" True ghci> [1,2] `isPrefixOf` [] False`ghci>`

`:module +Data.List`

`ghci>`

isPrefixOf :: (Eq a) => [a] -> [a] -> Bool`:type isPrefixOf`

`ghci>`

True`"foo" `isPrefixOf` "foobar"`

`ghci>`

False`[1,2] `isPrefixOf` []`

The `isInfixOf`

function indicates
whether its left argument is a sublist of its right.

ghci> :module +Data.List ghci> [2,6] `isInfixOf` [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9] True ghci> "funk" `isInfixOf` "sonic youth" False`ghci>`

`:module +Data.List`

`ghci>`

True`[2,6] `isInfixOf` [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9]`

`ghci>`

False`"funk" `isInfixOf` "sonic youth"`

The operation of `isSuffixOf`

shouldn't
need any explanation.

ghci> :module +Data.List ghci> ".c" `isSuffixOf` "crashme.c" True`ghci>`

`:module +Data.List`

`ghci>`

True`".c" `isSuffixOf` "crashme.c"`

The `zip`

function takes two lists and
“zips” them into a single list of pairs. The
resulting list is the same length as the shorter of the two
inputs.

ghci> :type zip zip :: [a] -> [b] -> [(a, b)] ghci> zip [12,72,93] "foo" [(12,'f'),(72,'o'),(93,'o')]`ghci>`

zip :: [a] -> [b] -> [(a, b)]`:type zip`

`ghci>`

[(12,'f'),(72,'o'),(93,'o')]`zip [12,72,93] "foo"`

More useful is `zipWith`

, which takes
two lists and applies a function to each pair of elements,
generating a list that is the same length as the shorter of
the two.

ghci> :type zipWith zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] ghci> zipWith (+) [1,2,3] [4,5,6] [5,7,9]`ghci>`

zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]`:type zipWith`

`ghci>`

[5,7,9]`zipWith (+) [1,2,3] [4,5,6]`

Haskell's type system makes it an interesting challenge to
write functions that take variable numbers of arguments. So
if we want to zip three lists together, we call
`zip3`

or `zipWith3`

,
and so on up to `zip7`

and
`zipWith7`

.

We've already encountered the standard
`lines`

function in the section called “Warming up: portably splitting lines of text”. It has a standard counterpart,
`unlines`

, which joins a list of lines
together using newline characters, and adds another newline to
the end of the string.

ghci> lines "foo\nbar" ["foo","bar"] ghci> unlines ["foo", "bar"] "foo\nbar\n"`ghci>`

["foo","bar"]`lines "foo\nbar"`

`ghci>`

"foo\nbar\n"`unlines ["foo", "bar"]`

The `words`

function splits an input
string on any whitespace. Its counterpart,
`unwords`

, uses a single space to join a
list of words.

ghci> words "the quick brown\n\n\nfox" ["the","quick","brown","fox"] ghci> unwords ["jumps", "over", "the", "lazy", "dog"] "jumps over the lazy dog"`ghci>`

["the","quick","brown","fox"]`words "the quick brown\n\n\nfox"`

`ghci>`

"jumps over the lazy dog"`unwords ["jumps", "over", "the", "lazy", "dog"]`

Unlike traditional languages, Haskell has neither a
`for`

loop nor a `while`

loop. If we've
got a lot of data to process, what do we use instead? There are
several possible answers to this question, so let's build up a
toolbox of answers.

A straightforward way to make the jump from a language that has loops to one that doesn't is to run through a few examples, looking at the differences. Here's a C function that takes a string of decimal digits and turns them into an integer.

int as_int(char *str) { int acc; for (acc = 0; *str != '\0'; str++) { acc = acc * 10 + *str - '0'; } return acc; }int as_int(char *str) { int acc; for (acc = 0; *str != '\0'; str++) { acc = acc * 10 + *str - '0'; } return acc; }

Given that Haskell doesn't have any looping constructs, how should we think about representing a fairly straightforward piece of code like this?

We don't have to start off by writing a type signature, but it helps to remind us of what we're working with.

import Data.Char (ord) asInt :: String -> Intimport Data.Char (ord) asInt :: String -> Int

The C code computes the result incrementally as it
traverses the string; the Haskell code can do the same.
However, in Haskell, we write the loop as a function, which
we'll call `loop`

just to keep things nice
and explicit.

loop :: Int -> String -> Int asInt xs = loop 0 xsloop :: Int -> String -> Int asInt xs = loop 0 xs

That first parameter to `loop`

is the
accumulator variable we'll be using. Passing zero into it is
equivalent to initialising the `acc`

variable
in C at the beginning of the loop.

Rather than leap into blazing code, let's think about the data we have to work with. Our familiar String is just a synonym for [Char], a list of characters. The easiest way for us to get the traversal right is to think about the structure of a list: it's either empty, or a single element followed by the rest of the list.

We can express this structural thinking directly by pattern matching on the list type's constructors. It's often handy to think about the easy cases first: here, that means we will consider the empty-list case.

loop acc [] = accloop acc [] = acc

An empty list doesn't just mean “the input string is empty”; it's also the case we'll encounter when we traverse all the way to the end of a non-empty list. So we don't want to “error out” if we see an empty list. Instead, we should do something sensible. Here, the sensible thing is to return our accumulated value.

The other case we have to consider arises when the input list is not empty. We need to do something with the current element of the list, and something with the rest of the list.

loop acc (x:xs) = let acc' = acc * 10 + ord x - ord '0' in loop acc' xsloop acc (x:xs) = let acc' = acc * 10 + ord x - ord '0' in loop acc' xs

We compute a new value for the accumulator, and give it
the name `acc'`

. We then call the
`loop`

function again, passing it the
updated value `acc'`

and the rest of the
input list; this is equivalent to the loop starting another
round in C.

Each time the `loop`

function calls
itself, it has a new value for the accumulator, and it
consumes one element of the input list. Eventually, it's
going to hit the end of the list, at which time the
`[]`

pattern will match, and the recursive calls
will cease.

How well does this function work? For positive integers, it's perfectly cromulent.

ghci> asInt "33" 33`ghci>`

33`asInt "33"`

But because we were focusing on how to traverse lists, not error handling, our poor function misbehaves if we try to feed it nonsense.

ghci> asInt "" 0 ghci> asInt "potato" 7103643`ghci>`

0`asInt ""`

`ghci>`

7103643`asInt "potato"`

We'll defer fixing our function's shortcomings to Q: 1.

Because the last thing that `loop`

does
is simply call itself, it's an example of a tail recursive
function. There's another common idiom in this code, too.
Thinking about the structure of the list, and handling the
empty and non-empty cases separately, is a kind of approach
called structural recursion.

We call the non-recursive case (when the list is empty) the base case. We'll see people refer to the case where the function calls itself as the recursive case (surprise!), or they might give a nod to mathematical induction and call it the inductive case.

Structural induction isn't confined to lists; we can use it on other algebraic data types, too. We'll have more to say about it later.

Consider another C function, `square`

,
which squares every element in an array.

void square(double *out, const double *in, size_t length) { for (size_t i = 0; i < length; i++) { out[i] = in[i] * in[i]; } }void square(double *out, const double *in, size_t length) { for (size_t i = 0; i < length; i++) { out[i] = in[i] * in[i]; } }

This contains a straightforward and common kind of loop, one that does exactly the same thing to every element of its input array. How might we write this loop in Haskell?

square :: [Double] -> [Double] square (x:xs) = x**2 : square xs square [] = []square :: [Double] -> [Double] square (x:xs) = x**2 : square xs square [] = []

Our `square`

function consists of two
pattern matching equations. The first
“deconstructs” the beginning of a non-empty list,
to get its head and tail. It squares the first element, then
puts that on the front of a new list, which is constructed by
calling `square`

on the remainder of the
empty list. The second equations ensures that
`square`

halts when it reaches the end of
the input list.

The effect of `square`

is to construct
a new list that's the same length as its input list, with
every element in the input list substituted with its square in
the output list.

Here's another such C loop, one that ensures that every letter in a string is converted to uppercase.

#include <ctype.h> char *uppercase(char *out, const char *in) { char *out = strdup(in); if (out != NULL) { for (size_t i = 0; out[i] != '\0'; i++) { out[i] = toupper(out[i]); } } return out; }#include <ctype.h> char *uppercase(char *out, const char *in) { char *out = strdup(in); if (out != NULL) { for (size_t i = 0; out[i] != '\0'; i++) { out[i] = toupper(out[i]); } } return out; }

Let's look at a Haskell equivalent.

import Data.Char (toUpper) upperCase :: String -> String upperCase (x:xs) = toUpper x : upperCase xs upperCase [] = []import Data.Char (toUpper) upperCase :: String -> String upperCase (x:xs) = toUpper x : upperCase xs upperCase [] = []

Here, we're importing the `toUpper`

function from the standard `Data.Char`

module,
which contains lots of useful functions for working with
Char data.

Our `upperCase`

function follows a
similar pattern to our earlier `square`

function. It terminates with an empty list when the input
list is empty; and when the input isn't empty, it calls
`toUpper`

on the first element, then
constructs a new list cell from that and the result of calling
itself on the rest of the input list.

These examples follow a common pattern for writing
recursive functions over lists in Haskell. The *base
case* handles the situation where our input list
is empty. The *recursive case* deals with
a non-empty list; it does something with the head of the list,
and calls itself recursively on the tail.

The `square`

and
`upperCase`

functions that we just defined
produce new lists that are the same lengths as their input
lists, and do only one piece of work per element. This is
such a common pattern that Haskell's prelude defines a
function, `map`

, to make it easier.
`map`

takes a function, and applies it to
every element of a list, returning a new list constructed from
the results of these applications.

Here are our `square`

and
`upperCase`

functions rewritten to use
`map`

.

square2 xs = map squareOne xs where squareOne x = x ** 2 upperCase2 xs = map toUpper xssquare2 xs = map squareOne xs where squareOne x = x ** 2 upperCase2 xs = map toUpper xs

This is our first time seeing a function that takes
another function as its argument. We can learn a lot about
what `map`

does by simply inspecting its
type.

ghci> :type map map :: (a -> b) -> [a] -> [b]`ghci>`

map :: (a -> b) -> [a] -> [b]`:type map`

The signature tells us that `map`

takes
two arguments. The first is a function that takes a value of
one type, `a`

, and returns a
value of another type, `b`

. This
is the only unfamiliar piece of notation in the type; notice
the parentheses that surround the signature of the function
argument so we (and Haskell) won't misread it.

Since `map`

takes a function as
argument, we refer to it as a
*higher-order* function. (In spite of the
name, there's nothing mysterious about higher-order functions;
it's just a term for functions that take other functions as
arguments, or return functions.)

Since `map`

abstracts out the pattern
common to our `square`

and
`upperCase`

functions so that we can reuse
it with less boilerplate, we can look at what those functions
have in common and figure out how to implement it
ourselves.

myMap :: (a -> b) -> [a] -> [b] myMap f (x:xs) = f x : myMap f xs myMap _ [] = []myMap :: (a -> b) -> [a] -> [b] myMap f (x:xs) = f x : myMap f xs myMap _ [] = []

We try out our `myMap`

function to give
outselves some assurance that it behaves similarly to the
standard `map`

.

ghci> map toLower "SHOUTING" <interactive>:1:4: Not in scope: `toLower' ghci> myMap toUpper "whispering" "WHISPERING" ghci> map negate [1,2,3] [-1,-2,-3]`ghci>`

<interactive>:1:4: Not in scope: `toLower'`map toLower "SHOUTING"`

`ghci>`

"WHISPERING"`myMap toUpper "whispering"`

`ghci>`

[-1,-2,-3]`map negate [1,2,3]`

This business of seeing that we're repeating an idiom, then abstracting it so we can reuse (and write less!) code, is a common aspect of Haskell programming.

Another common operation on a sequence of data is to comb through it for elements that satisfy some criterion. Here's an example in C++ of a function that walks a linked list of numbers and returns those that are odd.

#include <list> using namespace std; list<int> oddList(const list<int>& in) { list<int> out; for (list<int>::const_iterator i = in.begin(); i != in.end(); ++i) { if ((*i % 2) == 1) out.push_back(*i); } return out; }#include <list> using namespace std; list<int> oddList(const list<int>& in) { list<int> out; for (list<int>::const_iterator i = in.begin(); i != in.end(); ++i) { if ((*i % 2) == 1) out.push_back(*i); } return out; }

Our Haskell equivalent has a recursive case that's a bit more complex than our earlier functions: it only puts a number in the list it returns if the number is odd. Using a guard expresses this nicely.

oddList :: [Int] -> [Int] oddList (x:xs) | odd x = x : oddList xs | otherwise = oddList xs oddList _ = []oddList :: [Int] -> [Int] oddList (x:xs) | odd x = x : oddList xs | otherwise = oddList xs oddList _ = []

ghci> oddList [1,1,2,3,5,8,13,21,34] [1,1,3,5,13,21]`ghci>`

[1,1,3,5,13,21]`oddList [1,1,2,3,5,8,13,21,34]`

Once again, this idiom is so common that Haskell's prelude
defines a function, `filter`

, which removes
the need for boilerplate code to recurse over the list.

ghci> :type filter filter :: (a -> Bool) -> [a] -> [a] ghci> filter odd [3,1,4,1,5,9,2,6,5] [3,1,1,5,9,5]`ghci>`

filter :: (a -> Bool) -> [a] -> [a]`:type filter`

`ghci>`

[3,1,1,5,9,5]`filter odd [3,1,4,1,5,9,2,6,5]`

The `filter`

function takes a predicate
(a function that tests an argument and returns a
Bool) and applies it to every element in its
input list, returning a list of only those for which the
predicate evaluates to `True`

.

We'll be discussing `filter`

again
soon, in the section called “Folding from the right and primitive recursion”.

Another common thing to do with a loop is to “fold it up”. A simple example of this is summing the values of a list.

mySum xs = helper 0 xs where helper acc (x:xs) = helper (acc + x) xs helper acc _ = accmySum xs = helper 0 xs where helper acc (x:xs) = helper (acc + x) xs helper acc _ = acc

Our `helper`

function is tail
recursive, and uses an accumulator parameter,
`acc`

, to hold the current partial sum of the
list. As we already saw with `asInt`

, this
is a “natural” way to represent a loop in a pure
functional language.

For something a little more complicated, let's take a look at the Adler-32 checksum. Here's a Java implementation.

public class Adler32 { private static final int base = 65521; public static int compute(byte[] data, int offset, int length) { int a = 1, b = 0; for (int i = offset; i < offset + length; i++) { a = (a + (data[i] & 0xff)) % base; b = (a + b) % base; } return (b << 16) | a; } }public class Adler32 { private static final int base = 65521; public static int compute(byte[] data, int offset, int length) { int a = 1, b = 0; for (int i = offset; i < offset + length; i++) { a = (a + (data[i] & 0xff)) % base; b = (a + b) % base; } return (b << 16) | a; } }

Although Adler-32 is a simple checksum, this code isn't particularly easy to read on account of the bit-twiddling involved. Can we do any better with a Haskell implementation?

import Data.Char (ord) import Data.Bits (shiftL, (.&.), (.|.)) base = 65521 adler32 xs = helper 1 0 xs where helper a b (x:xs) = let a' = (a + (ord x .&. 0xff)) `mod` base b' = (a' + b) `mod` base in helper a' b' xs helper a b _ = (b `shiftL` 16) .|. aimport Data.Char (ord) import Data.Bits (shiftL, (.&.), (.|.)) base = 65521 adler32 xs = helper 1 0 xs where helper a b (x:xs) = let a' = (a + (ord x .&. 0xff)) `mod` base b' = (a' + b) `mod` base in helper a' b' xs helper a b _ = (b `shiftL` 16) .|. a

This isn't exactly easier to follow than the Java code,
but let's look at what's going on. Once again,
`helper`

function is tail recursive. We've
turned the two variables we updated on every loop iteration in
Java into accumulator parameters. When our recursion
terminates on the end of the input list, we compute our
checksum and return it.

If we take a step back, we can restructure our Haskell
`adler32`

to more closely resemble our
earlier `mySum`

function. Instead of two
accumulator parameters, we can use a single accumulator that's
a two-tuple.

adler32_try2 xs = helper (1,0) xs where helper (a,b) (x:xs) = let a' = (a + (ord x .&. 0xff)) `mod` base in helper (a', (a' + b) `mod` base) xs helper (a,b) _ = (b `shiftL` 16) .|. aadler32_try2 xs = helper (1,0) xs where helper (a,b) (x:xs) = let a' = (a + (ord x .&. 0xff)) `mod` base in helper (a', (a' + b) `mod` base) xs helper (a,b) _ = (b `shiftL` 16) .|. a

Why would we want to make this seemingly meaningless
structural change? Because as we've already seen with
`map`

and `filter`

, we
can extract the common behaviour shared by
`mySum`

and
`adler32_try2`

into a higher-order
function. We can describe this behaviour as “do
something to every element of a list, updating an
accumulator as we go, and returning the accumulator when
we're done”.

This kind of function is called a
*fold*, because it “folds up”
a list, and it has two variants, `foldl`

and `foldr`

.

foldl :: (a -> b -> a) -> a -> [b] -> a foldl f z xs = helper z xs where helper z [] = z helper z (x:xs) = helper (f z x) xsfoldl :: (a -> b -> a) -> a -> [b] -> a foldl f z xs = helper z xs where helper z [] = z helper z (x:xs) = helper (f z x) xs

The `foldl`

function takes a
“stepper” function, an initial value for its
accumulator, and a list. The “stepper” takes an
accumulator and an element from the list, and returns a new
accumulator value. All `foldl`

does is call
the “stepper” on the current accumulator and an
element of the list, and passes the new accumulator value to
itself recursively to consume the rest of the list.

We refer to `foldl`

as a “left
fold” because it consumes the list from left (the
head) to right.

Here's a rewrite of `mySum`

using
`foldl`

.

foldlSum xs = foldl step 0 xs where step acc x = acc + xfoldlSum xs = foldl step 0 xs where step acc x = acc + x

Notice how much simpler this code is? We're no longer
using explicit recursion, because `foldl`

takes care of that for us. We've simplified our problem down
to two things: what the initial value of the accumulator
should be (the second parameter to
`foldl`

), and how to update the accumulator
(the `step`

function). As an added bonus,
our code is now shorter, too, which makes it easier to
understand.

We can rewrite `adler32_try2`

in a
similar way, using `foldl`

to let us focus
on the details that are important.

adler32_foldl xs = let (a, b) = foldl step (1, 0) xs in (b `shiftL` 16) .|. a where step (a, b) x = let a' = a + (ord x .&. 0xff) in (a' `mod` base, (a' + b) `mod` base)adler32_foldl xs = let (a, b) = foldl step (1, 0) xs in (b `shiftL` 16) .|. a where step (a, b) x = let a' = a + (ord x .&. 0xff) in (a' `mod` base, (a' + b) `mod` base)

Here, our accumulator is a two-tuple, so the result of
`foldl`

will be, too. We pull the final
accumulator apart when `foldl`

returns, and
bit-twiddle it into a “proper” checksum.

A quick glance reveals that
`adler32_foldl`

isn't really any shorter
than `adler32_try2`

. Why should we use a
fold in this case? The advantage here lies in the fact that
folds are extremely common in Haskell, and they have regular,
predictable behaviour.

This means that a reader with a little experience will have an easier time understanding a function that uses a fold than one that uses explicit recursion. Where a fold isn't going to produce any surprises, the behaviour of a function that recurses explicitly isn't immediately obvious. Explicit recursion requires us to read closely to understand exactly what's going on.

This line of reasoning applies to other higher-order
library functions, including those we've already seen,
`map`

and `filter`

.
Because they're library functions with well-defined behaviour,
we only need to learn what they do once, and we'll have an
advantage when we need to understand any code that uses
them.

From looking at `adler32_foldl`

, we
know that we can accumulate more than one value at a time when
we fold over a list. Here's another use for a fold:
optimising code by avoiding multiple traversals of a
list.

Let's consider the problem of finding the root mean square of a list of numbers: compute the sum of the squares of every element in the list, divide by its length, then coompute the square root of that number. In an imperative language like C, we wouldn't even think twice about writing code like this.

struct double_list { struct double_list *next; double val; }; double rootMeanSquare(const struct double_list *list) { double mean_square = 0; size_t length = 0; while (list != NULL) { mean_square += list->val * list->val; length++; list = list->next; } return sqrt(mean_square / length); }struct double_list { struct double_list *next; double val; }; double rootMeanSquare(const struct double_list *list) { double mean_square = 0; size_t length = 0; while (list != NULL) { mean_square += list->val * list->val; length++; list = list->next; } return sqrt(mean_square / length); }

Clearly, we're looping over the list just once, updating
the accumulator values `mean_square`

and
`length`

as we go.

Meanwhile, over in functional programming land, the temptation is strong to turn our verbal description of the root mean square into code.

rootMeanSquare :: [Double] -> Double rootMeanSquare xs = sqrt (sum (map square xs) / fromIntegral (length xs)) where square x = x ** 2rootMeanSquare :: [Double] -> Double rootMeanSquare xs = sqrt (sum (map square xs) / fromIntegral (length xs)) where square x = x ** 2

This is a lovely, compact translation of the verbal
description. It even uses our new friend, the
`map`

function, to make the code clearer by
avoiding explicit recursion, but it's not necessarily good
code. The calls to `map`

and
`length`

are each going to traverse the
input list once.

On a small list, the cost of traversing it twice obviously won't matter, but on a big list, we're likely to notice. We can use a fold to avoid this need to traverse the list twice.

rootMeanSquare_foldl xs = let (length, meanSquare) = foldl step (0,0) xs in sqrt (meanSquare / fromIntegral length) where step (length, meanSquare) x = (length + 1, meanSquare + x**2)rootMeanSquare_foldl xs = let (length, meanSquare) = foldl step (0,0) xs in sqrt (meanSquare / fromIntegral length) where step (length, meanSquare) x = (length + 1, meanSquare + x**2)

Clearly, this code isn't as readable as the earlier
version that used `map`

and
`length`

. Which version should we prefer?
It's often best to start out by writing the most readable
code, since we can make that correct most quickly, and put off
worrying about transforming it into something faster until
much later, when we have profiling data for our program. Only
if those numbers indicate a performance problem should we
worry about stepping back in and transforming our code. We'll
have much more to say about profiling, performance, and
optimisation later, in chapter XXX.

The counterpart to `foldl`

is
`foldr`

, which folds from the right of a
list.

foldr :: (a -> b -> b) -> b -> [a] -> b foldr f z xs = helper xs where helper [] = z helper (y:ys) = f y (helper ys)foldr :: (a -> b -> b) -> b -> [a] -> b foldr f z xs = helper xs where helper [] = z helper (y:ys) = f y (helper ys)

At first glance, `foldr`

might seem
less useful than `foldl`

: what use is a
function that folds from the right? But consider the
Prelude's `filter`

function, which we last
encountered in the section called “Selecting pieces of input”. If we write
`filter`

using explicit recursion, it will
look something like this.

filter :: (a -> Bool) -> [a] -> [a] filter p [] = [] filter p (x:xs) | p x = x : filter p xs | otherwise = filter p xsfilter :: (a -> Bool) -> [a] -> [a] filter p [] = [] filter p (x:xs) | p x = x : filter p xs | otherwise = filter p xs

Perhaps surpsisingly, though, we can write
`filter`

as a fold, using
`foldr`

.

myFilter p xs = foldr step [] xs where step x ys | p x = x : ys | otherwise = ysmyFilter p xs = foldr step [] xs where step x ys | p x = x : ys | otherwise = ys

This is the sort of definition that could cause us a
headache, so let's examine it a little depth. Like
`foldl`

, `foldr`

takes a
function and a base case (what to do when the input list is
empty) as arguments. From reading the type of
`filter`

, we know that our
`myFilter`

function must return a list of
the same type as it consumes, so the base case should be a
list of this type, and the `step`

helper
function must return a list.

Since we know that `foldr `

calls
`step`

on one element of the input list at
a time, with the accumulator as its second argument, what
`step`

does must be quite simple. If the
predicate returns `True`

, it pushes that
element onto the accumulated list; otherwise, it leaves the
list untouched.

The class of functions that we can express using
`foldr`

is called *primitive
recursive*. A surprisingly large number of list
manipulation functions are primitive recursive. For example,
here's `map`

written in terms of
`foldr`

.

myMap :: (a -> b) -> [a] -> [b] myMap f xs = foldr step [] xs where step x [] = [f x] step x ys = f x : ysmyMap :: (a -> b) -> [a] -> [b] myMap f xs = foldr step [] xs where step x [] = [f x] step x ys = f x : ys

In fact, we can even write `foldl`

using `foldr`

!

myFoldl :: (a -> b -> a) -> a -> [b] -> a myFoldl f z xs = foldr step id xs z where step x g a = g (f a x)myFoldl :: (a -> b -> a) -> a -> [b] -> a myFoldl f z xs = foldr step id xs z where step x g a = g (f a x)

While we can write `foldl`

in terms of
`foldr`

, we can't do the converse:
`foldr`

is “more basic than”
`foldl`

. This should make it clearer why
we call functions written with `foldr`

primitive recursive.

(By the way, don't feel like you have to go to special
lengths to remember the term “primitive
recursive”. It's just useful to remember that you
read about it somewhere, and that it has something to do with
`foldr`

.)

Another useful way to think about the way
`foldr`

works is that it
*transforms* its input list. Its first two
arguments are “what to do with each head/tail element of
the list”, and “what to substitute at the end
of the list”.

The “identity” transformation with
`foldr`

thus replaces the empty list with
itself, and applies the list constructor to each head/tail
pair:

identity :: [a] -> [a] identity xs = foldr (:) [] xsidentity :: [a] -> [a] identity xs = foldr (:) [] xs

It transforms a list into a copy of itself.

ghci> identity [1,2,3] <interactive>:1:0: Not in scope: `identity'`ghci>`

<interactive>:1:0: Not in scope: `identity'`identity [1,2,3]`

If `foldr`

replaces the end of a list
with some other value, this gives us an easy way to think
about Haskell's list append function,
`(++)`

.

ghci> [1,2,3] ++ [4,5,6] [1,2,3,4,5,6]`ghci>`

[1,2,3,4,5,6]`[1,2,3] ++ [4,5,6]`

All we have to do to append a list onto another is substitute that second list for the end of our first list.

append :: [a] -> [a] -> [a] append xs ys = foldr (:) ys xsappend :: [a] -> [a] -> [a] append xs ys = foldr (:) ys xs

ghci> append [1,2,3] [4,5,6] <interactive>:1:0: Not in scope: `append'`ghci>`

<interactive>:1:0: Not in scope: `append'`append [1,2,3] [4,5,6]`

Now that we can think in terms of transforming a list, it
becomes easier for us to reason about functions like summing a
list. We replace the empty list with zero, which becomes our
first accumulator value. We then apply the
`(+)`

function to each element of the list
and an accumulator value, to give a new accumulator
value.

ghci> foldr (+) 0 [1,2,3] 6`ghci>`

6`foldr (+) 0 [1,2,3]`

Knowing that `foldr`

transforms a list
from right to left, we can “unroll” the
application of it by hand to see the intermediate accumulators
that it produces.

ghci> foldr (+) 0 [] 0 ghci> foldr (+) 0 [3] 3 ghci> foldr (+) 0 [2,3] 5 ghci> foldr (+) 0 [1,2,3] 6`ghci>`

0`foldr (+) 0 []`

`ghci>`

3`foldr (+) 0 [3]`

`ghci>`

5`foldr (+) 0 [2,3]`

`ghci>`

6`foldr (+) 0 [1,2,3]`

As our extended treatment of folds should indicate, the
`foldr`

function is nearly as important a
member of our list-programming toolbox as the more basic list
functions we saw in the section called “Working with lists”.

To keep our initial discussion simple, we used
`foldl`

throughout most of this section.
However, any time you want to fold from the left in practice,
use `foldl'`

from the
`Data.List`

module instead, because it's more
efficient. You should take this on faith for now; we'll
explain why you should avoid plain `foldl`

in normal use in section XXX.

1. | Use a fold (choosing the appropriate fold will make
your code much simpler) to rewrite and improve upon the
asInt_fold :: String -> IntasInt_fold :: String -> Int Your function should behave as follows. ghci> asInt_fold "101" 101 ghci> asInt_fold "-31337" -73313 Extend your function to handle the following kinds
of exceptional conditions by calling
ghci> asInt_fold "" *** Exception: empty string ghci> asInt_fold "-" *** Exception: empty string ghci> asInt_fold "-3" -3 ghci> asInt_fold "2.7" *** Exception: non-digit '.' ghci> asInt_fold "314159265358979323846" *** Exception: numeric overflow |

2. | The type ErrorMessage = String asInt_either :: String -> Either ErrorMessage Inttype ErrorMessage = String asInt_either :: String -> Either ErrorMessage Int ghci> asInt_either "33" Right 33 ghci> asInt_either "foo" Left "non-digit 'o'" |

3. | The Prelude function concat :: [[a]] -> [a]concat :: [[a]] -> [a] |

4. | The Prelude function takeWhile :: (a -> Bool) -> [a] -> [a]takeWhile :: (a -> Bool) -> [a] -> [a] Use |

5. | The groupBy :: (a -> a -> Bool) -> [a] -> [[a]]groupBy :: (a -> a -> Bool) -> [a] -> [[a]] Use |

6. | How many of the following standard prelude functions can you rewrite using list folds? |

The article [Hutton99] is an excellent and deep tutorial covering folds. It includes many examples of how to use simple, systematic calculation techniques to turn functions that use explicit recursion into folds.

*FIXME: Example: run-length encoding. Use to show how
looping can be done via tail recursion.
*

In many of the function definitions we've seen so far, we've written short helper functions.

isInAny needle haystack = any inSequence haystack where inSequence s = needle `isInfixOf` sisInAny needle haystack = any inSequence haystack where inSequence s = needle `isInfixOf` s

Haskell lets us write completely anonymous functions, which
we can use to avoid the need to give names to our helper
functions. Anonymous functions are often called
“lambda” functions, in a nod to their heritage in
the lambda calculus. We introduce an anonymous function with a
backslash character, `\`

. This is followed by the
function's arguments (which can include patterns), then an arrow
`->`

to introduce the function's body.

Lambdas are most easily illustrated by example. Here's a
rewrite of `isInAny`

using an anonymous
function.

isInAny2 needle haystack = any (\s -> needle `isInfixOf` s) haystackisInAny2 needle haystack = any (\s -> needle `isInfixOf` s) haystack

We've wrapped the lambda in parentheses here so that Haskell can tell where the function body ends.

Anonymous functions behave in every respect identically to functions that have names, but Haskell places a few important on how we can define them. Most importantly, whereas we can write a normal function using multiple clauses containing different patterns and guards, a lambda can only have a single clause in its definition.

The limitation to a single clause restricts how we can use patterns in the definition of a lambda. We'll usually write a normal function with several clauses to cover different pattern matching possibilities.

safeHead (x:_) = Just x safeHead _ = NothingsafeHead (x:_) = Just x safeHead _ = Nothing

But as we can't write multiple clauses to define a lambda, we have to be sure that any patterns we use will match.

unsafeHead = \(x:_) -> xunsafeHead = \(x:_) -> x

This definition of `unsafeHead`

will
explode in our faces if we call it with a value on which pattern
matching fails.

ghci> :type unsafeHead unsafeHead :: [t] -> t ghci> unsafeHead [1] 1 ghci> unsafeHead [] *** Exception: Lambda.hs:7:13-23: Non-exhaustive patterns in lambda`ghci>`

unsafeHead :: [t] -> t`:type unsafeHead`

`ghci>`

1`unsafeHead [1]`

`ghci>`

*** Exception: Lambda.hs:7:13-23: Non-exhaustive patterns in lambda`unsafeHead []`

The definition typechecks, so it will compile, so the error will occur at runtime. The moral of this story is to be careful in how you use patterns when defining an anonymous function: make sure your patterns can't fail!

Another thing to notice about the
`isInAny`

and `isInAny2`

functions we showed above is that the first version, using a
helper function that has a name, is a little easier to read than
the version that plops an anonymous function into the middle.
The named helper function doesn't disrupt the
“flow” of the function in which it's used, and the
judiciously chosen name gives us a little bit of information
about what the function is expected to do.

In contrast, when we run across a lambda in the middle of a function body, we have to switch gears and read its definition fairly carefully to understand what it does. To help with readability and maintainability, then, we tend to avoid lambdas in many situations where we could use them to trim a few characters from a function definition. Very often, we'll use a partially applied function instead, resulting in clearer and more readable code than either a lambda or an explicit function. Don't know what a partially applied function is yet? Read on!

In Haskell, a single piece of syntax doesn't often get
pressed into use for multiple tasks. So why does the
`->`

arrow get used for what looks like two
purposes in the type signature of a function?

ghci> :type dropWhile dropWhile :: (a -> Bool) -> [a] -> [a]`ghci>`

dropWhile :: (a -> Bool) -> [a] -> [a]`:type dropWhile`

It looks like the `->`

is separating the
arguments to `dropWhile`

from each other, but
also that it separates the arguments from the return type. But
in fact `->`

has only one meaning: it
denotes a function that takes an argument of the type on the
left, and returns a value of the type on the right.

The implication here is that in Haskell, all functions take
only one argument. While `dropWhile`

*looks* like a function that takes two
arguments, it only takes one. Here's a perfectly valid Haskell
expression.

ghci> :module +Data.Char ghci> dropWhile isSpace <interactive>:1:0: No instance for (Show ([Char] -> [Char])) arising from use of `print' at <interactive>:1:0-16 Possible fix: add an instance declaration for (Show ([Char] -> [Char])) In the expression: print it In a 'do' expression: print it`ghci>`

`:module +Data.Char`

`ghci>`

<interactive>:1:0: No instance for (Show ([Char] -> [Char])) arising from use of `print' at <interactive>:1:0-16 Possible fix: add an instance declaration for (Show ([Char] -> [Char])) In the expression: print it In a 'do' expression: print it`dropWhile isSpace`

What type does it have, and what does it do?

ghci> :type dropWhile isSpace dropWhile isSpace :: [Char] -> [Char] ghci> dropWhile isSpace " \n\nfoo" "foo"`ghci>`

dropWhile isSpace :: [Char] -> [Char]`:type dropWhile isSpace`

`ghci>`

"foo"`dropWhile isSpace " \n\nfoo"`

Well, *that* looks useful. The value
`dropWhile isSpace`

is a function that strips leading
white space from a string.

Every time we give an argument to a function, we can
“chop” an element off the front of its type
signature. Let's take `zip3`

as an example
to see what we mean; this is a function that zips three lists
into a list of three-tuples.

ghci> :type zip3 zip3 :: [a] -> [b] -> [c] -> [(a, b, c)] ghci> zip3 "foo" "bar" "quux" [('f','b','q'),('o','a','u'),('o','r','u')]`ghci>`

zip3 :: [a] -> [b] -> [c] -> [(a, b, c)]`:type zip3`

`ghci>`

[('f','b','q'),('o','a','u'),('o','r','u')]`zip3 "foo" "bar" "quux"`

If we call `zip3`

with just one argument,
we get a function that accepts two arguments, where the first
argument is now *fixed*. No matter what
second or third arguments we pass, the first argument to this
new function will always be the fixed value we already
specified.

ghci> :type zip3 "foo" zip3 "foo" :: [b] -> [c] -> [(Char, b, c)] ghci> let zip3foo xs = zip3 "foo" xs ghci> :type zip3foo zip3foo :: [b] -> [c] -> [(Char, b, c)] ghci> (zip3 "foo") "aaa" "bbb" [('f','a','b'),('o','a','b'),('o','a','b')] ghci> zip3foo "aaa" "bbb" [('f','a','b'),('o','a','b'),('o','a','b')] ghci> zip3foo [1,2,3] [True,False,True] [('f',1,True),('o',2,False),('o',3,True)]`ghci>`

zip3 "foo" :: [b] -> [c] -> [(Char, b, c)]`:type zip3 "foo"`

`ghci>`

`let zip3foo xs = zip3 "foo" xs`

`ghci>`

zip3foo :: [b] -> [c] -> [(Char, b, c)]`:type zip3foo`

`ghci>`

[('f','a','b'),('o','a','b'),('o','a','b')]`(zip3 "foo") "aaa" "bbb"`

`ghci>`

[('f','a','b'),('o','a','b'),('o','a','b')]`zip3foo "aaa" "bbb"`

`ghci>`

[('f',1,True),('o',2,False),('o',3,True)]`zip3foo [1,2,3] [True,False,True]`

When we pass fewer arguments to a function than the function
can accept, we call this *partial
application* of the function: we're applying the
function to only some of its arguments.

In the example above, we have a partially applied function,
`zip3 "foo"`

, and a new function,
`zip3foo`

. We can see that the type
signatures of the two and their behaviour are identical.

This applies just as well if we fix two arguments, giving us a function of just one argument.

ghci> let zip3foobar = zip3 "foo" "bar" ghci> :type zip3foobar zip3foobar :: [c] -> [(Char, Char, c)] ghci> zip3foobar "quux" [('f','b','q'),('o','a','u'),('o','r','u')] ghci> zip3foobar [1,2] [('f','b',1),('o','a',2)]`ghci>`

`let zip3foobar = zip3 "foo" "bar"`

`ghci>`

zip3foobar :: [c] -> [(Char, Char, c)]`:type zip3foobar`

`ghci>`

[('f','b','q'),('o','a','u'),('o','r','u')]`zip3foobar "quux"`

`ghci>`

[('f','b',1),('o','a',2)]`zip3foobar [1,2]`

Partial function application lets us avoid writing tiresome
throwaway functions. It's generally a lot better for this
purpose than the anonymous functions we introduced in the section called “Anonymous (lambda) functions”. Looking back at the
`isInAny`

function we defined there, here's
how we'd write it to use a partially applied function instead of
a named helper function or a lambda.

isInAny3 needle haystack = any (isInfixOf needle) haystackisInAny3 needle haystack = any (isInfixOf needle) haystack

Here, the expression `isInfixOf needle`

is the
partially applied function. We're taking the function
`isInfixOf`

, and “fixing” its
first argument to be the `needle`

variable from
our parameter list. This gives us a partially applied function
that has exactly the same type and behaviour as the helper and
lambda in our earlier definitions.

Haskell provides a handy notational shortcut to let us
write partially applied functions using infix operators. If
we enclose an operator in parentheses, we can supply its left
or right argument inside the parentheses to get a partially
applied function. This kind of partial application is called
a *section*.

ghci> (1+) 2 3 ghci> map (*3) [24,36] [72,108] ghci> map (2^) [3,5,7,9] [8,32,128,512]`ghci>`

3`(1+) 2`

`ghci>`

[72,108]`map (*3) [24,36]`

`ghci>`

[8,32,128,512]`map (2^) [3,5,7,9]`

If we provide the left argument inside the section, then calling the resulting function with one argument supplies the operator's right argument. And vice versa.

Recall that we can wrap a function name in backquotes to use it as an infix operator. This lets us use sections with functions.

ghci> :type (`elem` ['a'..'z']) (`elem` ['a'..'z']) :: Char -> Bool`ghci>`

(`elem` ['a'..'z']) :: Char -> Bool`:type (`elem` ['a'..'z'])`

The above definition fixes `elem`

's
second argument, giving us a function that checks to see
whether its argument is a lowercase letter.

ghci> (`elem` ['a'..'z']) 'f' True`ghci>`

True`(`elem` ['a'..'z']) 'f'`

Using this as an argument to `any`

, we
get a function that checks an entire string to see if it's all
lowercase.

ghci> all (`elem` ['a'..'z']) "Frobozz" False`ghci>`

False`all (`elem` ['a'..'z']) "Frobozz"`

Haskell's `tails`

function, in the
`Data.List`

module, generalises the
`tail`

function we introduced earlier. It
successively applies `tail`

to its input,
then calls itself on the result, until there's nothing
left.

ghci> :m +Data.List ghci> tail "foobar" "oobar" ghci> tail (tail "foobar") "obar" ghci> tails "foobar" ["foobar","oobar","obar","bar","ar","r",""]`ghci>`

`:m +Data.List`

`ghci>`

"oobar"`tail "foobar"`

`ghci>`

"obar"`tail (tail "foobar")`

`ghci>`

["foobar","oobar","obar","bar","ar","r",""]`tails "foobar"`

Each of these strings is a *suffix* of
the initial string, so `tails`

produces a
list of all suffixes, plus an extra empty list at the
end. In fact, it always produces that extra empty list, even
when its input list is empty.

ghci> tails [] [[]]`ghci>`

[[]]`tails []`

What if we want a function that behaves like
`tails`

, but which *only*
returns suffixes? One possibility would be for us to write our
own version by hand.

suffixes :: [a] -> [[a]] suffixes xs@(_:xs') = xs : suffixes xs' suffixes _ = []suffixes :: [a] -> [[a]] suffixes xs@(_:xs') = xs : suffixes xs' suffixes _ = []

Let's try out that definition.

ghci> tails "foo" ["foo","oo","o",""] ghci> suffixes "foo" <interactive>:1:0: Not in scope: `suffixes'`ghci>`

["foo","oo","o",""]`tails "foo"`

`ghci>`

<interactive>:1:0: Not in scope: `suffixes'`suffixes "foo"`

You may have noticed the funny-looking pattern
`xs@(_:xs')`

in our definition of
`suffixes`

. This is called an
*as-pattern*, and it means “bind the
variable `xs`

to the expression in the
matched pattern `(_:xs')`

”.

In this case, if the pattern after the “@”
matches, `xs`

will be bound to the entire
list that matched, and `xs'`

to the rest of
the list (we used the wildcard `_`

pattern to
indicate that we're not interested in the value of the head of
the list).

Let's look at a second definition of the
`suffixes`

function, only this time without
using an as-pattern.

noisier :: [a] -> [[a]] noisier (x:xs) = (x:xs) : noisier xs noisier _ = []noisier :: [a] -> [[a]] noisier (x:xs) = (x:xs) : noisier xs noisier _ = []

Here, the list that we've deconstructed in the pattern match just gets put right back together in the body of the function. The as-pattern makes the code more readable by letting us avoid the need to repeat ourselves.

It seems a shame to introduce a new function,
`suffixes`

, that does almost the same thing
as the existing `tails`

function. Surely
we can do better?

Remember the `init`

function we
introduced in the section called “Working with lists”?

suffixes2 xs = init (tails xs)suffixes2 xs = init (tails xs)

The `suffixes2`

function behaves
identically to `suffixes`

, but it's a
single line of code.

ghci> suffixes2 "foo" <interactive>:1:0: Not in scope: `suffixes2'`ghci>`

<interactive>:1:0: Not in scope: `suffixes2'`suffixes2 "foo"`

If we take a step back, we see the glimmer of a pattern here: we're calling a function, then applying another function to its result. Let's turn that pattern into a function definition.

compose :: (b -> c) -> (a -> b) -> a -> c compose f g x = f (g x)compose :: (b -> c) -> (a -> b) -> a -> c compose f g x = f (g x)

We now have a function, `compose`

, that
we can use to “glue” two other functions
together.

suffixes3 xs = compose init tails xssuffixes3 xs = compose init tails xs

As Haskell's automatic currying lets us drop the
`xs`

variable, we can make our definition
even shorter.

suffixes4 = compose init tailssuffixes4 = compose init tails

Fortunately, we don't need to write our own
`compose`

function. Plugging functions
into each other like this is so common that Haskell's prelude
provides the `(.)`

operator to denote
function composition.

suffixes5 = init . tailssuffixes5 = init . tails

The `(.)`

operator isn't a special
piece of language syntax; it's just a normal operator.

ghci> :type (.) (.) :: (b -> c) -> (a -> b) -> a -> c ghci> :type suffixes <interactive>:1:0: Not in scope: `suffixes' ghci> :type suffixes5 <interactive>:1:0: Not in scope: `suffixes5' ghci> suffixes5 "foo" <interactive>:1:0: Not in scope: `suffixes5'`ghci>`

(.) :: (b -> c) -> (a -> b) -> a -> c`:type (.)`

`ghci>`

<interactive>:1:0: Not in scope: `suffixes'`:type suffixes`

`ghci>`

<interactive>:1:0: Not in scope: `suffixes5'`:type suffixes5`

`ghci>`

<interactive>:1:0: Not in scope: `suffixes5'`suffixes5 "foo"`

We can create new functions at any time by writing chains
of composed functions, stitched together with
`(.)`

, so long (of course) as the result
type of the function on the right of each
`(.)`

matches the type of parameter that
the function on the left can accept.

ghci> (sum . map sum . filter (any odd)) [[1,4,9],[2,4,6]] 14`ghci>`

14`(sum . map sum . filter (any odd)) [[1,4,9],[2,4,6]]`

Here's an example drawn from a piece of code I wrote the
day before I started on this section. I wanted to get a list
of C preprocessor definitions from a header file shipped with
`libpcap`

, a popular network packet filtering
library.

import Data.List (isPrefixOf) dlts :: String -> [String] dlts = foldr step [] . linesimport Data.List (isPrefixOf) dlts :: String -> [String] dlts = foldr step [] . lines

We take an entire file, split it up with
`lines`

, then call ```
foldr step
[]
```

on the result. Since I know, based on the type of
`lines`

, that I'm folding over a list of
strings, the `step`

helper function must
thus operate on individual lines.

where step l ds | "#define DLT_" `isPrefixOf` l = (head . drop 1 . words) l : ds | otherwise = ds where step l ds | "#define DLT_" `isPrefixOf` l = (head . drop 1 . words) l : ds | otherwise = ds

If we match a macro definition, we cons the name of the macro onto the head of the list we're returning; otherwise, we do nothing with the list on this invocation.

We can see heavy use of function composition in the body
of `step`

. While all of these functions
are by now familiar to us, it can take a little practice to
glue together the sequence of types in a chain of compositions
like this. Let's walk through the procedure by hand.

ghci> :type words words :: String -> [String] ghci> words "#define DLT_CHAOS 5" ["#define","DLT_CHAOS","5"]`ghci>`

words :: String -> [String]`:type words`

`ghci>`

["#define","DLT_CHAOS","5"]`words "#define DLT_CHAOS 5"`

We then call the partially applied function ```
drop
1
```

on the result of `words`

.

ghci> :type drop 1 drop 1 :: [a] -> [a]`ghci>`

drop 1 :: [a] -> [a]`:type drop 1`

See how naturally partial application fits in here? It's
given us a function that turns a list into another list.
Composing these, we match up the result of
`words`

with the parameter of ```
drop
1
```

.

ghci> :type drop 1 . words drop 1 . words :: String -> [String] ghci> (drop 1 . words) "#define DLT_CHAOS 5" ["DLT_CHAOS","5"]`ghci>`

drop 1 . words :: String -> [String]`:type drop 1 . words`

`ghci>`

["DLT_CHAOS","5"]`(drop 1 . words) "#define DLT_CHAOS 5"`

Finally, calling `head`

on the result
of `drop 1 . words`

will give us what we want: the
name of the macro we're defining.

ghci> :type head . drop 1 . words head . drop 1 . words :: String -> String ghci> (head . drop 1 . words) "#define DLT_CHAOS 5" "DLT_CHAOS"`ghci>`

head . drop 1 . words :: String -> String`:type head . drop 1 . words`

`ghci>`

"DLT_CHAOS"`(head . drop 1 . words) "#define DLT_CHAOS 5"`

Use your head wisely | |
---|---|

After warning against unsafe list functions in the section called “Safely and sanely working with unsafe functions”, here we are calling
In this case, we can reassure ourselves that we're safe
from a runtime failure in the call to
This is an example of the kind of reasoning we ought to do to convince ourselves that our code won't explode when we call partial functions. Don't forget our earlier admonition: calling unsafe functions like this requires care, and can often make our code more fragile in subtle ways. |