[ch05 done Bryan O'Sullivan **20080731031646] { addfile ./examples/ch05/InteractWith.hs hunk ./examples/ch05/splitlines.py 1 -## snippet splitlines -def splitlines(s): - ret = [] - while True: - head, sep, tail = s.partition('\r\n') - if not (sep or tail): - head, sep, tail = s.partition('\r') - if not (sep or tail): - head, sep, tail = s.partition('\n') - if not (sep or tail): - break - ret.append(head) - s = tail - if s: - ret.append(s) - return ret -## /snippet splitlines rmfile ./examples/ch05/splitlines.py hunk ./en/ch04-defining-types.xml 1564 - - Create a function that computes the root mean square - of a list. One way to do so is by squaring every element - of the list, calculating the mean of the new list, and - obtaining the square root (using - sqrt) of the mean. - - - - hunk ./en/ch05-fp.xml 9 - There are two aspects to the early stages of programming - effectively in Haskell. The first is coming to terms with the - shift in mindset from imperative programming to functional: we - have to step away from an entire toolbox of programming habits - from other languages. We do this not because these techniques - are bad, but because in a functional language they're either not - applicable or they turn out downright ungainly. At the same - time, we have to come up to speed with a replacement vocabulary, - so that we can hack productively. + Our early learning of Haskell has two distinct + aspects. The first is coming to terms with the shift in mindset + from imperative programming to functional: we have to replace + our programming habits from other languages. We do this not + because imperative techniques are bad, but because in a + functional language other techniques work better. hunk ./en/ch05-fp.xml 16 - Our second challenge is learning our way around the standard - Haskell libraries. As in any language, the libraries act as a - lever, enabling us to multiply our problem solving power. - However, Haskell libraries tend to be organised around a higher + Our second challenge is learning our way around + the standard Haskell libraries. As in any language, the + libraries act as a lever, enabling us to multiply our problem + solving power. Haskell libraries tend to operate at a higher hunk ./en/ch05-fp.xml 21 - probably have to put more effort into learning them, but in - exchange they offer a tantalisingly greater magnification of our - efforts. + need to work a little harder to learn to use the libraries, but + in exchange they offer a lot of power. hunk ./en/ch05-fp.xml 33 + + A simple command line framework + + In most of this chapter, we will concern ourselves with code + that has no interaction with the outside world. To maintain our + focus on practical code, we will begin by developing a gateway + between our pure code and the outside world. Our + framework simply reads the contents of one file, applies a + function to the file, and writes the result to another + file. + + &InteractWith.hs:main; + + This is all we need to write simple, but complete, file + processing programs. This is a complete program. We can + compile it to an executable named + InteractWith as follows. + + $ ghc --make InteractWith +[1 of 1] Compiling Main ( InteractWith.hs, InteractWith.o ) +Linking InteractWith ... + + If we run this program from the shell or command prompt, it + will accept two file names: the name of a file to read, and the + name of a file to write. + + $ ./Interact +error: exactly two arguments needed +$ ./Interact hello-in.txt hello-out.txt +$ cat hello-in.txt +hello world +$ cat hello-out.txt +hello world + + Some of the notation in our source file is new. The &do; + keyword introduces a block of actions that + can cause effects in the real world, such as reading or writing + a file. The <- operator is the equivalent of + assignment inside a &do; block. This is enough explanation to + get us started. We will talk in much more detail about these + details of notation, and I/O in general, in . + + When we want to test a function that cannot talk to the + outside world, we simply replace the name + id in the code above with the name of the + function we want to test. Whatever our function does, it will + need to have the type String -> String: in other + words, it must accept a string, and return a string. + + hunk ./en/ch05-fp.xml 94 - While lines looks useful, it relies on - us reading a file in text mode in order to work - (yes, we'll be talking about opening files soon); it doesn't - deal well with Windows line ending conventions. + While lines looks useful, it + relies on us reading a file in text mode in order + to work. Text mode is a feature common to many programming + languages: it provides a special behavior when we read and + write files on Windows. When we read a file in text mode, the + file I/O library translates the line ending sequence + "\r\n" (carriage return followed by newline) to + "\n" (newline alone), and it does the reverse when + we write a file. On Unix-like systems, text mode does not + perform any translation. As a result of this difference, if we + read a file on one platform that was written on the other, the + line endings are likely to become a mess. (Both + readFile and writeFile + operate in text mode.) hunk ./en/ch05-fp.xml 111 - The function only splits on newline characters, leaving - carriage returns dangling at the ends of lines. Ugh. We can't - rely on opening a file in text mode to do the right thing on our - behalf. For example, if we're reading a Windows-generated text + The lines function only + splits on newline characters, leaving carriage returns dangling + at the ends of lines. If we read a Windows-generated text hunk ./en/ch05-fp.xml 117 - Years of comfortable hacking with Python's universal - newline support, which transparently handles Unix and - Windows line ending conventions for us, left us wanting - something similar in Haskell. Although Python conveniently - provides a built-in splitlines string - method, we'll rewrite it as a Python function, just to see what - a reasonable Python implementation might look like. + We have comfortably used Python's universal + newline support for years: this transparently handles Unix and + Windows line ending conventions for us. We would like to + provide something similar in Haskell. hunk ./en/ch05-fp.xml 122 - &splitlines.py:splitlines; + Since we are still early in our career of reading + Haskell code, we will discuss our Haskell implementation in + quite some detail. hunk ./en/ch05-fp.xml 126 - The key to understanding this code lies with Python's - partition string method: it searches for a - substring. If it finds it, it returns a triple of the substring - before the match, the match, and the substring after the match. - Otherwise, it returns a triple of the entire string and two - empty strings. + &SplitLines.hs:splitLines.type; hunk ./en/ch05-fp.xml 128 - Because we're still early in our career of reading Haskell - code, we'll discuss our Haskell implementation in quite some - detail. This is almost the last instance where we'll walk - through basic Haskell code at this level. + Our function's type signature indicates that it accepts a + single string, the contents of a file with some unknown + line ending convention. It returns a list of strings, + representing each line from the file. hunk ./en/ch05-fp.xml 133 - Haskell provides a similar function named - break that we can use to the same end. - Unlike Python's partition, it operates over - lists of any type, instead of strings. It does this by taking a - function as its first parameter: this function takes an element - of the list, and returns a Bool to indicate whether - to break the list there or not. What break - returns is a two-tuple, of the sublist before the predicate - returns True (the prefix), and the rest of the list - (the suffix). + &SplitLines.hs:splitLines; hunk ./en/ch05-fp.xml 135 - &splitlines.ghci:break; + Before we dive into detail, notice first how we have + organized our code. We have presented the important pieces of + code first, keeping the definition of + isLineTerminator until later. Because we + have given the helper function a readable name, we can guess + what it does even before we've read it, which eases the smooth + flow of reading the code. hunk ./en/ch05-fp.xml 143 - This makes break both more and less - powerful than partition: we can use it on - any type of list, not just a string; but it can only examine a - single item at a time, not a sublist. + The Prelude defines a function named + break that we can use to partition a list + into two parts. It takes a function as its first parameter. That + function must examine an element of the list, and return a + Bool to indicate whether to break the list at that + point. The break function returns a pair, + which consists of the sublist consumed before the predicate + returned True (the prefix), + and the rest of the list (the + suffix). hunk ./en/ch05-fp.xml 154 - For our purposes, examining one character at a time will work - perfectly well, since we only need to match a single carriage - return or newline. + &splitlines.ghci:break; hunk ./en/ch05-fp.xml 156 - &SplitLines.hs:splitLines; + Since we only need to match a single carriage + return or newline at a time, examining one element of the list + at a time is good enough for our needs. + + The first equation of splitLines + indicates that if we match an empty string, we have no further + work to do. hunk ./en/ch05-fp.xml 164 - Our function first calls break on its - input string, to break it into the substring before a line - terminator, and the rest of the string (which will include the - line terminator, if any). + In the second equation, we first apply + break to our input string. The prefix is + the substring before a line terminator, and the suffix is the + remainder of the string. The suffix will include the line + terminator, if any is present. hunk ./en/ch05-fp.xml 170 - It then uses the pattern matching capabilities of - case to inspect the return value of - break. We know that - break always returns a two-tuple, so each - pattern on the left matches a two-tuple. The first element of - the tuple pattern doesn't inspect its value; it just binds the - variable pre to whatever value is in that - element of the tuple. The second element of the pattern - does inspect its value. The first pattern - matches any string containing at least two characters that - begins with a carriage return, followed by a newline, then binds - suf to the remainder of the string. The - other patterns should be self-explanatory. + The pre : expression tells us + that we should add the pre value to the front + of the list of lines. We then use a &case; expression to + inspect the suffix, so we can decide what to do next. The + result of the &case; expression will be used as the second + argument to the (:) list constructor. hunk ./en/ch05-fp.xml 177 - The right-hand side of each of the first three branches of - the case expression constructs a new list, using - the prefix string pre as the head, and as the - remainder, the list of strings resulting from applying - splitLines to the suffix string - suf without the leading line separator - characters. + The first pattern matches a string that begins + with a carriage return, followed by a newline. The variable + rest is bound to the remainder of the string. + The other patterns are similar, so they ought to be + easy to follow. hunk ./en/ch05-fp.xml 183 - Following a prose description of the behaviour of a Haskell - function isn't easy. We can get a better understanding by - stepping into &ghci;, and watching the function at work in - different circumstances. + A prose description of a Haskell function isn't + necessarily easy to follow. We can gain a better understanding + by stepping into &ghci;, and oberving the behavior of the + function in different circumstances. hunk ./en/ch05-fp.xml 188 - We'll start by looking at a string that doesn't contain any - line separators. + Let's start by partitioning a string that doesn't + contain any line terminators. hunk ./en/ch05-fp.xml 193 - Here, our call to break never finds a - line separator, so we get an empty suffix. + Here, our application of + break never finds a line terminator, so the + suffix it returns is empty. hunk ./en/ch05-fp.xml 199 - The case expression thus hits a match on the - fourth branch, and we're done. What about a slightly more + The &case; expression in + splitLines must thus be matching on the + fourth branch, and we're finished. What about a slightly more hunk ./en/ch05-fp.xml 206 - Our first call to break gives us a - non-empty suffix. + Our first application of + break gives us a non-empty suffix. hunk ./en/ch05-fp.xml 213 - case expression. This gives us + &case; expression. This gives us hunk ./en/ch05-fp.xml 215 - suf bound to "bar". We call - splitLines again, this time on + suf bound to "bar". We apply + splitLines recursively, this time on hunk ./en/ch05-fp.xml 226 + + This sort of experimenting with &ghci; is a helpful way to + understand and debug the behavior of a piece of code. It has + an even more important benefit that is almost accidental in + nature. It can be tricky to test complicated code from + &ghci;, so we will tend to write smaller functions. This can + further help the readability of our code. + + This style of creating and reusing small, powerful pieces of + code is a fundamental part of functional programming. + + + A line ending conversion program + + Let's hook our splitLines function + into the little framework we wrote earlier. Make a copy of + the Interact.hs source file; let's call + the new file FixLines.hs. Add the + splitLines function to the new source + file. Since our function must produce a single + String, we must stitch the list of lines back + together. The Prelude provides an + unlines function that concatenates a list + of strings, adding a newline to the end of each. + + &SplitLines.hs:fixLines; + + If we replace the id function with + fixLines, we can compile an executable + that will convert a text file to our system's native line + ending. + + $ ghc --make FixLines +[1 of 1] Compiling Main ( FixLines.hs, FixLines.o ) +Linking FixLines ... + + If you are on a Windows system, find and download a text + file that was created on a Unix system (for example gpl-3.0.txt). + Open it in the standard Notepad text editor. The lines should + all run together, making the file almost unreadable. Process + the file using the FixLines command you + just created, and open the output file in Notepad. The line + endings should now be fixed up. + + On Unix-like systems, the standard pagers and editors hide + Windows line endings. This makes it more difficult to verify + that FixLines is actually eliminating + them. Here are a few commands that should help. + + $ file gpl-3.0.txt +gpl-3.0.txt: ASCII English text +$ unix2dos gpl-3.0.txt +unix2dos: converting file gpl-3.0.txt to DOS format ... +$ file gpl-3.0.txt +gpl-3.0.txt: ASCII English text, with CRLF line terminators + hunk ./en/ch05-fp.xml 288 - Usually, when we define or call a function in + Usually, when we define or apply a function in hunk ./en/ch05-fp.xml 294 - If a function or constructor takes two arguments, we have - the option of using it in infix form, where - we place it between its first and second arguments. This allows - us to write expressions using functions as if they were infix - operators. + If a function or constructor takes two or more + arguments, we have the option of using it in + infix form, where we place it + between its first and second arguments. + This allows us to use functions as infix operators. hunk ./en/ch05-fp.xml 300 - To define or use a function, constructor, or type + To define or apply a function or value constructor hunk ./en/ch05-fp.xml 307 - As a purely syntactic convenience, infix notation - doesn't change a function's behaviour. + Since infix notation is purely a syntactic + convenience, it does not change a function's behavior. hunk ./en/ch05-fp.xml 312 - Infix notation is useful for more than just our own - functions. For example, Haskell's Prelude defines a function, + Infix notation can often help readability. For + instance, the Prelude defines a function, hunk ./en/ch05-fp.xml 315 - present in a list. If we call elem using - prefix notation, it's not difficult to read. + present in a list. If we use elem using + prefix notation, it is fairly easy to read. hunk ./en/ch05-fp.xml 320 - However, if we switch to infix notation, the code - is easier to understand. It's now clearer that we're + If we switch to infix notation, the code + becomes even easier to understand. It is now clearer that we're hunk ./en/ch05-fp.xml 327 - There's no hard-and-fast rule that dictates when you ought - to use infix versus prefix notation, although prefix notation is - far more common. It's best to choose whichever makes your code - more readable in a specific situation. + We see a more pronounced improvement with some useful functions from the + Data.List module. + The isPrefixOf function tells us if one list + matches the beginning of another. + + &infix.ghci:isPrefixOf; + + The isInfixOf and + isSuffixOf functions match anywhere in a list + and at its end, respectively. + + &infix.ghci:isInfixOf; + + There is no hard-and-fast rule that dictates when + you ought to use infix versus prefix notation, although prefix + notation is far more common. It's best to choose whichever + makes your code more readable in a specific situation. hunk ./en/ch05-fp.xml 357 - type is a function. It might be nice if + value is a function. It might be convenient if hunk ./en/ch05-fp.xml 381 - logical home of all standard list functions. The prelude merely + logical home of all standard list functions. The Prelude merely hunk ./en/ch05-fp.xml 383 - Data.List. Several invaluable functions in + Data.List. Several useful functions in hunk ./en/ch05-fp.xml 386 - the sections that follow, we'll explicitly mention those that - are in Data.List. + the sections that follow, we will explicitly mention those that + are only in Data.List. hunk ./en/ch05-fp.xml 400 - The simplest function on a list is - null, which merely tells us whether or - not the list is empty. - - &ch05.list.ghci:null; - hunk ./en/ch05-fp.xml 405 - To get the first element of a list, we use the + If you need to determine whether a list is empty, use the + null function. + + &ch05.list.ghci:null; + + To access the first element of a list, we use the hunk ./en/ch05-fp.xml 431 - None of the above functions is well-behaved on empty - lists, so be careful if you don't know whether or not a list - is empty. What form does their misbehaviour take? + Several of the functions above behave poorly on + empty lists, so be careful if you don't know whether or not a + list is empty. What form does their misbehavior take? hunk ./en/ch05-fp.xml 437 + Try each of the above functions in &ghci;. Which ones + crash when given an empty list? + hunk ./en/ch05-fp.xml 443 - Safely and sanely working with unsafe functions + Safely and sanely working with crashy functions hunk ./en/ch05-fp.xml 446 - safe, where we know that it might blow up + head, where we know that it might blow up hunk ./en/ch05-fp.xml 449 - call safe. Let's construct a hideously + call head. Let's construct an hunk ./en/ch05-fp.xml 454 - If we're coming from a language like Perl or python, this - might seem like a perfectly natural way to write this test. - Behind the scenes, Python lists are arrays; and Perl arrays - are, well, arrays. So they necessarily know how long they - are, and calling len(foo) or $#foo+1 - is a perfectly natural thing to do. But as with many other - things, it's not a good idea to blindly transplant such an - assumption into Haskell. + If we're coming from a language like Perl or + Python, this might seem like a perfectly natural way to write + this test. Behind the scenes, Python lists are arrays; and + Perl arrays are, well, arrays. So they necessarily know how + long they are, and calling len(foo) or + scalar(@foo) is a perfectly natural thing to do. + But as with many other things, it's not a good idea to blindly + transplant such an assumption into Haskell. hunk ./en/ch05-fp.xml 469 - Therefore, when we only care whether or not a list is - empty, calling length isn't a good - strategy. It can potentially do a lot more work than we want, - if the list we're working is finite. Worse, Haskell lets us - define infinitely long lists, on which an unsuspecting call to - length will never return! + Therefore, when we only care whether or not a + list is empty, calling length isn't a + good strategy. It can potentially do a lot more work than we + want, if the list we're working with is finite. Since Haskell lets + us easily create infinite lists, a careless use of + length may even result in an infinite + loop. hunk ./en/ch05-fp.xml 477 - A more appropriate function to call here instead is - null, which runs in constant time. Better - yet, using null makes our code indicate - what property of the list we really care about. + A more appropriate function to call here instead + is null, which runs in constant time. + Better yet, using null makes our code + indicate what property of the list we really care about. Here + are two improved ways of expressing + myDumbExample. + + &EfficientList.hs:mySmartExample; hunk ./en/ch05-fp.xml 524 - lists, all of the same type, and flattens them + lists, all of the same type, and concatenates them hunk ./en/ch05-fp.xml 529 - It only flattens one level of nesting. + It removes one level of nesting. hunk ./en/ch05-fp.xml 542 - (||). + (||), over lists. hunk ./en/ch05-fp.xml 552 - predicate succeeds on any element of the list. + predicate succeeds on at least one element of the list. hunk ./en/ch05-fp.xml 561 - The take function, which we already - met in , returns a - sublist consisting of the first several elements from a list. - Its converse, drop, drops several - elements from the head of the list. + The take function, which we + already met in , returns a + sublist consisting of the first k + elements from a list. Its converse, drop, + drops k elements from the start of the + list. hunk ./en/ch05-fp.xml 570 - The splitAt function combines the - functions of take and - drop, returning a two-tuple of the input - list, split at the given index. + The splitAt function + combines the functions of take and + drop, returning a pair of the input list, + split at the given index. hunk ./en/ch05-fp.xml 579 - takeWhile constructs a list as long as - the predicate returns True, while - dropWhile drops elements from the list as - long as the predicate returns True. + takeWhile takes elements from the + beginning of a list as long as the predicate returns + True, while dropWhile drops + elements from the list as long as the predicate returns + True. hunk ./en/ch05-fp.xml 625 - is using infix notation, which we introduced in . + is using infix notation. hunk ./en/ch05-fp.xml 663 - write functions that take variable numbers of arguments. So + write functions that take variable numbers of + arguments + Unfortunately, we do not have room to address that + challenge in this book. + . So hunk ./en/ch05-fp.xml 679 - linkend="fp.splitlines"/>. It has a standard counterpart, - unlines, which joins a list of lines - together using newline characters, and adds another newline to - the end of the string. + linkend="fp.splitlines"/>, and its standard counterpart, + unlines. Notice that + unlines always places a newline on the + end of its result. hunk ./en/ch05-fp.xml 722 + + + + Using the command framework from , write a program that prints the + first word of each line of its input. + + + + + + Write a program that transposes the text in a file. + For instance, it should convert + "hello\nworld\n" to + "hw\neo\nlr\nll\nod\n". + + hunk ./en/ch05-fp.xml 749 - several possible answers to this question, so let's build up a - toolbox of answers. + several possible answers to this question. hunk ./en/ch05-fp.xml 771 - The C code computes the result incrementally as it - traverses the string; the Haskell code can do the same. - However, in Haskell, we write the loop as a function, which - we'll call loop just to keep things nice - and explicit. + The C code computes the result incrementally as + it traverses the string; the Haskell code can do the same. + However, in Haskell, we can express the equivalent of a loop as a + function. We'll call ours loop just to + keep things nice and explicit. hunk ./en/ch05-fp.xml 798 - An empty list doesn't just mean the input string is - empty; it's also the case we'll encounter when we - traverse all the way to the end of a non-empty list. So we - don't want to error out if we see an empty - list. Instead, we should do something sensible. Here, the - sensible thing is to return our accumulated value. + An empty list doesn't just mean the input + string is empty; it's also the case we'll encounter + when we traverse all the way to the end of a non-empty list. + So we don't want to error out if we see an + empty list. Instead, we should do something sensible. Here, + the sensible thing is to terminate the loop, and return our + accumulated value. hunk ./en/ch05-fp.xml 832 - Sometimes we'll see this idiom extended, such as - foo''. Since keeping track of the number - of single quotes tacked onto the end of a name rapidly - becomes tedious, use of more than two in a row is thankfully - rare. + Sometimes we'll see this idiom extended, such + as foo''. Since keeping track of the + number of single quotes tacked onto the end of a name + rapidly becomes tedious, use of more than two in a row is + thankfully rare. Indeed, even one single quote can be easy + to miss, which can lead to confusion on the part of readers. + It might be better to think of the use of single quotes as a + coding convention that you should be able to recognize, and + less as one that you should actually follow. hunk ./en/ch05-fp.xml 872 - empty) the base case. We'll see people + empty) the base case (sometimes the + terminating case). We'll see people hunk ./en/ch05-fp.xml 879 - Structural induction isn't confined to lists; we can use - it on other algebraic data types, too. We'll have more to say - about it later. + As a useful technique, structural recursion is + not confined to lists; we can use it on other algebraic data + types, too. We'll have more to say about it later. hunk ./en/ch05-fp.xml 889 - some space each time it calls itself, so it knows where to + some space each time it applies itself, so it knows where to hunk ./en/ch05-fp.xml 892 - Clearly, a tail recursive function would be at a huge + Clearly, a recursive function would be at a huge hunk ./en/ch05-fp.xml 894 - every recursive call: this would require linear space + every recursive application: this would require linear space hunk ./en/ch05-fp.xml 896 - implementations usually detect uses of tail recursion, and + implementations detect uses of tail recursion, and hunk ./en/ch05-fp.xml 899 - optimisation. + optimisation, abbreviated TCO. hunk ./en/ch05-fp.xml 901 - Few imperative language implementations perform tail - call optimisation; this is why writing imperative code in - any kind of ambitiously functional style often leads to + Few imperative language implementations + perform TCO; this is why using any kind of ambitiously + functional style in an imperative language often leads to hunk ./en/ch05-fp.xml 928 - empty list. The second equations ensures that + empty list. The second equation ensures that hunk ./en/ch05-fp.xml 996 - value of another type, b. This - is the only unfamiliar piece of notation in the type; notice - the parentheses that surround the signature of the function - argument so we (and Haskell) won't misread it. + value of another type, b. hunk ./en/ch05-fp.xml 1031 - in the second equation is thus necessarily the empty list - constructor, so there's no point in doing a pattern match to + in the second equation is necessarily the empty list + constructor, so there's no need to perform a match to hunk ./en/ch05-fp.xml 1034 + + As a matter of style, it is fine to use wild cards for + well known simple types like lists and Maybe. + For more complicated or less familiar types, it can be safer + and more readable to name constructors explicitly. hunk ./en/ch05-fp.xml 1047 - This business of seeing that we're repeating an idiom, - then abstracting it so we can reuse (and write less!) code, is - a common aspect of Haskell programming. + This pattern of spotting a repeated idiom, then + abstracting it so we can reuse (and write less!) code, is a + common aspect of Haskell programming. While abstraction isn't + unique to Haskell, higher order functions make it remarkably + easy. hunk ./en/ch05-fp.xml 1057 - Another common operation on a sequence of data is to comb - through it for elements that satisfy some criterion. Here's - an example in C++ of a function that walks a linked list of - numbers and returns those that are odd. - - &filter.cpp:oddList; - - Our Haskell equivalent has a recursive case that's a bit - more complex than our earlier functions: it only puts a number - in the list it returns if the number is odd. Using a guard - expresses this nicely. + Another common operation on a sequence of data + is to comb through it for elements that satisfy some + criterion. Here's a function that walks a list of numbers and + returns those that are odd. Our code has a recursive case + that's a bit more complex than our earlier functions: it only + puts a number in the list it returns if the number is odd. + Using a guard expresses this nicely. hunk ./en/ch05-fp.xml 1071 - Once again, this idiom is so common that Haskell's prelude - defines a function, filter, which removes - the need for boilerplate code to recurse over the list. + Once again, this idiom is so common that the + Prelude defines a function, filter, which + we have already introduced. It removes the need for + boilerplate code to recurse over the list. hunk ./en/ch05-fp.xml 1078 - The filter function takes a predicate - (a function that tests an argument and returns a - Bool) and applies it to every element in its - input list, returning a list of only those for which the - predicate evaluates to True. - - We'll be discussing filter again - soon, in . + The filter function takes a + predicate and applies it to every element in its input list, + returning a list of only those for which the predicate + evaluates to True. We'll revisit + filter again soon, in . hunk ./en/ch05-fp.xml 1089 - Another common thing to do with a loop is to fold - it up. A simple example of this is summing the - values of a list. + Another common thing to do with a collection is + reduce it to a single value. A simple example of this is + summing the values of a list. hunk ./en/ch05-fp.xml 1109 - unoptimised Java implementation. + unoptimised Java implementation. (It's safe to skip it if you + don't read Java.) hunk ./en/ch05-fp.xml 1114 - Although Adler-32 is a simple checksum, this code isn't - particularly easy to read on account of the bit-twiddling - involved. Can we do any better with a Haskell + Although Adler-32 is a simple checksum, this + code isn't particularly easy to read on account of the + bit-twiddling involved. Can we do any better with a Haskell hunk ./en/ch05-fp.xml 1121 - This code isn't exactly easier to follow than the Java code, - but let's look at what's going on. Once again, - helper function is tail recursive. We've - turned the two variables we updated on every loop iteration in - Java into accumulator parameters. When our recursion - terminates on the end of the input list, we compute our - checksum and return it. + This code isn't exactly easier to follow than + the Java code, but let's look at what's going on. First of + all, we've introduced some new functions. The + shiftL function implements a logical + shift left; (.&.) provides bitwise + and; and (.|.) provides + bitwise or. + + Once again, our helper function is + tail recursive. We've turned the two variables we updated on + every loop iteration in Java into accumulator parameters. + When our recursion terminates on the end of the input list, we + compute our checksum and return it. hunk ./en/ch05-fp.xml 1135 - If we take a step back, we can restructure our Haskell - adler32 to more closely resemble our - earlier mySum function. Instead of two - accumulator parameters, we can use a single accumulator that's - a two-tuple. + If we take a step back, we can restructure our + Haskell adler32 to more closely resemble + our earlier mySum function. Instead of + two accumulator parameters, we can use a pair as the + accumulator. hunk ./en/ch05-fp.xml 1143 - Why would we want to make this seemingly meaningless - structural change? Because as we've already seen with - map and filter, we - can extract the common behaviour shared by + Why would we want to make this seemingly + meaningless structural change? Because as we've already seen + with map and filter, + we can extract the common behavior shared by hunk ./en/ch05-fp.xml 1149 - function. We can describe this behaviour as do + function. We can describe this behavior as do hunk ./en/ch05-fp.xml 1168 - stepper function, an initial value for its - accumulator, and a list. The stepper takes an + step function, an initial value for its + accumulator, and a list. The step takes an hunk ./en/ch05-fp.xml 1192 - Notice how much simpler this code is than our ` + Notice how much simpler this code is than our hunk ./en/ch05-fp.xml 1204 - out each step in the evaluation of step function - when we call niceSum [1,2,3]. + out each step in its evaluation when we call niceSum + [1,2,3]. hunk ./en/ch05-fp.xml 1209 - We can rewrite adler32_try2 in a - similar way, using foldl to let us focus - on the details that are important. + We can rewrite adler32_try2 + using foldl to let us focus on the + details that are important. hunk ./en/ch05-fp.xml 1215 - Here, our accumulator is a two-tuple, so the result of + Here, our accumulator is a pair, so the result of hunk ./en/ch05-fp.xml 1229 - predictable behaviour. + predictable behavior. hunk ./en/ch05-fp.xml 1231 - This means that a reader with a little experience will - have an easier time understanding a function that uses a fold - than one that uses explicit recursion. Where a fold isn't - going to produce any surprises, the behaviour of a function + This means that a reader with a little + experience will have an easier time understanding a use of a + fold than code that uses explicit recursion. A fold isn't + going to produce any surprises, but the behavior of a function hunk ./en/ch05-fp.xml 1239 - This line of reasoning applies to other higher-order - library functions, including those we've already seen, - map and filter. - Because they're library functions with well-defined behaviour, - we only need to learn what they do once, and we'll have an - advantage when we need to understand any code that uses - them. - - - - Avoiding multiple traversals of a list - - From looking at adler32_foldl, we - know that we can accumulate more than one value at a time when - we fold over a list. Here's another use for a fold: - optimising code by avoiding multiple traversals of a - list. - - Let's consider the problem of finding the root mean square - of a list of numbers: compute the sum of the squares of every - element in the list, divide by its length, then compute the - square root of that number. In an imperative language like C, - we wouldn't even think twice about writing code like - this. - - &rms.c:rootMeanSquare; - - Clearly, we're looping over the list just once, updating - the accumulator values mean_square and - length as we go. - - Meanwhile, over in functional programming land, the - temptation is strong to turn our verbal description of the - root mean square into code. - - &rms.hs:rootMeanSquare; - - This is a lovely, compact translation of the verbal - description. It even uses our new friend, the - map function, to make the code clearer by - avoiding explicit recursion, but it's not necessarily good - code. The calls to map and - length are each going to traverse the - input list once. - - On a small list, the cost of traversing it twice obviously - won't matter, but on a big list, we're likely to notice. We - can use a fold to avoid this need to traverse the list - twice. - - &rms.hs:rootMeanSquare_foldl; - - Clearly, this code isn't as readable as the earlier - version that used map and - length. Which version should we prefer? - It's often best to start out by writing the most readable - code, since we can make that correct most quickly, and put off - worrying about transforming it into something faster until - much later, when we have profiling data for our program. Only - if those numbers indicate a performance problem should we - worry about stepping back in and transforming our code. We'll - have much more to say about profiling, performance, and - optimisation later, in chapter XXX. + This line of reasoning applies to other + higher-order library functions, including those we've already + seen, map and + filter. Because they're library functions + with well-defined behavior, we only need to learn what they do + once, and we'll have an advantage when we need to understand + any code that uses them. These improvements in readability + also carry over to writing code. Once we start to think with + higher order functions in mind, we'll produce concise code + more quickly. hunk ./en/ch05-fp.xml 1252 - Folding from the right and primitive recursion + Folding from the right hunk ./en/ch05-fp.xml 1267 - The difference between foldl and - foldr should be clear from looking at - where the parentheses and the empty list - elements show up. With foldl, the empty - list element is on the left, and all the parentheses group to - the left. With foldr, the empty list - element is on the right, and the parentheses group to the + The difference between + foldl and foldr + should be clear from looking at where the parentheses and the + empty list elements show up. With + foldl, the empty list element is on the + left, and all the parentheses group to the left. With + foldr, the zero value + is on the right, and the parentheses group to the hunk ./en/ch05-fp.xml 1277 - In fact, there's a lovely intuitive explanation of how + There is a lovely intuitive explanation of how hunk ./en/ch05-fp.xml 1279 - with the empty list element, and every constructor in the list - with an application of the step function. + with the zero value, and every constructor + in the list with an application of the step function. hunk ./en/ch05-fp.xml 1301 - headache, so let's examine it a little depth. Like + headache, so let's examine it in a little depth. Like hunk ./en/ch05-fp.xml 1336 - If you want to understand the definition of - foldl using foldr, - it's best to have the following tools at hand: some headache - pills, a glass of water, &ghci; (so you can find out what - the id function does), and a pencil and + If you want to set yourself a solid challenge, + try to follow the above definition of + foldl using foldr. + Be warned: this is not trivial! You might want to have the + following tools at hand: some headache pills and a glass of + water, &ghci; (so that you can find out what the + id function does), and a pencil and hunk ./en/ch05-fp.xml 1345 - All you'll need to do is follow the same manual - evaluation steps as we did to see what + You will want to follow the same manual + evaluation process as we outlined above to see what hunk ./en/ch05-fp.xml 1348 - were really doing. + were really doing. If you get stuck, you may find the task + easier after reading . hunk ./en/ch05-fp.xml 1352 - While we can write foldl in terms of - foldr, we can't do the converse: - foldr is more basic than - foldl. This should make it clearer why - we call functions written with foldr - primitive recursive. - - (By the way, don't feel like you have to go to special - lengths to remember the term primitive - recursive. It's just useful to remember that you - read about it somewhere, and that it has something to do with - foldr.) - hunk ./en/ch05-fp.xml 1371 - If foldr replaces the end of a list - with some other value, this gives us an easy way to think - about Haskell's list append function, + If foldr replaces the end + of a list with some other value, this gives us another way to + look at Haskell's list append function, hunk ./en/ch05-fp.xml 1388 - Here, we're replacing each list constructor with another - list constructor, but we're replacing the empty list with the + Here, we replace each list constructor with another + list constructor, but we replace the empty list with the hunk ./en/ch05-fp.xml 1392 - As our extended treatment of folds should indicate, the - foldr function is nearly as important a - member of our list-programming toolbox as the more basic list - functions we saw in . + As our extended treatment of folds should + indicate, the foldr function is nearly as + important a member of our list-programming toolbox as the more + basic list functions we saw in . It can consume and produce a list + incrementally, which makes it useful for writing lazy data + processing code. hunk ./en/ch05-fp.xml 1402 - A final note about foldl + Left folds, laziness, and space leaks hunk ./en/ch05-fp.xml 1406 - However, any time you want to fold from the left in practice, - use foldl' from the - Data.List module instead, because it's more - efficient. You should take this on faith for now; we'll - explain why you should avoid plain foldl - in normal use in section XXX. + This is convenient for testing, but we will never use + foldl in practice. + + The reason has to do with Haskell's non-strict evaluation. + If we apply foldl (+) [1,2,3], it evaluates to + the expression (((0 + 1) + 2) + 3). We can see + this occur if we revisit the way in which the function gets + expanded. + + &Fold.hs:foldl.expand.noid; + + The final expression will not be evaluated to + 6 until its value is demanded. Before it is + evaluated, it must be stored as a thunk. Not surprisingly, a + thunk is more expensive to store than a single number, and the + more complex the thunked expression, the more space it needs. + For something cheap like arithmetic, thunking an expresion is + more computationally expensive than evaluating it immediately. + We thus end up paying both in space and in time. + + When &GHC; is evaluating a thunked expression, it uses an + internal stack to do so. Because a thunked expression could + potentially be infinitely large, &GHC; places a fixed limit on + the maximum size of this stack. Thanks to this limit, we can + try a large thunked expression in &ghci; without needing to + worry that it might consume all of memory. + + &thunky.ghci:sum; + + From looking at the expansion above, we can surmise that + this creates a thunk that consists of 1000 integers and 999 + applications of (+). That's a lot of + memory and effort to represent a single number! With a larger + expression, although the size is still modest, the results are + more dramatic. + + &thunky.ghci:overflow; + + On small expressions, foldl will work + correctly but slowly, due to the thunking overhead that it + incurs. We refer to this invisible thunking as a + space leak, because our code is operating + normally, but using far more memory than it should. + + On larger expressions, code with a space leak will simply + fail, as above. A space leak with foldl is a + classic roadblock for new Haskell programmers. Fortunately, + this is easy to avoid. + + The Data.List module defines a function named + foldl' that is similar to + foldl, but does not build up thunks. The + difference in behavior between the two is immediately + obvious. + + &thunky.ghci:strict; + + Due to the thunking behavior of + foldl, it is wise to avoid this function + in real programs: even if it doesn't fail outright, it will be + unnecessarily inefficient. Instead, import + Data.List and use + foldl'. hunk ./en/ch05-fp.xml 1523 - The Prelude function takeWhile - has the following type. - - &ch05.exercises.hs:takeWhile; - - Use &ghci; to figure out what - takeWhile does. Write your own - definitions, first using explicit recursion, then + Write your own definition of the standard + takeWhile function, first using + explicit recursion, then hunk ./en/ch05-fp.xml 1546 - How many of the following standard prelude functions - can you rewrite using list folds? + How many of the following Prelude + functions can you rewrite using list folds? + + + + any + cycle + words + unlines + + + + For those functions where you can use either + foldl' or + foldr, which is more + appropriate in each case? hunk ./en/ch05-fp.xml 1586 - Haskell lets us write completely anonymous functions, which - we can use to avoid the need to give names to our helper - functions. Anonymous functions are often called + Haskell lets us write completely anonymous + functions, which we can use to avoid the need to give names to + our helper functions. Anonymous functions are often called hunk ./en/ch05-fp.xml 1591 - backslash character, \. This is followed by the - function's arguments (which can include patterns), then an arrow + backslash character, \, pronounced lambda + The backslash was chosen for its visual + resemblance to the Greek letter lambda, + λ. Although &GHC; can accept Unicode + input, it correctly treats λ as a + letter, not as a synonym for \. + . This is followed by the function's + arguments (which can include patterns), then an arrow hunk ./en/ch05-fp.xml 1610 - Anonymous functions behave in every respect identically to - functions that have names, but Haskell places a few important on - how we can define them. Most importantly, whereas we can write - a normal function using multiple clauses containing different - patterns and guards, a lambda can only have a single clause in - its definition. + Anonymous functions behave in every respect + identically to functions that have names, but Haskell places a + few important restrictions on how we can define them. Most + importantly, while we can write a normal function using + multiple clauses containing different patterns and guards, a + lambda can only have a single clause in its definition. hunk ./en/ch05-fp.xml 1625 - we have to be sure that any patterns we use will match. + we must be certain that any patterns we use will match. hunk ./en/ch05-fp.xml 1661 - A lambda has two other disadvantages: it is both less - expressive and more brittle than a top-level or local function. - We can use patterns in the parameters of a lambda, but we can't - use guards, and we can't define a lambda using a series of - equations. If any pattern on the left hand side of a lambda - fails to match, we'll get a runtime pattern matching - error. - hunk ./en/ch05-fp.xml 1671 - In Haskell, a single piece of syntax doesn't often get - pressed into use for multiple tasks. So why does the - -> arrow get used for what looks like two - purposes in the type signature of a function? + You may wonder why the -> arrow is + used for what seems to be two purposes in the type signature of + a function. hunk ./en/ch05-fp.xml 1677 - It looks like the -> is separating the - arguments to dropWhile from each other, but - also that it separates the arguments from the return type. But - in fact -> has only one meaning: it + It looks like the -> is separating + the arguments to dropWhile from each other, + but that it also separates the arguments from the return type. + But in fact -> has only one meaning: it hunk ./en/ch05-fp.xml 1684 - The implication here is that in Haskell, all functions take - only one argument. While dropWhile + The implication here is very important: in + Haskell, all functions take only one + argument. While dropWhile hunk ./en/ch05-fp.xml 1688 - arguments, it only takes one. Here's a perfectly valid Haskell - expression. + arguments, it is actually a function of one argument, which + returns a function that takes one argument. Here's a perfectly + valid Haskell expression. hunk ./en/ch05-fp.xml 1694 - What type does it have, and what does it do? + Well, that looks useful. The + value dropWhile isSpace is a function that strips + leading white space from a string. How is this useful? As one + example, we can use it as an argument to a higher order + function. hunk ./en/ch05-fp.xml 1700 - &ch05.list.ghci:dropWhile.isSpace.type; - - Well, that looks useful. The value - dropWhile isSpace is a function that strips leading - white space from a string. + &ch05.list.ghci:dropWhile.isSpace.use; hunk ./en/ch05-fp.xml 1702 - Every time we give an argument to a function, we can + Every time we supply an argument to a function, we can hunk ./en/ch05-fp.xml 1710 - If we call zip3 with just one argument, - we get a function that accepts two arguments, where the first - argument is now fixed. No matter what - second or third arguments we pass, the first argument to this - new function will always be the fixed value we already + If we apply zip3 with just + one argument, we get a function that accepts two arguments. No + matter what arguments we supply to this compound function, its + first argument will always be the fixed value we hunk ./en/ch05-fp.xml 1726 - signatures of the two and their behaviour are identical. + signatures of the two and their behavior are identical. hunk ./en/ch05-fp.xml 1733 - Partial function application lets us avoid writing tiresome - throwaway functions. It's generally a lot better for this + Partial function application lets us avoid writing + tiresome throwaway functions. It's often more useful for this hunk ./en/ch05-fp.xml 1738 - how we'd write it to use a partially applied function instead of - a named helper function or a lambda. + how we'd use a partially applied function instead of a named + helper function or a lambda. hunk ./en/ch05-fp.xml 1748 - that has exactly the same type and behaviour as the helper and + that has exactly the same type and behavior as the helper and hunk ./en/ch05-fp.xml 1751 + Partial function application is named + currying, after the logician Haskell + Curry (for whom the Haskell language is named). + hunk ./en/ch05-fp.xml 1759 - &Sum.hs:niceSum.noid; + &Sum.hs:niceSum.noid; hunk ./en/ch05-fp.xml 1761 - We don't need to fully apply foldl; we - can omit the list xs from both the parameter - list and the parameters to foldl, and we'll - end up with a more compact function that has the same - type. + We don't need to fully apply + foldl; we can omit the list + xs from both the parameter list and the + parameters to foldl, and we'll end up with + a more compact function that has the same type. hunk ./en/ch05-fp.xml 1773 - write partially applied functions using infix operators. If + write a partially applied function in infix style. If hunk ./en/ch05-fp.xml 1797 - Using this as an argument to any, we - get a function that checks an entire string to see if it's all - lowercase. + Using this as an argument to + all, we get a function that checks an + entire string to see if it's all lowercase. hunk ./en/ch05-fp.xml 1802 + + If we use this style, we can further improve the + readability of our earlier isInAny3 + function. + + &Partial.hs:isInAny4; + hunk ./en/ch05-fp.xml 1813 - As-patterns and function composition + As-patterns hunk ./en/ch05-fp.xml 1815 - Haskell's tails function, in the - Data.List module, generalises the - tail function we introduced earlier. It - successively applies tail to its input, - then calls itself on the result, until there's nothing - left. + Haskell's tails function, in + the Data.List module, generalises the + tail function we introduced earlier. + Instead of returning one tail of a list, it + returns all of them. hunk ./en/ch05-fp.xml 1826 - end. In fact, it always produces that extra empty list, even + end. It always produces that extra empty list, even hunk ./en/ch05-fp.xml 1834 - to write our own version by hand. + to write our own version by hand. We'll use a new piece of + notation, the @ symbol. hunk ./en/ch05-fp.xml 1839 - Let's try out that definition. + The pattern xs@(_:xs') is called an + as-pattern, and it means bind the + variable xs to the value that matches the + right side of the @ symbol. + + In our example, if the pattern after the + @ matches, xs will be bound to + the entire list that matched, and xs' to all + but the head of the list (we used the wild card _ + pattern to indicate that we're not interested in the value of + the head of the list). hunk ./en/ch05-fp.xml 1853 - - Where did that at-sign come from? + The as-pattern makes our code more readable. To + see how it helps, let us compare a definition that lacks an + as-pattern. hunk ./en/ch05-fp.xml 1857 - You may have noticed the funny-looking pattern - xs@(_:xs') in our definition of - suffixes. This is called an - as-pattern, and it means bind the - variable xs to the expression in the - matched pattern (_:xs'). + &SuffixTree.hs:noAsPattern; hunk ./en/ch05-fp.xml 1859 - In this case, if the pattern after the @ - matches, xs will be bound to the entire - list that matched, and xs' to the rest of - the list (we used the wild card _ pattern to - indicate that we're not interested in the value of the head of - the list). + Here, the list that we've deconstructed in the pattern + match just gets put right back together in the body of the + function. hunk ./en/ch05-fp.xml 1863 - Let's look at a second definition of the - suffixes function, only this time without - using an as-pattern. + As-patterns have a more practical use than simple + readability: they can help us to share data instead of copying + it. In our definition of noAsPattern, when + we match (x:xs), we construct a new copy of it in + the body of our function. This causes us to allocate a new + list node at run time. That may be cheap, but it isn't free. + In contrast, when we defined suffixes, we + reused the value xs that we matched with our + as-pattern. Since we reuse an existing value, we avoid a little + allocation. + hunk ./en/ch05-fp.xml 1875 - &SuffixTree.hs:noisier; + + Code reuse through composition hunk ./en/ch05-fp.xml 1878 - Here, the list that we've deconstructed in the pattern - match just gets put right back together in the body of the - function. The as-pattern makes the code more readable by - letting us avoid the need to repeat ourselves. + It seems a shame to introduce a new function, + suffixes, that does almost the same thing + as the existing tails function. Surely we + can do better? hunk ./en/ch05-fp.xml 1883 - - As-patterns are not just for readability + Recall the init function we + introduced in : it returns all but + the last element of a list. hunk ./en/ch05-fp.xml 1887 - Not only can as-patterns reduce the clutter in our code; - they can help it to execute more efficiently, too. Since we - don't have much of a mental model for thinking about - evaluation and efficiency yet, we'll return to this topic - later, in XXX. - - + &SuffixTree.hs:suffixes2; + + This suffixes2 function + behaves identically to suffixes, but it's a + single line of code. hunk ./en/ch05-fp.xml 1893 - - Code reuse + &suffix.ghci:suffixes2; hunk ./en/ch05-fp.xml 1895 - It seems a shame to introduce a new function, - suffixes, that does almost the same thing - as the existing tails function. Surely - we can do better? + If we take a step back, we see the glimmer of a + pattern here: we're applying a function, then applying another + function to its result. Let's turn that pattern into a function + definition. hunk ./en/ch05-fp.xml 1900 - Remember the init function we - introduced in ? + &SuffixTree.hs:compose; hunk ./en/ch05-fp.xml 1902 - &SuffixTree.hs:suffixes2; + We now have a function, + compose, that we can use to + glue two other functions together. hunk ./en/ch05-fp.xml 1906 - The suffixes2 function behaves - identically to suffixes, but it's a - single line of code. + &SuffixTree.hs:suffixes3; hunk ./en/ch05-fp.xml 1908 - &suffix.ghci:suffixes2; + Haskell's automatic currying lets us drop the + xs variable, so we can make our definition + even shorter. hunk ./en/ch05-fp.xml 1912 - If we take a step back, we see the glimmer of a pattern - here: we're calling a function, then applying another function - to its result. Let's turn that pattern into a function - definition. + &SuffixTree.hs:suffixes4; hunk ./en/ch05-fp.xml 1914 - &SuffixTree.hs:compose; + Fortunately, we don't need to write our own + compose function. Plugging functions into + each other like this is so common that the Prelude provides + function composition via the (.) + operator. hunk ./en/ch05-fp.xml 1920 - We now have a function, compose, that - we can use to glue two other functions - together. + &SuffixTree.hs:suffixes5; hunk ./en/ch05-fp.xml 1922 - &SuffixTree.hs:suffixes3; + The (.) operator isn't a + special piece of language syntax; it's just a normal + operator. hunk ./en/ch05-fp.xml 1926 - As Haskell's automatic currying lets us drop the - xs variable, we can make our definition - even shorter. + &suffix.ghci:types; hunk ./en/ch05-fp.xml 1928 - &SuffixTree.hs:suffixes4; + We can create new functions at any time by writing + chains of composed functions, stitched together with + (.), so long (of course) as the result type + of the function on the right of each (.) + matches the type of parameter that the function on the left can + accept. hunk ./en/ch05-fp.xml 1935 - Fortunately, we don't need to write our own - compose function. Plugging functions - into each other like this is so common that Haskell's prelude - provides the (.) operator to denote - function composition. + As an example, let's solve a simple puzzle: counting the + number of words in a string that begin with a capital + letter. hunk ./en/ch05-fp.xml 1939 - &SuffixTree.hs:suffixes5; + &suffix.ghci:dotty; hunk ./en/ch05-fp.xml 1941 - The (.) operator isn't a special - piece of language syntax; it's just a normal operator. + We can understand what this composed function does by + examining its pieces. The (.) function is + right associative, so we will proceed from right to left. hunk ./en/ch05-fp.xml 1945 - &suffix.ghci:types; + &suffix.ghci:words.type; hunk ./en/ch05-fp.xml 1947 - We can create new functions at any time by writing chains - of composed functions, stitched together with - (.), so long (of course) as the result - type of the function on the right of each - (.) matches the type of parameter that - the function on the left can accept. + The words function has a result type of + [String], so whatever is on the left side of + (.) must accept a compatible + argument. hunk ./en/ch05-fp.xml 1952 - &suffix.ghci:dotty; + &suffix.ghci:capped.type; hunk ./en/ch05-fp.xml 1954 - Here's an example drawn from a piece of code I wrote the - day before I started on this section. I wanted to get a list - of C preprocessor definitions from a header file shipped with - libpcap, a popular network packet filtering - library. + This function returns True if a word begins + with a capital letter (try it in &ghci;), so filter + (isUpper . head) returns a list of Strings + containing only words that begin with capital letters. + + &suffix.ghci:capfilt.type; + + Since this expression returns a list, all that remains is + calculate the length of the list, which we do with another + composition. + + Here's another example, drawn from a real + application. We want to extract a list of macro names from a C + header file shipped with libpcap, a popular network + packet filtering library. The header file contains a large + number definitions of the following form. + + +#define DLT_EN10MB 1 /* Ethernet (10Mb) */ +#define DLT_EN3MB 2 /* Experimental Ethernet (3Mb) */ +#define DLT_AX25 3 /* Amateur Radio AX.25 */ + + Our goal is to extract names such as DLT_EN10MB + and DLT_AX25. hunk ./en/ch05-fp.xml 1981 - We take an entire file, split it up with - lines, then call foldr step - [] on the result. Since I know, based on the type of - lines, that I'm folding over a list of - strings, the step helper function must - thus operate on individual lines. + We treat an entire file as a string, split it up + with lines, then apply foldr step + [] to the resulting list of lines. The + step helper function operates on a single + line. hunk ./en/ch05-fp.xml 1989 - If we match a macro definition, we cons the name of the - macro onto the head of the list we're returning; otherwise, we - do nothing with the list on this invocation. + If we match a macro definition with our guard + expression, we cons the name of the macro onto the head of the + list we're returning; otherwise, we leave the list + untouched. hunk ./en/ch05-fp.xml 1994 - We can see heavy use of function composition in the body - of step. While all of these functions - are by now familiar to us, it can take a little practice to - glue together the sequence of types in a chain of compositions - like this. Let's walk through the procedure by hand. + While the individual functions in the body of + secondWord are by now familiar to us, it + can take a little practice to piece together a chain of + compositions like this. Let's walk through the + procedure. hunk ./en/ch05-fp.xml 2000 - The first call is to words. + Once again, we proceed from right to left. The + first function is words. hunk ./en/ch05-fp.xml 2005 - We then call the partially applied function drop - 1 on the result of words. + We then apply tail to the result of + words. hunk ./en/ch05-fp.xml 2010 - See how naturally partial application fits in here? It's - given us a function that turns a list into another list. - Composing these, we match up the result of - words with the parameter of drop - 1. - - &dlts.ghci:drop1.words; - - Finally, calling head on the result - of drop 1 . words will give us what we want: the - name of the macro we're defining. + Finally, applying head to the + result of drop 1 . words will give us the name of + our macro. hunk ./en/ch05-fp.xml 2016 - - Use your head wisely + + Use your head wisely hunk ./en/ch05-fp.xml 2019 - After warning against unsafe list functions in , here we are calling - head, one of those unsafe list - functions. What gives? + After warning against unsafe list functions in + , here we are calling both + head and tail, two + of those unsafe list functions. What gives? hunk ./en/ch05-fp.xml 2025 - In this case, we can reassure ourselves that we're safe - from a runtime failure in the call to - head. The pattern guard in the - definition of step ensures that after - we call words on any string that makes - it past the guard, we'll have a list of at least two - elements, "#define" and some macro beginning - with "DLT_". You can see this in some of the - &ghci; code snippets above. + In this case, we can assure ourselves by + inspection that we're safe from a runtime failure. The + pattern guard in the definition of step + contains two words, so when we apply + words to any string that makes it past + the guard, we'll have a list of at least two elements, + "#define" and some macro beginning with + "DLT_". hunk ./en/ch05-fp.xml 2034 - This is an example of the kind of reasoning we ought to - do to convince ourselves that our code won't explode when we - call partial functions. Don't forget our earlier - admonition: calling unsafe functions like this requires - care, and can often make our code more fragile in subtle - ways. - + This the kind of reasoning we ought to do to + convince ourselves that our code won't explode when we call + partial functions. Don't forget our earlier admonition: + calling unsafe functions like this requires care, and can + often make our code more fragile in subtle ways. If we for + some reason modified the pattern guard to only contain one + word, we could expose ourselves to the possibility of a crash, + as the body of the function assumes that it will receive two + words. hunk ./en/ch05-fp.xml 2054 - Many tail recursive functions are better expressed using - list manipulation functions like map, - take, and filter. - Without a doubt, it takes some practice to get used to using - these. What we get in return for our initial investment in - learning to use these functions is the ability to skim more - easily over code that uses them. + Many list manipulation operations can be most + easily expressed using combinations of library functions such as + map, take, and + filter. Without a doubt, it takes some + practice to get used to using these. In return for our initial + investment, we can write and read code more quickly, and with + fewer bugs. hunk ./en/ch05-fp.xml 2062 - The reason for this is simple. A tail recursive function - definition has the same problem as a loop in an imperative - language: it's completely general, so we have to look at the - exact details of every loop, and every tail recursive function, - to see what it's really doing. In contrast, + The reason for this is simple. A tail recursive + function definition has the same problem as a loop in an + imperative language: it's completely general. It might perform + some filtering, some mapping, or who knows what else. We are + forced to look in detail at the entire definition of the + function to see what it's really doing. In contrast, hunk ./en/ch05-fp.xml 2069 - functions do only one thing; we can take + functions do only one thing. We can take hunk ./en/ch05-fp.xml 2074 - In the middle ground between tail recursive functions (with - complete generality) and our toolbox of list manipulation - functions (each of which does one thing) lie the folds. A fold - takes more effort to understand than, say, a composition of - map and filter that - does the same thing, but at the same time it behaves more - regularly and predictably than a tail recursive function. As a - general rule, don't use a fold if you don't need one, but think - about using one instead of a tail recursive loop if you - can. + In the middle ground between tail recursive + functions (with complete generality) and our toolbox of list + manipulation functions (each of which does one thing) lie the + folds. A fold takes more effort to understand than, say, a + composition of map and + filter that does the same thing, but it + behaves more regularly and predictably than a tail recursive + function. As a general rule, don't use a fold if you can + compose some library functions, but otherwise try to use a fold + in preference to a hand-rolled a tail recursive loop. hunk ./en/ch05-fp.xml 2085 - As for anonymous functions, they tend to interrupt the - flow of reading a piece of code. It is very + As for anonymous functions, they tend to interrupt + the flow of reading a piece of code. It is very hunk ./en/ch05-fp.xml 2090 - advantages of a named function are twofold: we're not confronted - with the need to understand the function's definition when we're - reading the code that uses it; and a well chosen function name - acts as a tiny piece of local documentation. + advantages of a named function are twofold: we don't need to + understand the function's definition when we're reading the code + that uses it; and a well chosen function name acts as a tiny + piece of local documentation. + + + + Space leaks and strict evaluation + + The foldl function that we discussed + earlier is not the only place where space leaks can arise in + Haskell code. We will use it to illustrate how non-strict + evaluation can sometimes be problematic, and how to solve the + difficulties that can arise. + + + It is perfectly reasonable to skip this section until you + encounter a space leak in the wild. Provided + you use foldr if you are generating a + list, and foldl' instead of + foldl otherwise, space leaks are unlikely + to bother you in practice for a while. + + + + Avoiding space leaks with seq + + We refer to an expression that is not evaluated lazily as + strict, so foldl' is + a strict left fold. It bypasses Haskell's usual non-strict + evaluation through the use of a special function named + seq. + + &Fold.hs:strict; + + This seq function has a peculiar + type, hinting that it is not playing by the usual + rules. + + &thunky.ghci:seq; + + It operates as follows: when a seq + expression is evaluated, it forces its first argument to be + evaluated, then returns its second argument. It doesn't + actually do anything with the first argument: + seq exists solely as a way to force that + value to be evaluated. Let's walk through a brief application + to see what happens. + + &Fold.hs:step1; + + This expands as follows. + + &Fold.hs:step2; + + The use of seq forcibly evaluates + new to 3, and returns its + second argument. + + &Fold.hs:step3; + + We end up with the following result. + + &Fold.hs:step4; + + Thanks to seq, there are no thunks in + sight. + + + + Learning to use seq + + Without some direction, there is an element of mystery + to using seq effectively. + Here are some useful rules for using it well. + + To have any effect, a seq + expression must be the first thing evaluated in an + expression. + + &Fold.hs:hiddenInside; + + To strictly evaluate several values, chain + applications of seq together. + + &Fold.hs:chained; + + A common mistake is to try to use seq + with two unrelated expressions. + + &Fold.hs:badExpression; + + Here, the apparent intention is to evaluate step + zero x strictly. Since the expression is duplicated + in the body of the function, strictly evaluating the first + instance of it will have no effect on the second. The use of + &let; from the definition of foldl' above + shows how to achieve this effect correctly. + + When evaluating an expression, seq + stops as soon as it reaches a constructor. For simple types + like numbers, this means that it will evaluate them + completely. Algebraic data types are a different story. + Consider the value (1+2):(3+4):[]. If we apply + seq to this, it will evaluate the + (1+2) thunk. Since it will stop when it reaches + the first (:) constructor, it will have no effect + on the second thunk. The same is true for tuples: seq + ((1+2),(3+4)) True will do nothing to the thunks + inside the pair, since it immediately hits the pair's + constructor. + + If necessary, we can use normal functional programming + techniques to work around these limitations. + + &Fold.hs:strictPair; + + It is important to understand that + seq isn't free: it has to perform a check + at runtime to see if an expression has been evaluated. Use it + sparingly. For instance, while our + strictPair function evaluates the + contents of a pair up to the first constructor, it adds the + overheads of pattern matching, two applications of + seq, and the construction of a new tuple. + If we were to measure its performance in the inner loop of a + benchmark, we might find it to slow the program down. + + Aside from its performance cost if overused, + seq is not a miracle cure-all for memory + consumption problems. Just because you + can evaluate something strictly doesn't + mean you should. Careless use of + seq may do nothing at all; move existing + space leaks around; or introduce new leaks. + + The best guides to whether seq is + necessary, and how well it is working, are performance + measurement and profiling, which we will cover in . From a base of empirical + measurement, you will develop a reliable sense of when + seq is most useful. + hunk ./en/ch08-io.xml 1073 - linkend="fp.aspattern.codereuse"/>. + linkend="fp.compose"/>. hunk ./en/ch26-profiling.xml 3 - + hunk ./examples/ch05/Adler32.hs 18 - where helper (a,b) (x:xs) = let a' = (a + (ord x .&. 0xff)) `mod` base - in helper (a', (a' + b) `mod` base) xs + where helper (a,b) (x:xs) = + let a' = (a + (ord x .&. 0xff)) `mod` base + b' = (a' + b) `mod` base + in helper (a',b') xs hunk ./examples/ch05/EfficientList.hs 7 +{-- snippet mySmartExample --} +mySmartExample xs = if not (null xs) + then head xs + else 'Z' + +myOtherExample (x:_) = x +myOtherExample [] = 'Z' +{-- /snippet mySmartExample --} + hunk ./examples/ch05/Fold.hs 6 -foldl f z xs = step z xs - where step z [] = z - step z (x:xs) = step (f z x) xs +foldl step zero (x:xs) = foldl step (step zero x) xs +foldl _ zero [] = zero hunk ./examples/ch05/Fold.hs 13 -foldr f z xs = step xs - where step [] = z - step (y:ys) = f y (step ys) +foldr step zero (x:xs) = step x (foldr step zero xs) +foldr _ zero [] = zero hunk ./examples/ch05/Fold.hs 21 - where step x [] = [f x] - step x ys = f x : ys + where step x ys = f x : ys hunk ./examples/ch05/Fold.hs 55 +{- hunk ./examples/ch05/Fold.hs 57 --- step 0 (1:2:3:[]) == step (0 + 1) (2:3:[]) --- step (0 + 1) (2:3:[]) == step ((0 + 1) + 2) (3:[]) --- step ((0 + 1) + 2) [3] == step (((0 + 1) + 2) + 3) [] --- step (((0 + 1) + 2) + 3) [] == (((0 + 1) + 2) + 3) +foldl (+) 0 (1:2:3:[]) + == foldl (+) (0 + 1) (2:3:[]) + == foldl (+) ((0 + 1) + 2) (3:[]) + == foldl (+) (((0 + 1) + 2) + 3) [] + == (((0 + 1) + 2) + 3) hunk ./examples/ch05/Fold.hs 63 +-} hunk ./examples/ch05/Fold.hs 65 +{- hunk ./examples/ch05/Fold.hs 67 --- step (1:2:3:[]) == 1 + step (2:3:[]) --- 1 + step (2:3:[]) == 1 + (2 + step (3:[]) --- 1 + (2 + step [3]) == 1 + (2 + (3 + step [])) --- 1 + (2 + (3 + step [])) == 1 + (2 + (3 + 0)) +foldr (+) 0 (1:2:3:[]) + == 1 + foldr (+) 0 (2:3:[]) + == 1 + (2 + foldr (+) 0 (3:[]) + == 1 + (2 + (3 + foldr (+) 0 [])) + == 1 + (2 + (3 + 0)) hunk ./examples/ch05/Fold.hs 73 +-} hunk ./examples/ch05/Fold.hs 75 +{- hunk ./examples/ch05/Fold.hs 77 --- 1 : 2 : 3 : [] --- 1 + 2 + 3 + 0 +1 : (2 : (3 : [])) +1 + (2 + (3 + 0 )) hunk ./examples/ch05/Fold.hs 80 +-} + +{-- snippet strict --} +foldl' _ zero [] = zero +foldl' step zero (x:xs) = + let new = step zero x + in new `seq` foldl' step new xs +{-- /snippet strict --} + +{- +{-- snippet step1 --} +foldl' (+) 1 (2:[]) +{-- /snippet step1 --} + +{-- snippet step2 --} +let new = 1 + 2 +in new `seq` foldl' (+) new [] +{-- /snippet step2 --} + +{-- snippet step3 --} +foldl' (+) 3 [] +{-- /snippet step3 --} + +{-- snippet step4 --} +3 +{-- /snippet step4 --} +-} + +{-- snippet meaningless --} +tryToForce x = x `seq` x +{-- /snippet meaningless --} + +{-- snippet hiddenInside --} +-- incorrect: seq is hidden by the application of someFunc +-- since someFunc will be evaluated first, seq may occur too late +hiddenInside x y = someFunc (x `seq` y) + +-- incorrect: a variation of the above mistake +hiddenByLet x y z = let a = x `seq` someFunc y + in anotherFunc a z + +-- correct: seq will be evaluated first, forcing evaluation of x +onTheOutside x y = x `seq` someFunc y +{-- /snippet hiddenInside --} + +{-- snippet chained --} +chained x y z = x `seq` y `seq` someFunc z +{-- /snippet chained --} + +{-- snippet badExpression --} +badExpression step zero (x:xs) = + seq (step zero x) + (badExpression step (step zero x) xs) +{-- /snippet badExpression --} + +someFunc = error "" +anotherFunc = error "" + +{-- snippet strictPair --} +strictPair (a,b) = a `seq` b `seq` (a,b) + +strictList (x:xs) = x `seq` x : strictList xs +strictList [] = [] +{-- /snippet strictPair --} hunk ./examples/ch05/IntParse.hs 4 -import Data.Char (ord) +import Data.Char (digitToInt) -- we'll need ord shortly hunk ./examples/ch05/IntParse.hs 20 -loop acc (x:xs) = let acc' = acc * 10 + ord x - ord '0' +loop acc (x:xs) = let acc' = acc * 10 + digitToInt x hunk ./examples/ch05/InteractWith.hs 1 +{-- snippet main --} +-- Save this in a source file, e.g. Interact.hs + +import System.Environment (getArgs) + +interactWith function inputFile outputFile = do + input <- readFile inputFile + writeFile outputFile (function input) + +main = mainWith myFunction + where mainWith function = do + args <- getArgs + case args of + [input,output] -> interactWith function input output + _ -> putStrLn "error: exactly two arguments needed" + + -- replace "id" with the name of our function below + myFunction = id +{-- /snippet main --} hunk ./examples/ch05/Map.hs 13 -square (x:xs) = x**2 : square xs +square (x:xs) = x*x : square xs hunk ./examples/ch05/Map.hs 19 - where squareOne x = x ** 2 + where squareOne x = x * x hunk ./examples/ch05/Map.hs 28 -myMap _ _ = [] +myMap _ _ = [] hunk ./examples/ch05/Partial.hs 16 +{-- snippet isInAny4 --} +isInAny4 needle haystack = any (needle `isInfixOf`) haystack +{-- /snippet isInAny4 --} + hunk ./examples/ch05/Plus.hs 5 + deriving (Show) + +-- we can use the constructor either prefix or infix +foo = Pair 1 2 +bar = True `Pair` "quux" hunk ./examples/ch05/SplitLines.hs 1 -{-- snippet splitLines --} +{-- snippet splitLines.type --} hunk ./examples/ch05/SplitLines.hs 3 +{-- /snippet splitLines.type --} hunk ./examples/ch05/SplitLines.hs 5 +{-- snippet splitLines --} hunk ./examples/ch05/SplitLines.hs 7 -splitLines cs = case break isLineSeparator cs of - (pre, '\r':'\n':suf) -> pre : splitLines suf - (pre, '\r':suf) -> pre : splitLines suf - (pre, '\n':suf) -> pre : splitLines suf - (pre, "") -> [pre] +splitLines cs = + let (pre, suf) = break isLineTerminator cs + in pre : case suf of + ('\r':'\n':rest) -> splitLines rest + ('\r':rest) -> splitLines rest + ('\n':rest) -> splitLines rest + _ -> [] hunk ./examples/ch05/SplitLines.hs 15 -isLineSeparator :: Char -> Bool -isLineSeparator c = c `elem` "\r\n" +isLineTerminator c = c == '\r' || c == '\n' hunk ./examples/ch05/SplitLines.hs 18 +{-- snippet fixLines --} +fixLines :: String -> String +fixLines input = unlines (splitLines input) +{-- /snippet fixLines --} + hunk ./examples/ch05/SuffixTree.hs 44 -{-- snippet noisier --} -noisier :: [a] -> [[a]] -noisier (x:xs) = (x:xs) : noisier xs -noisier _ = [] -{-- /snippet noisier --} +{-- snippet noAsPattern --} +noAsPattern :: [a] -> [[a]] +noAsPattern (x:xs) = (x:xs) : noAsPattern xs +noAsPattern _ = [] +{-- /snippet noAsPattern --} hunk ./examples/ch05/Sum.hs 13 +niceSum :: [Integer] -> Integer hunk ./examples/ch05/Sum.hs 18 +nicerSum :: [Integer] -> Integer hunk ./examples/ch05/ch05.exercises.ghci 6 +asInt_fold "1798" hunk ./examples/ch05/ch05.exercises.hs 1 -import Data.Char (ord) +import Data.Char (ord, digitToInt) hunk ./examples/ch05/ch05.exercises.hs 28 -asInt_fold ('-':xs) = negate (asInt_fold' xs) -asInt_fold xs = asInt_fold' xs +asInt_fold ('-':xs) = negate (asInt_fold xs) +asInt_fold xs = foldl (\a b -> (a*10) + (digitToInt b)) 0 xs hunk ./examples/ch05/ch05.list.ghci 12 +length "strings are lists, too" hunk ./examples/ch05/ch05.list.ghci 97 -takeWhile odd [1,3,5,6,8] +takeWhile odd [1,3,5,6,8,9,11] hunk ./examples/ch05/ch05.list.ghci 99 -dropWhile even [2,4,6,7,9] +dropWhile even [2,4,6,7,9,10,12] hunk ./examples/ch05/ch05.list.ghci 103 -:type break -break even [1,3,5,6,8] hunk ./examples/ch05/ch05.list.ghci 104 -span even [2,4,6,7,9] +span even [2,4,6,7,9,10,11] +:type break +break even [1,3,5,6,8,9,10] hunk ./examples/ch05/ch05.list.ghci 143 -zip [12,72,93] "foo" +zip [12,72,93] "zippity" hunk ./examples/ch05/ch05.list.ghci 157 -words "the quick brown\n\n\nfox" +words "the \r quick \t brown\n\n\nfox" hunk ./examples/ch05/ch05.list.ghci 167 -dropWhile isSpace +:type dropWhile isSpace hunk ./examples/ch05/ch05.list.ghci 169 ---# dropWhile.isSpace.type +--# dropWhile.isSpace.use hunk ./examples/ch05/ch05.list.ghci 171 -:type dropWhile isSpace -dropWhile isSpace " \n\nfoo" +map (dropWhile isSpace) [" a","f"," e"] hunk ./examples/ch05/ch05.list.ghci 181 -let zip3foo xs = zip3 "foo" xs +let zip3foo = zip3 "foo" hunk ./examples/ch05/dlts.ghci 6 -:type drop 1 - ---# drop1.words -:type drop 1 . words -(drop 1 . words) "#define DLT_CHAOS 5" +:type tail +tail ["#define","DLT_CHAOS","5"] +:type tail . words +(tail . words) "#define DLT_CHAOS 5" hunk ./examples/ch05/dlts.ghci 12 -:type head . drop 1 . words -(head . drop 1 . words) "#define DLT_CHAOS 5" +:type head . tail . words +(head . tail . words) "#define DLT_CHAOS 5" hunk ./examples/ch05/dlts.hs 10 - | "#define DLT_" `isPrefixOf` l = (head . drop 1 . words) l : ds - | otherwise = ds + | "#define DLT_" `isPrefixOf` l = secondWord l : ds + | otherwise = ds + secondWord = head . tail . words hunk ./examples/ch05/dlts.hs 15 - hunk ./examples/ch05/infix.ghci 18 +--# isPrefixOf +:module +Data.List +"foo" `isPrefixOf` "foobar" + +--# isInfixOf +"needle" `isInfixOf` "haystack full of needle thingies" +"end" `isSuffixOf` "the end" + hunk ./examples/ch05/intparse.c 2 +#include hunk ./examples/ch05/intparse.c 7 - int acc; + int acc; /* accumulate the partial result */ hunk ./examples/ch05/intparse.c 9 - for (acc = 0; *str != '\0'; str++) { - acc = acc * 10 + *str - '0'; + for (acc = 0; isdigit(*str); str++) { + acc = acc * 10 + (*str - '0'); hunk ./examples/ch05/map.c 4 -char *uppercase(char *out, const char *in) +char *uppercase(const char *in) hunk ./examples/ch05/rms.hs 5 - where square x = x ** 2 + where square x = x * x hunk ./examples/ch05/rms.hs 13 -rootMeanSquare_foldl xs = let (length, meanSquare) = foldl step (0,0) xs - in sqrt (meanSquare / fromIntegral length) - where step (length, meanSquare) x = (length + 1, meanSquare + x**2) +rootMeanSquare_foldl xs = + let (count, sumOfSquares) = foldl step (0,0) xs + in sqrt (sumOfSquares / fromIntegral count) + where step (cnt,sumSq) x = (cnt + 1, sumSq + x*x) hunk ./examples/ch05/splitlines.ghci 28 -break isLineSeparator "foo" +break isLineTerminator "foo" hunk ./examples/ch05/splitlines.ghci 36 -break isLineSeparator "foo\r\nbar" +break isLineTerminator "foo\r\nbar" hunk ./examples/ch05/suffix.ghci 30 +:module +Data.Char +let capCount = length . filter (isUpper . head) . words +capCount "Hello there, Mom!" hunk ./examples/ch05/suffix.ghci 34 -(sum . map sum . filter (any odd)) [[1,4,9],[2,4,6]] +--# words.type +:type words + +--# capped.type +:type isUpper . head + +--# capfilt.type +:type filter (isUpper . head) + +--# compose1.type +:type length . filter (isUpper . head) + +--# foo +let foo = sum . map sum . filter (any odd) +:type foo }