[More progress on chapter 4: list functions. Bryan O'Sullivan **20070814235751] { hunk ./en/Makefile 19 - $(wildcard ../examples/ch04/*.java) + $(wildcard ../examples/ch04/*.java) \ + $(wildcard ../examples/ch04/*.py) hunk ./en/Makefile 175 +vpath %.py $(addprefix ../examples/,$(dir $(src-examples))) hunk ./en/Makefile 193 + +x/.stamp-%.py: %.py ../tools/bin/snippets x + ../tools/bin/snippets $(CURDIR)/x $< > $@ hunk ./en/ch03-funcs-types.xml 203 - + hunk ./en/ch03-funcs-types.xml 982 - + hunk ./en/ch04-fp.xml 6 + + Thinking in Haskell + + There are two aspects to the early stages of programming + effectively in Haskell. The first is coming to terms with the + shift in mindset from imperative programming to functional: we + have to step away from an entire toolbox of programming habits + from other languages. We do this not because these techniques + are bad, but because in a functional language they're either not + applicable or they turn out downright ungainly. At the same + time, we have to come up to speed with a replacement vocabulary, + so that we can hack productively. + + Our second challenge is learning our way around the standard + Haskell libraries. As in any language, the libraries act as a + lever, enabling us to multiply our problem solving power. + However, Haskell libraries tend to be organised around a higher + level of abstraction than those in many other languages. We'll + probably have to put more effort into learning them, but in + exchange they offer a tantalisingly greater magnification of our + efforts. + + In this chapter, we'll introduce a number of common + functional programming techniques. We'll draw upon examples + from imperative languages to highlight the shift in thinking + that we'll need to make. As we do so, we'll walk through some + of the fundamentals of Haskell's standard libraries. + + + + Warming up: portably splitting lines of text + + Haskell provides a built-in function, + lines, that lets us split a text string on + line boundaries. It returns a list of strings with line + termination characters omitted. + + &splitlines.ghci:lines; + + While lines looks useful, it's not + portable: it doesn't deal well with Windows line ending + conventions. + + &splitlines.ghci:lines.cr; + + It only splits on newline characters, leaving carriage + returns dangling at the ends of lines. Ugh. + + Years of comfortable hacking with Python's universal + newline support, which transparently handles Unix and + Windows line ending conventions for us without us needing to + worry about them. Although Python conveniently provides a + splitlines string method, let's reimplement + it as a Python function, just to see what the code might look + like. + + &splitlines.py:splitlines; + + The key to understanding this code lies with Python's + partition string method: it searches for a + substring. If it finds it, it returns a triple of the substring + before the match, the match, and the substring after the match. + Otherwise, it returns a triple of the entire string and two + empty strings. + + Because we're still early in our career of reading Haskell + code, we'll discuss our Haskell implementation in quite some + detail. This is almost the last instance where we'll walk + through basic Haskell code at this level. + + Haskell provides a similar function named + break that we can use to the same end. + Unlike Python's partition, it operates over + lists of any type, instead of strings. It does this by taking a + function as its first parameter: this function takes an element + of the list, and returns a Bool to indicate whether + to break the list there or not. What break + returns is a two-tuple, of the sublist before the predicate + returns True (the prefix), and the rest of the list + (the suffix). + + &splitlines.ghci:break; + + This makes break both more and less + powerful than partition: we can use it on + any type of list, not just a string; but it can only examine a + single item at a time, not a sublist. + + For our purposes, examining one character at a time will work + perfectly well, since we only need to match a single carriage + return or newline. + + &SplitLines.hs:splitLines; + + Our function first calls break on its + input string, to break it into the substring before a line + terminator, and the rest of the string (which will include the + line terminator, if any). + + It then uses the pattern matching capabilities of + case to inspect the return value of + break. We know that + break always returns a two-tuple, so each + pattern on the left matches a two-tuple. The first element of + the tuple pattern doesn't inspect its value; it just binds the + variable pre to whatever value is in that + element of the tuple. The second element of the pattern + does inspect its value. The first pattern + matches any string containing at least two characters that + begins with a carriage return, followed by a newline, then binds + suf to the remainder of the string. The + other patterns should be self-explanatory. + + The right-hand side of each of the first three branches of + the case expression constructs a new list, using + the prefix string pre as the head, and as the + remainder, the list of strings resulting from applying + splitLines to the suffix string + suf without the leading line separator + characters. + + Following a prose description of the behaviour of a Haskell + function isn't easy. We can get a better understanding by + stepping into &ghci;, and watching the function at work in + different circumstances. + + We'll start by looking at a string that doesn't contain any + line separators. + + &splitlines.ghci:splitLines.foo; + + Here, our call to break never finds a + line separator, so we get an empty suffix. + + &splitlines.ghci:break.foo; + + The case expression thus hits a match on the + fourth branch, and we're done. What about a slightly more + interesting case? + + &splitlines.ghci:splitLines.foobar; + + Our first call to break gives us a + non-empty suffix. + + &splitlines.ghci:break.foobar; + + Because the suffix begins with a carriage return, followed + by a newline, we match on the first branch of the + case expression. This gives us + pre bound to "foo", and + suf bound to "bar". We call + splitLines again, this time on + "bar" alone. + + &splitlines.ghci:splitLines.bar; + + The result is that we construct a list whose head is + "foo" and whose tail is + ["bar"]. + + &splitlines.ghci:cons; + + + + Working with lists + + As the bread and butter of functional programming, lists + deserve some serious attention. The standard prelude defines + dozens of functions for dealing with lists. Many of these will + be indispensable tools, so it's important that we learn them + early on. + + The Data.List module is the real + logical home of all standard list functions. The prelude merely + re-exports a large subset of the functions exported by + Data.List. Several invaluable functions in + Data.List are not re-exported + by the standard prelude. As we walk through list functions in + the sections that follow, we'll explicitly mention those that + are in Data.List. + + &ch04.list.ghci:Data.List; + + + Basic list manipulation + + The simplest function on a list is + null, which merely tells us whether or + not the list is empty. + + &ch04.list.ghci:null; + + The length function tells us how many + elements are in a list. + + &ch04.list.ghci:length; + + To get the first element of a list, we use the + head function. + + &ch04.list.ghci:head; + + The converse, tail, returns all + but the head of a list. + + &ch04.list.ghci:tail; + + Another function, last, returns the + very last element of a list. + + &ch04.list.ghci:last; + + The converse of last is + init, which returns a list of all but the + last element of its input. + + &ch04.list.ghci:last; + + None of the above functions is well-behaved on empty + lists, so be careful if you don't know whether or not a list + is empty. What form does their misbehaviour take? + + &ch04.list.ghci:head.empty; + + + + + Safely and sanely working with unsafe functions + + When we want to use a function like + safe, where we know that it might blow up + on us if we pass in an empty list, the temptation might + initially be strong to check the length of the list before we + call safe. Let's construct a hideously + artificial example to illustrate our point. + + &EfficientList.hs:myDumbExample; + + If we're coming from a language like Perl or python, this + might seem like a perfectly natural way to write this test. + Behind the scenes, Python lists are arrays; and Perl arrays + are, well, arrays. So they necessarily know how long they + are, and calling len(foo) or $#foo+1 + is a perfectly natural thing to do. But as with many other + things, it's not a good idea to blindly transplant such an + assumption into Haskell. + + We've already seen the definition of the list algebraic + data type in , and know + that a list doesn't encode its own length. Thus, the only way + that length can operate is to walk the + entire list. + + Therefore, when we only care whether or not a list is + empty, calling length isn't a good + strategy. It can potentially do a lot more work than we want, + if the list we're working is finite. Worse, Haskell lets us + define infinitely long lists, on which an unsuspecting call to + length will never return! + + A more appropriate function to call here instead is + null, which runs in constant time. Better + yet, using null makes our code indicate + what property of the list we really care about. + + + + Partial and total functions + + Functions that only have return values defined for a + subset of valid inputs are called partial + functions (calling error doesn't qualify + as returning a value!). We call functions that return valid + results over their entire input domains + total functions. + + It's always a good idea to know whether a function you're + using is partial or total. Calling a partial function with an + input that it can't handle is probably the single biggest + source of straightforward, avoidable bugs in Haskell + programs. + + Some Haskell programmers go so far as to give partial + functions names that begin with a prefix such as + unsafe, so that they can't shoot themselves in + the foot accidentally. + + It's arguably a deficiciency of the standard prelude that + it defines quite a few unsafe partial + functions, like head, without also + providing safe total equivalents. + + + + + More simple list manipulations + + Haskell's name for the append function is + (++). + + &ch04.list.ghci:append; + + The concat function takes a list of + lists, all of the same type, and flattens them + into a single list. + + &ch04.list.ghci:concat; + + It only flattens one level of nesting. + + &ch04.list.ghci:concat.multi; + + The reverse function returns the + elements of a list in reverse order. + + &ch04.list.ghci:reverse; + + For lists of Bool, the + and and or functions + generalise their two-argument + cousins,(&&) and + (||). + + &ch04.list.ghci:and.or; + + They have more useful cousins, all + and any, which operate on lists of any + type. Each one takes a predicate as its first argument; + all returns True if that + predicate succeeds on every element of the list, while + any returns True if the + predicate succeeds on any element of the list. + + &ch04.list.ghci:all.any; + + + + + Working with sublists + + The take function, which we already + met in , returns a + sublist consisting of the first several elements from a list. + Its converse, drop, drops several + elements from the head of the list. + + &ch04.list.ghci:take.drop; + + The splitAt function combines the + functions of take and + drop, returning a two-tuple of the input + list, split at the given index. + + &ch04.list.ghci:splitAt; + + The takeWhile and + dropWhile functions take predicates: + takeWhile constructs a list as long as + the predicate returns True, while + dropWhile drops elements from the list as + long as the predicate returns True. + + &ch04.list.ghci:takeWhile.dropWhile; + + Just as splitAt tuples + up the results of take and + drop, the functions + break (which we already saw in ) and span + tuple up the results of takeWhile and + dropWhile. + + Each function takes a predicate; + break consumes its input while its + predicate fails, while span consumes + until its predicate succeeds. + + &ch04.list.ghci:break.span; + + + + + Searching lists + + In Data.List, three predicates, + isPrefixOf, + isInfixOf, and + isSuffixOf, let us test for the presence + of sublists within a bigger list. The easiest way to use them + is as infix functions, where they read quite naturally. + + The isPrefixOf function tells us + whether its left argument matches the beginning of its right + argument. + + &ch04.list.ghci:isPrefixOf; + + The isInfixOf function indicates + whether its left argument is a sublist of its right. + + &ch04.list.ghci:isInfixOf; + + The operation of isSuffixOf shouldn't + need any explanation. + + &ch04.list.ghci:isSuffixOf; + + + + Exercises + + + + + Write your own safe definitions of + the standard partial list functions that never + fail. + + &ch04.exercises.hs:safe; + + + + + + + hunk ./en/ch04-fp.xml 445 - Here's a C function that takes a string of decimal digits - and turns them into an integer. + A straightforward way to make the jump from a language + that has loops to one that doesn't is to run through a few + examples, looking at the differences. Here's a C function + that takes a string of decimal digits and turns them into an + integer. hunk ./en/ch04-fp.xml 664 - name, there's nothing mysterious about higher-order - functions; it's just a term for functions that take other - functions as arguments, or return functions.) + name, there's nothing mysterious about higher-order functions; + it's just a term for functions that take other functions as + arguments, or return functions.) hunk ./en/ch04-fp.xml 678 - outselves some assurance that it behaves similarly to the + outselves some assurance that it behaves similarly to the hunk ./en/ch04-fp.xml 1013 - with some other value, we have an easy way to think about - Haskell's list append function, + with some other value, this gives us an easy way to think + about Haskell's list append function, hunk ./en/ch04-fp.xml 1046 + As our extended treatment of folds should indicate, the + foldr function is nearly as important a + member of our list-programming toolbox as the more basic list + functions we saw in . hunk ./en/ch04-fp.xml 1075 - linkend="hs.fp.tailrecursion"/>. + linkend="hs.fp.tailrecursion"/>. hunk ./en/ch04-fp.xml 1142 + + + + + How many of the following standard prelude functions + can you rewrite using list folds? + addfile ./examples/ch04/EfficientList.hs hunk ./examples/ch04/EfficientList.hs 1 +{-- snippet myDumbExample --} +myDumbExample xs = if length xs > 0 + then head xs + else 'Z' +{-- /snippet myDumbExample --} hunk ./examples/ch04/ch04.exercises.ghci 18 +--# asFolds + +:module +Data.List +:type reverse +reverse "hcogogogoilisytnallllwbordnrywhcyregogllygnywgllwpriafnall" +:type intersperse +intersperse 3 [2,4,6] +:type and +and [True,False,False] +:type maximum +maximum [1,3,5,7,6,4] +:type nub +nub [1,1,2,2,2,2,2,3] + hunk ./examples/ch04/ch04.exercises.hs 28 -asInt_fold ('-':xs) = negate (asInt' xs) +asInt_fold ('-':xs) = negate (asInt_fold' xs) hunk ./examples/ch04/ch04.exercises.hs 61 +breakList :: ([a] -> Maybe [a]) -> [a] -> ([a], [a], [a]) +breakList p = helper [] + where helper acc xs@(x:xs') = case p xs of + Just sep -> (reverse acc, sep, + foldr (const tail) xs sep) + Nothing -> helper (x:acc) xs' + helper acc [] = (reverse acc, [], []) + +{-- snippet safe --} +safeHead :: [a] -> Maybe a +safeTail :: [a] -> Maybe [a] +safeLast :: [a] -> Maybe a +safeInit :: [a] -> Maybe [a] +{-- /snippet safe --} + +safeHead (x:_) = Just x +safeHead _ = Nothing + +safeTail (_:xs) = Just xs +safeTail _ = Nothing + +safeLast [x] = Just x +safeLast (_:xs) = safeLast xs +safeLast [] = Nothing + +safeInit [] = Nothing +safeInit [x] = Just [] +safeInit (x:xs) = maybe Nothing (Just . (x:)) (safeInit xs) + addfile ./examples/ch04/ch04.list.ghci hunk ./examples/ch04/ch04.list.ghci 1 +--# null + +:type null +null [] +null "plugh" + +--# length + +:type length +length [] +length [1,2,3] + +--# head + +:type head +head [1,2,3] + +--# tail + +:type tail +tail "foo" + +--# last + +:type last +last "bar" + +--# head.empty + +head [] + +--# append + +:type (++) +"foo" ++ "bar" +[] ++ [1,2,3] +[True] ++ [] + +--# concat + +:type concat +concat [[1,2,3], [4,5,6]] + +--# concat.multi + +concat [[[1,2],[3]], [[4],[5],[6]]] +concat (concat [[[1,2],[3]], [[4],[5],[6]]]) + +--# reverse + +:type reverse +reverse "foo" + +--# and.or + +:type and +and [True,False,True] +and [] +:type or +or [False,False,False,True,False] +or [] + +--# all.any + +:type all +all odd [1,3,5] +all odd [3,1,4,1,5,9,2,6,5] +all odd [] +:type any +any even [3,1,4,1,5,9,2,6,5] +any even [] + +--# take.drop + +:type take +take 3 "foobar" +take 2 [1] +:type drop +drop 3 "xyzzy" +drop 1 [] + +--# splitAt + +:type splitAt + +splitAt 3 "foobar" + +--# takeWhile.dropWhile + +:type takeWhile +takeWhile odd [1,3,5,6,8] +:type dropWhile +dropWhile even [2,4,6,7,9] + +--# break.span + +:type break +break even [1,3,5,6,8] +:type span +span even [2,4,6,7,9] + +--# Data.List + +:module +Data.List + +--# isPrefixOf + +:type isPrefixOf +"foo" `isPrefixOf` "foobar" +[1,2] `isPrefixOf` [] + +--# isInfixOf + +[2,6] `isInfixOf` [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9] +"funk" `isInfixOf` "sonic youth" + +--# isSuffixOf + +".c" `isSuffixOf` "crashme.c" hunk ./tools/Snip.hs 40 + "py" -> (startPy, endPy) hunk ./tools/Snip.hs 46 + startPy = B.pack "## snippet " + endPy = B.pack "## /snippet " }