[I forgot to write about maps! Eep! Bryan O'Sullivan **20071213062305] { move ./examples/ch12/map.ghci ./examples/ch12/ch12-map.ghci hunk ./en/bibliography.xml 27 + + + + Okasaki99 + 1999 + ChrisOkasaki + Purely Functional Data Structures + + Cambridge University + Press + + 0521663504 + + + + Okasaki96 + 1996 + ChrisOkasaki + <ulink + url="http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf">Purely + Functional Data Structures</ulink> + + Carnegie Mellon University + + This is Okasaki's PhD thesis, a less complete + precursor to . hunk ./en/ch12-barcode.xml 219 - + hunk ./en/ch12-barcode.xml 792 + + + + Life without arrays or hash tables + + In an imperative language, the array is as much a + bread and butter type as a list or tuple in + Haskell. We take it for granted that an array in an imperative + language is usually mutable; we can change an element of an + array whenever it suits us. + + As we mentioned in , Haskell arrays are + not mutable. This means that to + modify a single array element, a copy of the + entire array is made, with that single element set to its new + value. Clearly, this approach is not a winner for + performance. + + The mutable array is a building block for another ubiquitous + imperative data structure, the hash table. In the typical + implementation, an array acts as the spine of the + table, with each element containing a list of elements. To add + an element to a hash table, we hash the element to find the + array offset, and modify the list at that offset to add the + element to it. + + If arrays aren't mutable, to updating a hash table, we must + create a new one. We copy the array, putting a new list at the + offset indicated by the element's hash. We don't need to copy + the lists at other offsets, but we've already dealt performance + a fatal blow simply by having to copy the spine. + + At a single stroke, then, immutable arrays have eliminated + two canonical imperative data structures + from our toolbox. Arrays are barely useful in pure Haskell + code; we'll have to do quite a bit of learning before we can use + mutable arrays. + + + A forest of solutions + + This is not the calamitous situation that it might seem, + though. Arrays and hash tables are often used as collections + indexed by a key, and in Haskell we use + trees for this purpose. + + Implementing a naive tree type is particularly easy in + Haskell. Beyond that, more useful tree types are also + unusually easy to implement. Self-balancing structures, such + as red-black trees, have struck fear into generations of + undergraduate computer science students, because the balancing + algorithms are notoriously hard to get right. + + Haskell's combination of algebraic data types, pattern + matching, and guards reduce even the hairiest of balancing + operations to a few lines of code. We'll bite back our + enthusiasm for building trees, however, and focus on why + they're particularly useful in a pure functional + language. + + The attraction of a tree to a functional programmer is + cheap modification. We don't break the + immutability rule: trees are immutable just like everything + else. However, when we modify a tree, creating a new tree, we + can share most of the structure of the tree between the old + and new versions. For example, in a tree containing 10,000 + nodes, we might expect that the old and new versions will + share about 9,985 elements when we add or remove one. In + other words, the number of elements modified per update + depends on the height of the tree, or the logarithm of the + size of the tree. + + Haskell's standard libraries provide two collection types + that are implemented using balanced trees behind the scenes: + Data.Map for key/value pairs, and + Data.Set for sets of values. As we'll be using + Data.Map in the sections that follow, we'll give + a quick introduction to it below. Data.Set is + sufficiently similar that you should be able to pick it up + quickly. + + + A word about performance + + Compared to a hash table, a well-implemented purely + functional tree data structure can perform competitively. + You should not approach trees with the assumption that your + code will pay a performance penalty. + + + + + A brief introduction to maps + + The Data.Map module provides a parameterised + type, Map k a, that maps from a key type k to a value type a. It is written as a size-balanced + binary tree, but the implementation is completely + opaque. + + Map is strict in its keys, but lazy in its + values. In other words, the spine, or + structure, of the map is always kept up to date, but values in + the map aren't evaluated unless you force them to be. + + It is very important to remember this, as + Map's laziness over values is a frequent source + of space leaks among coders who are not expecting it. + + Because the Data.Map module contains a number + of names that clash with Prelude names, it's usually imported + in qualified form. In this chapter, we'll import it using the + prefix M. + + &Barcode.hs:import.Map; + + + Type constraints + + The Map type doesn't place any explicit + constraints on its key type, but most of the module's useful + functions require that keys be instances of + Ord. This is noteworthy, as it's an example of + a common design pattern in Haskell code: type constraints + are pushed out to where they're actually needed, not + necessarily applied at the point where they'd result in the + least fingertyping for a library's author. + + Neither the Map type nor any functions in + the module constrain the types that can be used as + values. + + + + Partial application awkwardness + + For some reason, the type signatures of the functions in + Data.Map are not generally friendly to partial + application. The map parameter always comes last, whereas it + would be easier to partially apply if it were first. As a + result, code that uses partially applied map functions + almost always contains adapter functions to fiddle with + argument ordering. + + + + Getting started with the API + + The Data.Map module has a large + surface area: it exports dozens of functions. + Just a handful of these comprise the most frequently used + core of the module. + + To create an empty map, or one that contains one + key/value pair, we use empty or + singleton, respectively. + + &ch12-map.ghci:create; + + Since the implementation is abstract, we can't pattern + match on Map values. Instead, there are two + common lookup functions. The lookup + function has a slightly tricky type signature, but don't + worry; all will become clear shortly, in . + + &ch12-map.ghci:lookup.type; + + Most often, the type parameter m in the result is + Maybe. In other words, if the map contains a + value for the given key, lookup will + return the value wrapped in Just. Otherwise, + it will return Nothing. + + &ch12-map.ghci:lookup; + + The findWithDefault function takes + a value to return if the key isn't in the map. + + + Beware the partial functions! + + There exists a (!) operator that + performs a lookup and returns the unadorned value + associated with a key (i.e. not wrapped in + Maybe or whatever). Unfortunately, it is not + a total function: It calls error if + the key is not present in the map. + + We mention this function only to urge you to avoid any + temptation to use it. Unless, that is, you like creating + subtly buggy code. + + + To add a key/value pair to the map, the most useful + functions are insert and + insertWith'. The insert + function simply inserts a value into the map, overwriting + any matching value that may already have been + present. + + &ch12-map.ghci:insert; + + The insertWith' function takes a + further combining function as its + argument. If no key was present in the map, the new value + is inserted verbatim. Otherwise, the combining function is + called on the new and old values, and its result is inserted + into the map. + + &ch12-map.ghci:insertWith; + + As the tick at the end of its name suggests, + insertWith' evaluates the combining + function strictly. This allows you to avoid space leaks. + While there exists a lazy variant + (insertWith without the trailing tick + in the name), it's rarely what you actually want. + + The delete function deletes the + given key from the map. It returns the map unmodified if + the key was not present. + + &ch12-map.ghci:delete; + + Finally, there are several efficient functions for + performing set-like operations on maps. Of these, we'll be + using union below. This function is + left biased: if two maps contain the same + key, the result will contain the value from the left + map. + + &ch12-map.ghci:union; + + We have barely covered ten per cent of the + Data.Map API. For further inspiration, we + encourage you to browse the module documentation. It's an + impressively complete module. + + + + + Further reading + + The book gives a wonderful and + thorough implementor's tour of many pure functional data + structures, including several kinds of balanced tree. It also + provides valuable insight into reasoning about the performance + of purely functional data structures and lazy + evaluation. + + We recommend Okasaki's book as essential reading for + functional programmers. If you're not convinced, Okasaki's + PhD thesis, , is a less complete + and polished version of the book, and it is available for free + online. + hunk ./en/ch14-monads.xml 4 - FIXME + Monads hunk ./examples/ch12/Barcode.hs 15 +{-- snippet import.Map --} hunk ./examples/ch12/Barcode.hs 17 +{-- /snippet import.Map --} }