[More work on ch13 John Goerzen **20070827065650] { hunk ./en/ch13-data.xml 387 - - You can built association lists just like you would build any other - list. Haskell comes with one built-in function called - Data.List.lookup to look up data in an association - list. Its type is Eq a => a -> [(a, b)] -> Maybe b. - Can you guess how it works from that type? Let's take a look in - &ghci;. - - &lookup.ghci:lookup1; - - The lookup function is really simple. Here's - one way you could write it: - - &lookup.hs:standalone; - - This function returns &Nothing; if passed the empty list. Otherwise, - it compares the key with the key we're looking for. If a match is - found, the corresponding value is returned. Otherwise, it searches - the rest of the list. - - - Let's take a look at a more complex example of association lists. - On Unix/Linux machines, - there is a file called /etc/passwd that stores - usernames, UIDs, home directories, and various other information. - Let's write a program that parses such a file, creates an association - list, and lets the user look up a username by giving a UID. - - &passwd-al.hs:all; - - Let's look at this program. The heart of it is - findByUID, which is a simple function that parses - the input one line at a time, then calls lookup over - the result. The remaining program is concerned with parsing the input. - The input file looks like this: - - + + Building Association Lists + + You can built association lists just like you would build any other + list. Haskell comes with one built-in function called + Data.List.lookup to look up data in an association + list. Its type is Eq a => a -> [(a, b)] -> Maybe b. + Can you guess how it works from that type? Let's take a look in + &ghci;. + + &lookup.ghci:lookup1; + + The lookup function is really simple. Here's + one way you could write it: + + &lookup.hs:standalone; + + This function returns &Nothing; if passed the empty list. Otherwise, + it compares the key with the key we're looking for. If a match is + found, the corresponding value is returned. Otherwise, it searches + the rest of the list. + + + + Inserting Into Association Lists + + It's easy enough to build an association list based on data that's + known up-front. But a common use for these lists may involve + modifying existing data -- replacing the data a key points to. In + Haskell, of course, we don't actually modify anything in-place; we + would return a new list instead. + + + Here's an insert function. It will insert the given key/value pair + into the association list if the key isn't already in the list. If + the key is already present, it will instead modify the associated + value. Notice how easy it is to develop data structures like this in + Haskell. + + &insert.hs:all; + + The insert function takes a key, value, and an + association list. It simply iterates over the list, looking for a + key that matches the given key. If it never finds one, it adds + the key/value pair at the end of the list. Let's see it in action + + &insert.ghci:insert1; + + You can see the effect of insert on existing + lists. You can also build up lists from scratch using + insert. + + &insert.ghci:insert2; + + In fact, you can even use insert together with + foldl to build up lists from existing data. + Though slower, it may be better than simply using + map because it will prevent duplicate keys from + occuring in your list. + + FIXME: have we described foldl already? if so, insert + ref + &insert.ghci:insert3; + + Notice how the second definition for the key 1 + overwrote the first. + + + + Association List Example + + Let's take a look at a more complex example of association lists. + On Unix/Linux machines, + there is a file called /etc/passwd that stores + usernames, UIDs, home directories, and various other information. + Let's write a program that parses such a file, creates an association + list, and lets the user look up a username by giving a UID. + + &passwd-al.hs:all; + + Let's look at this program. The heart of it is + findByUID, which is a simple function that parses + the input one line at a time, then calls lookup over + the result. The remaining program is concerned with parsing the input. + The input file looks like this: + + hunk ./en/ch13-data.xml 486 + + This is an excerpt from /etc/passwd on a live + system. The split function separates a line into + fields by looking for colons. Note the type of the function, though: + it's so generic that it can be used on lists of things other than + strings even. + + + + Assocation List Performance + + As you might imagine, if you have a large amount of data, searching + through it can be slow. lookup may have to search + through an entire list. If you have hundreds of thousands of entries, + this could take a long time. + Changing existing data can be slow, too. The + Data.Map module addresses both of these concerns, + and we'll cover it in the next section. + + + Association lists have a couple of advantages, though. One is that + they are simple to use, build, and understand. Another is that they + are standard lists and can be manipulated by standard list functions + such as sort, reverse, + head, and tail. Finally, + association lists preserve their original order, while maps may not. + + + + + + Maps + + Maps are similar to association lists in concept: they are a mapping + from a key to a value. Maps have many advantages over association + lists. For one, performance is far better for large maps than for + large lists. Maps have an internal representation tailored for this + particular purpose, and lookups and inserts don't need to perform + exhaustive searches for maps like they do for association lists. The + Data.Map module also has a richer set of functions + for working on maps than Data.List has for + association lists. + + + Like association lists, maps can have values of any type. Keys in a + map must be a member of the &Eq; and &Ord; typeclasses. In association + lists, they need only be a member of &Eq;. This is not much of a + restriction in practice; almost every type that is in &Eq; is also in + &Ord;, and just about every common type is in both. + hunk ./en/ch13-data.xml 537 - This is an excerpt from /etc/passwd on a live - system. The split function separates a line into - fields by looking for colons. Note the type of the function, though: - it's so generic that it can be used on lists of things other than - strings even. + The Data.Map module has some functions with same + names as those in Prelude or other common modules. + Therefore, when using it, most people import it using + import qualified Data.Map as Map and use + Map.function to refer to + functions in that module. + Let's start our look at Data.Map by + taking a look at some ways to build a map. hunk ./en/ch13-data.xml 546 + &buildmap.hs:all; hunk ./en/ch13-data.xml 548 - As you might imagine, if you have a large amount of data, searching - through it can be slow. lookup may have to search - through an entire list. If you have hundreds of thousands of entries, - this could take a long time. - Changing existing data can be slow, too. The - Data.Map module addresses both of these concerns, - and we'll cover it in the next section. + Functions like Map.insert work in the usual Haskell + way: they return a copy of the input data, with the requested change + applied. This is quite handy with maps. It means that you can use + foldl to build up a map as in the + mapFold example. Or, you can chain together + calls to Map.insert as in the + mapManual example. Let's use &ghci; to verify + that all of these maps are as expected: hunk ./en/ch13-data.xml 557 + &buildmap.ghci:all; hunk ./en/ch13-data.xml 559 - Association lists have a couple of advantages, though. One is that - they are simple to use, build, and understand. Another is that they - are standard lists and can be manipulated by standard list functions - such as sort, reverse, - head, and tail. Finally, - association lists preserve their original order, while maps may not. + Notice that the output from mapManual doesn't occur + in the order it was passed in. Maps do not guarantee that they will + preserve the original ordering. In fact, the present implementation + always stores mapped data sorted internally. addfile ./examples/ch13/buildmap.ghci hunk ./examples/ch13/buildmap.ghci 1 +--# all +:l buildmap.hs +al +mapFromAL +mapFold +mapManual addfile ./examples/ch13/buildmap.hs hunk ./examples/ch13/buildmap.hs 1 +{-- snippet all --} +import qualified Data.Map as Map + +-- Functions to generate a Map that represents an association list +-- as a map + +al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")] + +{- | Create a map representation of 'al' by converting the association +- list using Map.fromList -} +mapFromAL = + Map.fromList al + +{- | Create a map represetation of 'al' by doing a fold -} +mapFold = + foldl (\map (k, v) -> Map.insert k v map) Map.empty al + +{- | Manually create a map with the elements of 'al' in it -} +mapManual = + Map.insert 2 "two" . + Map.insert 4 "four" . + Map.insert 1 "one" . + Map.insert 3 "three" $ Map.empty +{-- /snippet all --} addfile ./examples/ch13/insert.ghci hunk ./examples/ch13/insert.ghci 1 +--# insert1 +let al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")] +insert 5 "five" al +insert 2 "TWO" al +insert 1 "one" [] +--# insert2 +insert 3 "three" . insert 2 "two" . insert 1 "one" $ [] +--# insert3 +foldl (\list (k, v) -> insert k v list) [] [(1, "one"), (2, "two"), (1, "ONE")] + addfile ./examples/ch13/insert.hs hunk ./examples/ch13/insert.hs 1 +{-- snippet all --} +insert :: Eq a => a -> b -> [(a, b)] -> [(a, b)] +insert key value [] = [(key, value)] +insert key value ((listkey, listvalue):remainder) + | key == listkey = (key, value) : remainder + | otherwise = (listkey, listvalue) : insert key value remainder +{-- /snippet all --} }