Chapter 9. Data Structures

Chapter 9. Data Structures
Prev		Next

Table of Contents

Basic Type Creation
Multiple Type Constructors
Defining Records
Records with Named Fields
Polymorphic Types
A Final Word on Types

FIXME: Mutable storage with MVars needs to go in threading area

Back in FIXME: add ref to chapter 3ish, you saw how to use type to create handy aliases for types. That's a useful feature, but in this chapter we'll take it a step further. We'll show you how to create entirely new types. After doing that, we'll also show you some of the built-in tools that Haskell provides for arranging large amounts of data.

Creating New Types

To create a new type, we use the data keyword. You can create an amazing variety of types using data. We'll take a look at them, starting with the most simple, and moving on to more complex types.

Basic Type Creation

To create a new type, we use the data keyword. In its most simple, though probably useless^[10], form, you can create a type like this:

data Silly = Foodata Silly = Foo

This defines a new type called Silly. There is one type constructor for Silly: Foo. When you write Foo in your program, this Foo is a value of type Silly. You can actually use the same word for both, but it must start with an uppercase letter.

Looking at it with ghci, there's not much you can do with it yet:

ghci> :l data1.hs
[1 of 1] Compiling Main             ( data1.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t Foo
Foo :: Silly
ghci> Foo

<interactive>:1:0:
    No instance for (Show Silly)
      arising from use of `print' at <interactive>:1:0-2
    Possible fix: add an instance declaration for (Show Silly)
    In the expression: print it
    In a 'do' expression: print it
ghci> Foo == Foo

<interactive>:1:0:
    No instance for (Eq Silly)
      arising from use of `==' at <interactive>:1:0-9
    Possible fix: add an instance declaration for (Eq Silly)
    In the expression: Foo == Foo
    In the definition of `it': it = Foo == Foo
ghci> :l data1.hs
[1 of 1] Compiling Main             ( data1.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t Foo
Foo :: Silly
ghci> Foo

<interactive>:1:0:
    No instance for (Show Silly)
      arising from use of `print' at <interactive>:1:0-2
    Possible fix: add an instance declaration for (Show Silly)
    In the expression: print it
    In a 'do' expression: print it
ghci> Foo == Foo

<interactive>:1:0:
    No instance for (Eq Silly)
      arising from use of `==' at <interactive>:1:0-9
    Possible fix: add an instance declaration for (Eq Silly)
    In the expression: Foo == Foo
    In the definition of `it': it = Foo == Foo

Note that ghci doesn't know how to display Foo to the screen, or how to compare it to itself. That's because we haven't made our new type a member of the Show and Eq typeclasses. We'll make a new example that is a member of these classes, and then we'll get more information out of ghci. For more on typeclasses, refer to Chapter 6, Using Typeclasses.

data Silly2 = Foo2
     deriving (Eq, Read, Show)

toSilly2 :: a -> Silly2
toSilly2 _ = Foo2data Silly2 = Foo2
     deriving (Eq, Read, Show)

toSilly2 :: a -> Silly2
toSilly2 _ = Foo2

We've also defined a function that takes any parameter and returns a value of type Silly2. Let's play with this in ghci.

ghci> :t Foo2
Foo2 :: Silly2
ghci> Foo2
Foo2
ghci> Foo2 == Foo2
True
ghci> Foo2 == Foo

<interactive>:1:8:
    Couldn't match expected type `Silly2' against inferred type `Silly'
      Expected type: Silly2
      Inferred type: Silly
    In the second argument of `(==)', namely `Foo'
    In the expression: Foo2 == Foo
ghci> :t toSilly2
toSilly2 :: a -> Silly2
ghci> :t toSilly2 5
toSilly2 5 :: Silly2
ghci> toSilly2 5
Foo2
ghci> :t Foo2
Foo2 :: Silly2
ghci> Foo2
Foo2
ghci> Foo2 == Foo2
True
ghci> Foo2 == Foo

<interactive>:1:8:
    Couldn't match expected type `Silly2' against inferred type `Silly'
      Expected type: Silly2
      Inferred type: Silly
    In the second argument of `(==)', namely `Foo'
    In the expression: Foo2 == Foo
ghci> :t toSilly2
toSilly2 :: a -> Silly2
ghci> :t toSilly2 5
toSilly2 5 :: Silly2
ghci> toSilly2 5
Foo2

You can see here how the types interact. Since Foo is of type Silly and Foo2 is of type Silly2, you can't compare them directly.

Let's now expand on this foundation with some more things that can be done with data.

Multiple Type Constructors

One way to make data more useful is to have multiple type constructors. Here's an example:

data Color = Red | Blue | Green
    deriving (Eq, Read, Show)data Color = Red | Blue | Green
    deriving (Eq, Read, Show)

This defined one new type named Color. You can create a value of type Color by using any of the three literals Red, Green, or Blue. There are several reasons this might be useful as opposed to using something like a String to store simple color names:

You are guaranteed that a Color always represents one of these three values
When you do pattern matching on a Color, the compiler will warn you if you don't consider all three possible values
It is possible to hide the implemention of Color from users when exporting symbols from the module; you can include Color in the list but not the three constructors FIXME: need to go into detail about exporting types somewhere and add a ref to it from here

Let's look at some example code that uses the Color:

describeColor :: Color -> String
describeColor Red = "Rolled out on carpet"
describeColor Blue = "Oceans"
describeColor Green = "Grass"describeColor :: Color -> String
describeColor Red = "Rolled out on carpet"
describeColor Blue = "Oceans"
describeColor Green = "Grass"

That's pretty simple: a function that takes a Color and converts it to a String. Note that we don't have to deal with the case where the input to the function is something other than our three colors, because that can't possibly happen.

You use a type defined in this way all the time. Haskell 98 defines this:

  data Bool =  False | True deriving 
                             (Read, Show, Eq, Ord, Enum, Bounded)
      
  data Bool =  False | True deriving 
                             (Read, Show, Eq, Ord, Enum, Bounded)

As you'll see in the rest of this chapter, many of the core Haskell features you rely upon are actually defined in the prelude using data. That is, they're not built into the compiler in any special way. The only thing special about them is that they are loaded for you by default since they're in the prelude.

FIXME: have we discussed prelude?

Defining Records

Custom types need not be defined solely in terms of static data. They can also take parameters. Let's say that we wanted to take our color example a new direction and let a user represent arbitrary colors in the RGB (red, green, blue) colorspace. We could use an (Int, Int, Int) tuple for this, but to illustrate records, let's define a new type.

data CustomColor = CustomColor Int Int Int
        deriving (Eq, Show, Read)data CustomColor = CustomColor Int Int Int
        deriving (Eq, Show, Read)

This defines a new type CustomColor and one type constructor also named CustomColor. This type constructor, however, requires three parameters, all Ints.

The type constructor here is a unique beast in Haskell. You can use it as a function that takes three parameters, and it will return to you one value ot type CustomColor. You can also use it to perform pattern matching in function or case definitions.

Let's use ghci to inspect this for a bit:

You can see that CustomColor 100 0 50 returns a single value of type CustomColor. Now you're probably wondering how to extract the data from that CustomColor. We use pattern matching to do that. Here's how:

color2string :: CustomColor -> String
color2string (CustomColor red green blue) =
    "red: " ++ show red ++ ", green: " ++ show green ++ ", blue: "
    ++ show bluecolor2string :: CustomColor -> String
color2string (CustomColor red green blue) =
    "red: " ++ show red ++ ", green: " ++ show green ++ ", blue: "
    ++ show blue

We used pattern matching to match the CustomColor. The three Ints were assigned to red, green, and blue, and then printed out. You can see that this worked by using ghci:

ghci> color2string (CustomColor 100 0 50)
"red: 100, green: 0, blue: 50"
ghci> color2string (CustomColor 100 0 50)
"red: 100, green: 0, blue: 50"

This sort of record is used in the standard Haskell library in the System.Time.ClockTime type:

  data ClockTime = TOD Integer Integer
       deriving (Eq, Ord)
      
  data ClockTime = TOD Integer Integer
       deriving (Eq, Ord)

A ClockTime consists of two Integers. The first is the number of whole seconds since midnight UTC on January 1, 1970. The second is an additional number of picoseconds. Since an Integer can be negative and is unbounded, a ClockTime can effectively represent any moment in history down to the picosecond.

This is useful as it is, but when you have half a dozen or more values to store in your record, it can get annoying to have to match them all. There's where named fields come in, which we'll discuss next.

Records with Named Fields

Our earlier example of a type that holds a color looked easy enough. But you have to remember the ordering of the fields that are part of the type. Also, you always have to pattern match all fields, even if you're interested in only one. Of course, you could write a function such as getRed to do that, but there's an easier way.

In Haskell, records can have named fields. When you name the fields in a record, you can still access it just as you would without the named fields. But you gain two things: automatic functions for picking out specific fields, plus an easier way to create and update these objects.

Let's take a look at a re-designed CustomColor type that uses named fields:

data CustomColor = 
  CustomColor {red :: Int,
               green :: Int,
               blue :: Int}
  deriving (Eq, Show, Read)data CustomColor = 
  CustomColor {red :: Int,
               green :: Int,
               blue :: Int}
  deriving (Eq, Show, Read)

This record stores exactly the same amount of information as our ealier CustomColor. But now we can take advantage of named fields. Let's see how that works with ghci:

ghci> :l colornamed.hs
[1 of 1] Compiling Main             ( colornamed.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t CustomColor
CustomColor :: Int -> Int -> Int -> CustomColor
ghci> :l colornamed.hs
[1 of 1] Compiling Main             ( colornamed.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t CustomColor
CustomColor :: Int -> Int -> Int -> CustomColor

ghci> CustomColor 100 0 50
CustomColor {red = 100, green = 0, blue = 50}
ghci> CustomColor {red = 100, green = 0, blue = 50}
CustomColor {red = 100, green = 0, blue = 50}
ghci> CustomColor {blue = 50, green = 0, red = 100}
CustomColor {red = 100, green = 0, blue = 50}
ghci> CustomColor 100 0 50
CustomColor {red = 100, green = 0, blue = 50}
ghci> CustomColor {red = 100, green = 0, blue = 50}
CustomColor {red = 100, green = 0, blue = 50}
ghci> CustomColor {blue = 50, green = 0, red = 100}
CustomColor {red = 100, green = 0, blue = 50}

First, we inspected the type of the CustomColor constructor. Note that it's type is exactly the same as the constructor for the type that didn't use named fields. Then, we created a object with identical data three different ways. The first way didn't make use of the named fields. The second and third ways did. Notice that when you use named fields, you don't have to specify the values in order.

Haskell automatically creates accessor functions for each of the named fields. Let's look at how we extract the red component of our color:

ghci> :t red
red :: CustomColor -> Int
ghci> red (CustomColor 100 0 5)
100
ghci> :t red
red :: CustomColor -> Int
ghci> red (CustomColor 100 0 5)
100

We can use that to write a modified color2string function that accesses each named field without having to pattern match each individual field. For reference, the original color2string is included here as color2string2.

color2string :: CustomColor -> String
color2string cc =
    "red: " ++ show (red cc) ++ ", green: " ++ show (green cc) 
    ++ ", blue: " ++ show (blue cc)color2string :: CustomColor -> String
color2string cc =
    "red: " ++ show (red cc) ++ ", green: " ++ show (green cc) 
    ++ ", blue: " ++ show (blue cc)

color2string2 :: CustomColor -> String
color2string2 (CustomColor red green blue) =
    "red: " ++ show red ++ ", green: " ++ show green ++ ", blue: "
    ++ show bluecolor2string2 :: CustomColor -> String
color2string2 (CustomColor red green blue) =
    "red: " ++ show red ++ ", green: " ++ show green ++ ", blue: "
    ++ show blue

In the first function, we took a CustomColor and assigned the entire thing to cc. In the second function, we used pattern matching to pick it apart up front.

Named fields also make it easy to modify one or more components of a type. Here's an example:

ghci> (CustomColor 100 0 5) {green = 200}
CustomColor {red = 100, green = 200, blue = 5}
ghci> (CustomColor 100 0 5) {red = 50, green = 200}
CustomColor {red = 50, green = 200, blue = 5}
ghci> (CustomColor 100 0 5) {green = 200}
CustomColor {red = 100, green = 200, blue = 5}
ghci> (CustomColor 100 0 5) {red = 50, green = 200}
CustomColor {red = 50, green = 200, blue = 5}

With just three fields, the burden of pattern matching isn't all that great. But what if you had a dozen or more? Named fields really help out in that case. The standard Haskell library uses named fields in System.Time.CalendarTime. Here's a excerpt from its definition:

data CalendarTime = CalendarTime {
  ctYear :: Int,
  ctMonth :: Month,
  ctDay :: Int,
  ctHour :: Int,
  ctMin :: Int,
  ctSec :: Int,
  ctPicosec :: Integer,
  ctWDay :: Day,
  ctYDay :: Int,
  ctTZName :: String,
  ctTZ :: Int,
  ctIsDST :: Bool
}
      
data CalendarTime = CalendarTime {
  ctYear :: Int,
  ctMonth :: Month,
  ctDay :: Int,
  ctHour :: Int,
  ctMin :: Int,
  ctSec :: Int,
  ctPicosec :: Integer,
  ctWDay :: Day,
  ctYDay :: Int,
  ctTZName :: String,
  ctTZ :: Int,
  ctIsDST :: Bool
}

Named fields are a tremendous time saver here. If you have a CalendarTime and want to extract just the year, you can simply say ctYear ct rather than having to match against 12 different fields in order.

Polymorphic Types

Sometimes you don't know in advance what the types of the data you want to store will be. For instance, our CustomColor type used Ints. But one could certainly also represent color values as a Float or an Integer. Haskell makes it possible to write generic code that could work with any of those. Why not make the type generic as well?

Haskell has a well-known type that is defined just this way: Maybe. It's defined in the prelude like this:

data  Maybe a  =  Nothing | Just a
  deriving (Eq, Ord)
      
data  Maybe a  =  Nothing | Just a
  deriving (Eq, Ord)

This type is often used when the result of a function could be an error or some other invalid result (Nothing), or else a real value (Just). For instance, if you are searching in a list for the first result that matches search criteria, you will either find something or you won't. If you do, you could get back the data wrapped in Just, and if you don't, you could get back Nothing.

Maybe is a polymorphic type because its type depends on the type of data that is encapsulated within it. You can, in fact, encapsulate any type of data within it. Let's look at this with ghci.

ghci> Nothing
Nothing
ghci> :t Nothing
Nothing :: Maybe a
ghci> Just "hi"
Just "hi"
ghci> :t Just "hi"
Just "hi" :: Maybe [Char]
ghci> Nothing
Nothing
ghci> :t (Nothing :: Maybe [Char])
(Nothing :: Maybe [Char]) :: Maybe [Char]
ghci> Nothing
Nothing
ghci> :t Nothing
Nothing :: Maybe a
ghci> Just "hi"
Just "hi"
ghci> :t Just "hi"
Just "hi" :: Maybe [Char]
ghci> Nothing
Nothing
ghci> :t (Nothing :: Maybe [Char])
(Nothing :: Maybe [Char]) :: Maybe [Char]

Notice here that the type of Nothing is Maybe a, but the type of Just "hi" is Maybe [Char]. The reason for this is Haskell's type inference. Nothing doesn't actually encapsulate any data, so it -- by itself -- is valid with any Just. You can also give it an explicit type to force it to behave a certain way.

Let's push the type system and see what it lets us do with Maybe.

ghci> Nothing == Nothing
True
ghci> Nothing == Just "hi"
False
ghci> Nothing == Just 123
False
ghci> Just "hi" == Just "bye"
False
ghci> Just "hi" == Just 123

<interactive>:1:18:
    No instance for (Num [Char])
      arising from the literal `123' at <interactive>:1:18-20
    Possible fix: add an instance declaration for (Num [Char])
    In the first argument of `Just', namely `123'
    In the second argument of `(==)', namely `Just 123'
    In the expression: (Just "hi") == (Just 123)
ghci> Nothing == Nothing
True
ghci> Nothing == Just "hi"
False
ghci> Nothing == Just 123
False
ghci> Just "hi" == Just "bye"
False
ghci> Just "hi" == Just 123

<interactive>:1:18:
    No instance for (Num [Char])
      arising from the literal `123' at <interactive>:1:18-20
    Possible fix: add an instance declaration for (Num [Char])
    In the first argument of `Just', namely `123'
    In the second argument of `(==)', namely `Just 123'
    In the expression: (Just "hi") == (Just 123)

You can compare Nothing to a Just easily enough; that comparison will of course always be false. The comparison between two Strings wrapped in Just also works. But the type checker wisely gives an error on the last attempt, because you're trying to compare a String to a number.

You'll see Maybe used quite a bit throughout this book, and in Haskell code in general.

A Final Word on Types

We've shown you a lot of different ways to create your own types. Note that, by using multiple constructors, you can effectively combine various approaches in a single type. For instance, you could say data Foo a = Bar | Baz Int | Other a if you so desire. This may be useful in certain situations.

Association Lists

Often times, we have to deal with data that is unordered but is indexed by a key. For instance, a Unix administrator might have a list of numeric UIDs and the textual usernames that they correspond to. The utility of this list is being able to look up a textual username for a given UID, not the order of the data. In otherwords, the UID is a key into a database.

In Haskell, there are several ways to handle data that is structured in this way. The two most common are association lists and the Data.Map module. Association lists are handy because they are simple. They are standard Haskell lists, so all the functions that work on those lists work on association lists. However, for large data sets, Data.Map will have a considerable performance advantage over association lists. We'll consider both in this chapter.

An association list is just a normal list containing (key, value) tuples. The type of a list of mappings from UID to username might be [(Integer, String)]. You could use just about any type for both the key and the value.

You can built association lists just like you would build any other list. Haskell comes with one built-in function called Data.List.lookup to look up data in an association list. Its type is Eq a => a -> [(a, b)] -> Maybe b. Can you guess how it works from that type? Let's take a look in ghci.

ghci> let al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]
ghci> lookup 1 al
Just "one"
ghci> lookup 5 al
Nothing
ghci> let al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]
ghci> lookup 1 al
Just "one"
ghci> lookup 5 al
Nothing

The lookup function is really simple. Here's one way you could write it:

myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((thiskey,thisval):rest) =
    if key == thiskey
       then Just thisval
       else myLookup1 key restmyLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((thiskey,thisval):rest) =
    if key == thiskey
       then Just thisval
       else myLookup1 key rest

This function returns Nothing if passed the empty list. Otherwise, it compares the key with the key we're looking for. If a match is found, the corresponding value is returned. Otherwise, it searches the rest of the list.

Let's take a look at a more complex example of association lists. On Unix/Linux machines, there is a file called /etc/passwd that stores usernames, UIDs, home directories, and various other information. Let's write a program that parses such a file, creates an association list, and lets the user look up a username by giving a UID.

import Data.List
import System.IO
import Control.Monad(when)
import System.Exit
import System.Environment(getArgs)

main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right amount of args, give an error and abort
    when (length args /= 2) $ do
        putStrLn "Syntax: passwd-al filename uid"
        exitFailure

    -- Read the file lazily
    content <- readFile (args !! 0)

    -- Compute the username in pure code
    let username = findByUID content (read (args !! 1))

    -- Display the result
    case username of 
         Just x -> putStrLn x
         Nothing -> putStrLn "Could not find that UID"

-- Given the entire input and a UID, see if we can find a username.
findByUID :: String -> Integer -> Maybe String
findByUID content uid =
    let al = map parseline . lines $ content
        in lookup uid al

-- Convert a colon-separated line into fields
parseline :: String -> (Integer, String)
parseline input =
    let fields = split ':' input
        in (read (fields !! 2), fields !! 0)

{- | Takes a delimiter and a list.  Break up the last based on the
-  delimeter. -}
split :: Eq a => a -> [a] -> [[a]]

-- If the input is empty, the result is a list of empty lists.
split _ [] = [[]]
split delim str =
    let -- Find the part of the list before delim and put it in "before".
        -- The rest of the list, including the leading delim, goes
        -- in "remainder".
        (before, remainder) = span (/= delim) str
        in
        before : case remainder of
                      [] -> []
                      x -> -- If there is more data to process,
                           -- call split recursively to process it
                           split delim (tail x)import Data.List
import System.IO
import Control.Monad(when)
import System.Exit
import System.Environment(getArgs)

main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right amount of args, give an error and abort
    when (length args /= 2) $ do
        putStrLn "Syntax: passwd-al filename uid"
        exitFailure

    -- Read the file lazily
    content <- readFile (args !! 0)

    -- Compute the username in pure code
    let username = findByUID content (read (args !! 1))

    -- Display the result
    case username of 
         Just x -> putStrLn x
         Nothing -> putStrLn "Could not find that UID"

-- Given the entire input and a UID, see if we can find a username.
findByUID :: String -> Integer -> Maybe String
findByUID content uid =
    let al = map parseline . lines $ content
        in lookup uid al

-- Convert a colon-separated line into fields
parseline :: String -> (Integer, String)
parseline input =
    let fields = split ':' input
        in (read (fields !! 2), fields !! 0)

{- | Takes a delimiter and a list.  Break up the last based on the
-  delimeter. -}
split :: Eq a => a -> [a] -> [[a]]

-- If the input is empty, the result is a list of empty lists.
split _ [] = [[]]
split delim str =
    let -- Find the part of the list before delim and put it in "before".
        -- The rest of the list, including the leading delim, goes
        -- in "remainder".
        (before, remainder) = span (/= delim) str
        in
        before : case remainder of
                      [] -> []
                      x -> -- If there is more data to process,
                           -- call split recursively to process it
                           split delim (tail x)

Let's look at this program. The heart of it is findByUID, which is a simple function that parses the input one line at a time, then calls lookup over the result. The remaining program is concerned with parsing the input. The input file looks like this:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
jgoerzen:x:1000:1000:John Goerzen,,,:/home/jgoerzen:/bin/bash
    
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
jgoerzen:x:1000:1000:John Goerzen,,,:/home/jgoerzen:/bin/bash

The Data.Map module has some functions with same names as those in Prelude or other common modules. Therefore, when using it, most people import it using import qualified Data.Map as Map and use Map.function to refer to functions in that module. Let's start our look at Data.Map by taking a look at some ways to build a map.

Functions like Map.insert work in the usual Haskell way: they return a copy of the input data, with the requested change applied. This is quite handy with maps. It means that you can use foldl to build up a map as in the mapFold example. Or, you can chain together calls to Map.insert as in the mapManual example. Let's use ghci to verify that all of these maps are as expected:

Notice that the output from mapManual doesn't occur in the order it was passed in. Maps do not guarantee that they will preserve the original ordering.

Maps operate similar in concept to association lists. The Data.Map module provides functions for adding and removing data from maps. It also provides functions for converting maps back and forth to association lists, filtering them, modifying them, and folding them. The library documentation for this module is good, so instead of going into detail on each function, we're going to present an example that ties together much of the concepts we've discussed in this chapter.

Extended Example: /etc/passwd

In order to illustrate the usage of a number of different data structures together, we've prepared an extended example. This example parses and stores entries from files in the format of a typical /etc/passwd file.

import Data.List
import qualified Data.Map as Map
import System.IO
import Text.Printf(printf)
import System.Environment(getArgs)
import System.Exit
import Control.Monad(when)

{- | The primary piece of data this program will store.
   It represents the fields in a POSIX /etc/passwd file -}
data PasswdEntry = PasswdEntry {
    userName :: String,
    password :: String,
    uid :: Integer,
    gid :: Integer,
    gecos :: String,
    homeDir :: String,
    shell :: String}
    deriving (Eq, Ord)

{- | Define how we get data to a 'PasswdEntry'. -}
instance Show PasswdEntry where
    show pe = printf "%s:%s:%d:%d:%s:%s:%s" 
                (userName pe) (password pe) (uid pe) (gid pe)
                (gecos pe) (homeDir pe) (shell pe)

{- | Converting data back out of a 'PasswdEntry'. -}
instance Read PasswdEntry where
    readsPrec _ value =
        case split ':' value of
             [f1, f2, f3, f4, f5, f6, f7] ->
                 -- Generate a 'PasswdEntry' the shorthand way:
                 -- using the positional fields.  We use 'read' to convert
                 -- the numeric fields to Integers.
                 [(PasswdEntry f1 f2 (read f3) (read f4) f5 f6 f7, [])]
             x -> error $ "Invalid number of fields in input: " ++ show x
        where 
        {- | Takes a delimiter and a list.  Break up the last based on the
        -  delimeter. -}
        split :: Eq a => a -> [a] -> [[a]]

        -- If the input is empty, the result is a list of empty lists.
        split _ [] = [[]]
        split delim str =
            let -- Find the part of the list before delim and put it in
                -- "before".  The rest of the list, including the leading 
                -- delim, goes in "remainder".
                (before, remainder) = span (/= delim) str
                in
                before : case remainder of
                              [] -> []
                              x -> -- If there is more data to process,
                                   -- call split recursively to process it
                                   split delim (tail x)

-- Convenience aliases; we'll have two maps: one from UID to entries
-- and the other from username to entries
type UIDMap = Map.Map Integer PasswdEntry
type UserMap = Map.Map String PasswdEntry

{- | Converts input data to maps.  Returns UID and User maps. -}
inputToMaps :: String -> (UIDMap, UserMap)
inputToMaps inp =
    (uidmap, usermap)
    where
    uidmap = Map.fromList . map (\pe -> (uid pe, pe)) $ entries
    usermap = Map.fromList . 
              map (\pe -> (userName pe, pe)) $ entries
    -- Convert the input String to [PasswdEntry]
    entries = map read (lines inp)

main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right number of args,
    -- give an error and abort

    when (length args /= 1) $ do
        putStrLn "Syntax: passwdmap filename"
        exitFailure

    -- Read the file lazily
    content <- readFile (head args)
    let maps = inputToMaps content
    mainMenu maps

mainMenu maps@(uidmap, usermap) = do
    putStr optionText
    sel <- getLine
    -- See what they want to do.  For every option except 4,
    -- return them to the main menu afterwards by calling
    -- mainMenu recursively
    case sel of
         "1" -> lookupUserName >> mainMenu maps
         "2" -> lookupUID >> mainMenu maps
         "3" -> displayFile >> mainMenu maps
         "4" -> return ()
         _ -> putStrLn "Invalid selection" >> mainMenu maps

    where 
    lookupUserName = do
        putStrLn "Username: "
        username <- getLine
        case Map.lookup username usermap of
             Nothing -> putStrLn "Not found."
             Just x -> print x
    lookupUID = do
        putStrLn "UID: "
        uidstring <- getLine
        case Map.lookup (read uidstring) uidmap of
             Nothing -> putStrLn "Not found."
             Just x -> print x
    displayFile = 
        putStr . unlines . map (show . snd) . Map.toList $ uidmap
    optionText = 
          "\npasswdmap options:\n\
           \\n\
           \1   Look up a user name\n\
           \2   Look up a UID\n\
           \3   Display entire file\n\
           \4   Quit\n\n\
           \Your selection: "import Data.List
import qualified Data.Map as Map
import System.IO
import Text.Printf(printf)
import System.Environment(getArgs)
import System.Exit
import Control.Monad(when)

{- | The primary piece of data this program will store.
   It represents the fields in a POSIX /etc/passwd file -}
data PasswdEntry = PasswdEntry {
    userName :: String,
    password :: String,
    uid :: Integer,
    gid :: Integer,
    gecos :: String,
    homeDir :: String,
    shell :: String}
    deriving (Eq, Ord)

{- | Define how we get data to a 'PasswdEntry'. -}
instance Show PasswdEntry where
    show pe = printf "%s:%s:%d:%d:%s:%s:%s" 
                (userName pe) (password pe) (uid pe) (gid pe)
                (gecos pe) (homeDir pe) (shell pe)

{- | Converting data back out of a 'PasswdEntry'. -}
instance Read PasswdEntry where
    readsPrec _ value =
        case split ':' value of
             [f1, f2, f3, f4, f5, f6, f7] ->
                 -- Generate a 'PasswdEntry' the shorthand way:
                 -- using the positional fields.  We use 'read' to convert
                 -- the numeric fields to Integers.
                 [(PasswdEntry f1 f2 (read f3) (read f4) f5 f6 f7, [])]
             x -> error $ "Invalid number of fields in input: " ++ show x
        where 
        {- | Takes a delimiter and a list.  Break up the last based on the
        -  delimeter. -}
        split :: Eq a => a -> [a] -> [[a]]

        -- If the input is empty, the result is a list of empty lists.
        split _ [] = [[]]
        split delim str =
            let -- Find the part of the list before delim and put it in
                -- "before".  The rest of the list, including the leading 
                -- delim, goes in "remainder".
                (before, remainder) = span (/= delim) str
                in
                before : case remainder of
                              [] -> []
                              x -> -- If there is more data to process,
                                   -- call split recursively to process it
                                   split delim (tail x)

-- Convenience aliases; we'll have two maps: one from UID to entries
-- and the other from username to entries
type UIDMap = Map.Map Integer PasswdEntry
type UserMap = Map.Map String PasswdEntry

{- | Converts input data to maps.  Returns UID and User maps. -}
inputToMaps :: String -> (UIDMap, UserMap)
inputToMaps inp =
    (uidmap, usermap)
    where
    uidmap = Map.fromList . map (\pe -> (uid pe, pe)) $ entries
    usermap = Map.fromList . 
              map (\pe -> (userName pe, pe)) $ entries
    -- Convert the input String to [PasswdEntry]
    entries = map read (lines inp)

main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right number of args,
    -- give an error and abort

    when (length args /= 1) $ do
        putStrLn "Syntax: passwdmap filename"
        exitFailure

    -- Read the file lazily
    content <- readFile (head args)
    let maps = inputToMaps content
    mainMenu maps

mainMenu maps@(uidmap, usermap) = do
    putStr optionText
    sel <- getLine
    -- See what they want to do.  For every option except 4,
    -- return them to the main menu afterwards by calling
    -- mainMenu recursively
    case sel of
         "1" -> lookupUserName >> mainMenu maps
         "2" -> lookupUID >> mainMenu maps
         "3" -> displayFile >> mainMenu maps
         "4" -> return ()
         _ -> putStrLn "Invalid selection" >> mainMenu maps

    where 
    lookupUserName = do
        putStrLn "Username: "
        username <- getLine
        case Map.lookup username usermap of
             Nothing -> putStrLn "Not found."
             Just x -> print x
    lookupUID = do
        putStrLn "UID: "
        uidstring <- getLine
        case Map.lookup (read uidstring) uidmap of
             Nothing -> putStrLn "Not found."
             Just x -> print x
    displayFile = 
        putStr . unlines . map (show . snd) . Map.toList $ uidmap
    optionText = 
          "\npasswdmap options:\n\
           \\n\
           \1   Look up a user name\n\
           \2   Look up a UID\n\
           \3   Display entire file\n\
           \4   Quit\n\n\
           \Your selection: "

This example maintains two maps: one from username to PasswdEntry and another one from UID to PasswdEntry. Database developers may find it convenient to think of this has having two different indices into the data to speed searching on different fields.

Take a look at the Show and Read instances for PasswdEntry. There is already a standard format for rendering data of this type as a string: the colon-separated version already used by the system. So our Show function displays a PasswdEntry in the format, and Read parses that format.

^[10]The built-in type () carries no data, so there is little need to define your own unless you are writing FFI interfaces to C types.

Prev		Next
Chapter 8. Case study: regular expressions and file name matching	Home	Chapter 10. Using Parsec