[Progress on chapter 9. Bryan O'Sullivan **20071014063246] { hunk ./en/ch09-find-dsl.xml 21 - + hunk ./en/ch09-find-dsl.xml 62 - is a directory. If it is, it recursively lists that directory. + is a directory. If it is, it recursively calls + getRecursiveContents to list that directory. hunk ./en/ch09-find-dsl.xml 72 + Tie this into our earlier discussion of + return. + hunk ./en/ch09-find-dsl.xml 84 - action, and use that in the &if;. + action out of its IO wrapper, so we can use the + plain, unwrapped Bool in the &if;. hunk ./en/ch09-find-dsl.xml 93 - Why have both mapM and forM? + Revisiting anonymous and named functions + + In , we listed some + reasons not to use anonymous functions, and yet here we are, + using one as the body of a loop. This is one of the most + common uses of anonymous functions in Haskell. + + We've already seen from their types that + forM and mapM take + functions as arguments. Most loop bodies are blocks of code + that only appear once in a program. Since we're most likely to + use a loop body in only one place, why give it a name? + + Of course, it does happen that we need to deploy exactly + the same code in several different loops. Rather than cutting + and pasting the same anonymous function, it makes sense here + to take an existing anonymous function and give it a + name. + + + + Why provide both mapM and forM? hunk ./en/ch09-find-dsl.xml 122 - In the example above, we're using an anonymous function as - the body of our loop. If we were to use - mapM instead of - forM, we'd have to place the variable - properNames after the body of the anonymous - function. We'd have to wrap the entire anonymous function in - parentheses, or replace it with a local function in a &where; - clause, to make the code parse properly. + Consider our example above, using an anonymous function as + a loop body. If we were to use mapM + instead of forM, we'd have to place the + variable properNames after the body of the + function. In order to get the code to parse correctly, we'd + have to wrap the entire anonymous function in parentheses, or + replace it with a named function that would otherwise be + unnecessary. Try it yourself: copy the code above, replacing + forM with mapM, and + see what this does to the readability of the code. hunk ./en/ch09-find-dsl.xml 140 - forM lets you write the tidiest code; you - should decide this on a case by case basis. With just a - little practice, it will be obvious which to use in every - instance. + forM lets you write the tidiest code. If + the loop body and the expression computing the data over which + you're looping are both short, it doesn't matter which you + use. If the loop is short, but the data is long, use + mapM. If the loop is long, but the data + short, use forM. And if both are long, + use a &let; or &where; clause to make one of them short. With + just a little practice, it will become obvious which of these + approaches is best in every instance. hunk ./en/ch09-find-dsl.xml 151 + + + A naive finding function + + We can use our getRecursiveContents + function as the basis for a simple-minded file finder. + + &SimpleFinder.hs:simpleFind; + + This function takes a predicate that we use to filter the + names returned by getRecursiveContents. + Each name passed to the predicate is a complete path, so how can + we perform a common operation like find all files ending + in the extension .c? + + The System.FilePath module contains numerous + invaluable functions that help us to manipulate file names. In + this case, we want takeExtension. + + &simplefinder.ghci:takeExtension; + + This gives us a simple matter of writing a function that + takes a path, extracts its extension, and compares it with + .c. + + &simplefinder.ghci:find.c; + + While simpleFind works, it has a few + glaring problems. The first is that the predicate is not very + expressive. It can only look at the name of a directory entry, + and not for example find out whether it's a file or a directory. + This means that our attempt to use + simpleFind will list directories ending in + .c as well as files with the same extension. + + The second problem is that simpleFind + gives us no control over how it traverses the filesystem. To + see why this is significant, consider the problem of searching + for a source file in a tree managed by the Subversion revision + control system. Subversion maintains a private + .svn directory in every directory that it + manages; each one contains many subdirectories and files that + are of no interest to us. While we can easily enough filter out + any path containing .svn, it's more + efficient to simply avoid traversing these directories in the + first place. For example, one of us has a Subversion source + tree containing 45,000 files, 30,000 of which are stored in + 1,200 different .svn directories. It's + cheaper to avoid traversing those 1,200 directories than to + filter out the 30,000 files they contain. + + Finally, simpleFind is strict. If we + have a million files to traverse, we get one huge result + containing a million names, instead of a piecemeal lazy stream + of results. This is bad for both resource usage and + responsiveness. + + In the sections that follow, we'll overcome each one of + these problems. + + + + Predicates: from poverty to riches, while remaining + pure + + Our predicates can only look at file names. This excludes a + wide variety of interesting behaviours: for instance, what if + we'd like to list files of greater than a given size? + + An easy reaction to this is to reach for IO: + instead of our predicate being of type FilePath -> + Bool, why don't we change it to FilePath -> IO + Bool? This would let us perform arbitrary I/O as part + of our predicate. As appealing as this might seem, it's also + potentially a problem: such a predicate could have arbitrary + side effects, since a function with return type IO + a can have whatever side effects it pleases. + + Let's enlist the type system in our quest to write more + predictable, less buggy code: we'll keep predicates pure by + avoiding the taint of IO. This will ensure that + they can't have any nasty side effects. We'll feed them more + information, too, so that they can gain the expressiveness we + want without also becoming potentially dangerous. + + Haskell's portable System.Directory module + provides a useful, albeit limited, set of file metadata. + + &simplefinder.ghci:System.Directory; + + + + We can use doesFileExist and + doesDirectoryExist to determine whether + a directory entry is a file or a directory. There are not + yet portable ways to query for other file types that have + become widely available in recent years, such as named + pipes, hard links and symbolic links. + + &simplefinder.ghci:doesExist; + + + + + The getPermissions function lets us + find out whether certain operations on a file or directory + are allowed. + + &simplefinder.ghci:getPermissions; + + Directories are always searchable; + files never are. + + + + Finally, getModificationTime tells + us when an entry was last modified. + + &simplefinder.ghci:getModificationTime; + + + + + If we stick with portable, standard Haskell code, these + functions are all we have at our disposal. (We can also find a + file's size using a small hack; see below.) They're also quite + enough to let us illustrate the principles we're interested in, + without letting us get carried away with an example that's too + expansive. If you need to write more demanding code, the + System.POSIX and System.Win32 module + families provide much more detailed file metadata for the two + major modern computing platforms. + + How many pieces of data does our new, richer predicate need + to see? Since we can find out whether an entry is a file or a + directory by looking at its Permissions, we don't + need to pass in the results of + doesFileExist or + doesDirectoryExist. We thus have four + pieces of data that a richer predicate needs to look at. + + &BetterPredicate.hs:Predicate; + + Notice that the return value of this predicate is + Bool, not IO Bool: the predicate is + pure, and cannot perform I/O. With this type in hand, our more + expressive finder function is still quite trim. + + &BetterPredicate.hs:betterFind; + + Let's walk through the code. We'll talk about + getFileSize in some detail soon, so let's + skip over it for now. + + We can't use filter to call our + predicate p, as p's + purity means it cannot do the I/O needed to gather the metadata + it requires. + + This leads us to the unfamiliar function + filterM. It behaves like the normal + filter function, but in this case it + evaluates its predicate in the IO monad, so that + predicate can perform I/O. + + &simplefinder.ghci:filterM; + + Our check predicate is an I/O-capable + wrapper for our pure predicate p. It does all + the dirty work of I/O on p's behalf, so that + p can remain pure and incapable of causing + unwanted side effects. After gathering the metadata, + check calls p, then uses + return to wrap p's + result with IO. + addfile ./examples/ch09/BetterPredicate.hs hunk ./examples/ch09/BetterPredicate.hs 1 +{-- snippet imports --} +import Control.Monad (filterM) +import RecursiveContents (getRecursiveContents) +import System.Directory (Permissions(..), getModificationTime, getPermissions) +import System.Time (ClockTime(..)) +{-- /snippet imports --} +{-- snippet getFileSize --} +import Control.Exception (bracket, handle) +import System.IO (IOMode(..), hClose, hFileSize, openFile) + +getFileSize path = handle (const (return Nothing)) $ + bracket (openFile path ReadMode) hClose ((Just `fmap`) . hFileSize) +{-- /snippet getFileSize --} + +{-- snippet simpleFileSize --} +simpleFileSize :: FilePath -> IO Integer + +simpleFileSize path = do + h <- openFile path ReadMode + size <- hFileSize h + hClose h + return size +{-- /snippet simpleFileSize --} + +{-- snippet saferFileSize --} +saferFileSize path = bracket (openFile path ReadMode) hClose hFileSize +{-- /snippet saferFileSize --} + +{-- snippet Predicate --} +type Predicate = FilePath -- path to directory entry + -> Permissions -- permissions + -> Maybe Integer -- file size (Nothing if not file) + -> ClockTime -- last modified + -> Bool +{-- /snippet Predicate --} + +{-- snippet betterFind --} +getFileSize :: FilePath -> IO (Maybe Integer) + +betterFind :: Predicate -> FilePath -> IO [FilePath] + +betterFind p path = getRecursiveContents path >>= filterM check + where check name = do + perms <- getPermissions name + size <- getFileSize name + modified <- getModificationTime name + return (p name perms size modified) +{-- /snippet betterFind --} addfile ./examples/ch09/SimpleFinder.hs hunk ./examples/ch09/SimpleFinder.hs 1 +{-- snippet simpleFind --} +import RecursiveContents (getRecursiveContents) + +simpleFind :: (FilePath -> Bool) -> FilePath -> IO [FilePath] + +simpleFind p path = do + names <- getRecursiveContents path + return (filter p names) +{-- /snippet simpleFind --} addfile ./examples/ch09/simplefinder.ghci hunk ./examples/ch09/simplefinder.ghci 1 +--# takeExtension + +:m +System.FilePath + +:type takeExtension +takeExtension "foo/bar.c" +takeExtension "quux" + +--# find.c + +:load SimpleFinder +:type simpleFind (\p -> takeExtension p == ".c") + +--# System.Directory + +:m +System.Directory + +--# doesExist + +:type doesFileExist +doesFileExist "." + +:type doesDirectoryExist +doesDirectoryExist "." + +--# getPermissions + +:type getPermissions +:info Permissions +getPermissions "." +:type searchable +searchable it + +--# getModificationTime + +:type getModificationTime +getModificationTime "." + +--# filterM + +:type filterM }