[More progress on glob writeup. Bryan O'Sullivan **20070615063657] { hunk ./en/Makefile 15 + ch07/glob-to-regexp/GlobRegexEither.hs \ hunk ./en/Makefile 18 + ch07/glob.ghci \ hunk ./en/ch07-globs-and-regexps.xml 479 + + Exercises + + + + + + Use &ghci; to explore what happens if you pass a + malformed pattern, such as [, to + globToRegex. + + + + + + While filesystems on Unix are usually sensitive to + case (e.g. G vs. g) in + file names, Windows filesystems are not. Add a parameter + to the globToRegex and + matchesGlob functions that allows + control over case sensitive matching. + + + + + hunk ./en/ch07-globs-and-regexps.xml 655 - It's all very well to have a function that can match - glob patterns, but we'd like to be able to put this to practical - use. On Unix-like systems, the glob - function returns the names of all files and directories that - match a given glob pattern. + It's all very well to have a function that can match glob + patterns, but we'd like to be able to put this to practical use. + On Unix-like systems, the glob function + returns the names of all files and directories that match a + given glob pattern. Let's build a similar function in Haskell. + Following the Haskell norm of descriptive naming, we'll call our + function namesMatching. + + &Glob.hs:module; + + This function will obviously have to manipulate filesystem + paths a lot, splicing and joining them as it goes. We'll need + to use a few previously unfamiliar modules along the way. + + The System.Directory + module provides standard functions for working with directories + and their contents. + + &Glob.hs:import.directory; + + The System.FilePath + module abstracts the details of an operating system's path name + conventions. Using this together with the System.Directory module, we can + write a portable namesMatching function + that will run on both Unix-like and Windows systems. + + &Glob.hs:import.filepath; + + Finally, we'll be emulating a for loop; + getting our first taste of exception handline in Haskell; and of + course using the matchesGlob function we + just wrote. + + &Glob.hs:import.rest; hunk ./en/ch07-globs-and-regexps.xml 692 - world, our globbing function will have to have - IO in its result type. Following the - Haskell norm of descriptive naming, we'll call our function - namesMatching. + world of activities that have effects, our globbing + function will have to have IO in its + result type. hunk ./en/ch07-globs-and-regexps.xml 698 + If the string we're passed doesn't contain any pattern + characters, all we need to do is check that the given name + exists in the filesystem. (Notice that we use Haskell's + function guard syntax here to write a nice tidy definition. An + if would do, but isn't as aestheticall + pleasing.) + + &Glob.hs:mundane; + + (You might have noticed that the function + doesNameExist that we use here isn't in the + list of functions we imported from System.Directory. We'll define it + shortly.) + + What if the string is a glob + pattern? + + &Glob.hs:otherwise; + + We use splitFileName to split the + string into a pair of everything but the final + name and the final name. If the first + element is empty, we're looking for a pattern in the current + directory. + + &Glob.hs:curdir; + + Otherwise, we must check the directory name and see if it + contains patterns. If it does not, we create a singleton list + of theb directory name. If it contains a pattern, we list all + of the matching directories. + + &Glob.hs:pats; + + Point out the use of return above. + This will be one of the first times the reader will have + encountered it. Even if we've already discussed this topic + earlier, we should repeat ourselves. + + + Using the System.FilePath module can be a + litle tricky. Above is a case in point; let's use &ghci; to + illustrate the problem. The + splitFileName function leaves a trailing + slash on the end of the directory name that it returns. + + &glob.ghci:split; + + If we didn't remember (or know enough) to remove that + slash, we'd recurse endlessly in that call to + namesMatching above, because of the + following behaviour of + splitFileName. + + &glob.ghci:tricksy; + + You can guess why this note got to be written! + + + Finally, we collect all matches in every directory, giving + us a list of lists, and concatenate them into a single list of + names. + + &Glob.hs:glue; + + The unfamiliar forM function above acts + a little like a for loop: it maps its second + argument (an action) over its first (a list), and returns the + resulting list. + + We have a few loose ends to clean up. The first is the + definition of the doesNameExist function, + used above. The System.Directory doesn't let us + check to see if a name exists in the filesystem. It forces us + to think about whether we want to check for the existence of a + file or a directory, even if we don't care what kind of thing + the name is. In the name of portability, we make the check for + a file first, since files are far more common than + directories. + + &Glob.hs:doesNameExist; + + Fill these out: + &Glob.hs:listPlain; + &Glob.hs:listMatches; + + + Exercises + + + + Although we've gone to some lengths to write a + portable namesMatching function, + the function uses our case sensitive + globToRegex function. Find a way + to modify namesMatching to be case + sensitive on Unix, and case insensitive on Windows, + without modifying its type signature. + + + Hint: think about looking + through the documentation for System.FilePath for a + variable that tells you whether you're running on a + Unix-like system, or on Windows. + + + + + + If you're on a Unix-like system, look through the + documentation for the System.Posix.Files module, + and see if you can find a replacement for the + doesNameExist function. + + + + hunk ./en/ch07-globs-and-regexps.xml 822 + + Handling errors through API design + + It's not necessarily a disaster if our + globToRegex is passed a malformed pattern. + Perhaps a user mistyped a pattern, in which case we'd like to be + able to report a meaningful error message. + + Calling the error function when this + kind of problem happens can be a drastic response. The + error throws an exception. Pure Haskell + code cannot deal with exceptions, so control is going to rocket + out of your pure code into the nearest caller that lives in + IO and has an appropriate exception + handler installed. If no such handler is installed, the default + action is to terminate your program (or print a nasty error + message, in &ghci;). + + So calling error is a little like + pulling the handle of a fighter plane's ejection seat. You're + bailing out of a catastrophic situation that you can't deal with + more gracefully, and there's likely to be a lot of flaming + wreckage involved. + + We've established that error is for + disasters, but we're still using it in + globToRegex, where malformed input should + be rejected, but not turned into a big deal. What would be a + better way to handle this? + + Haskell's type system to the rescue! We can encode the + possibility of failure in the type signature of + globToRegex, using the predefined + Either type. + + &GlobRegexEither.hs:type; + + A value returned by globToRegex will + now be either Left "an error message" or + Right "a valid regexp". This return type + forces our callers to deal with the possibility of error. + + + Exercises + + + + Write a version of globToRegex + that uses the type signature above. + + + + + + Modify the type signature of + namesMatching so that it encodes + the possibility of a bad pattern, and make it call your + rewritten globToRegex + function. + + + + + hunk ./examples/ch07/glob-to-regexp/Glob.hs 1 +{-- snippet module --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 3 +{-- /snippet module --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 5 +{-- snippet import.rest --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 7 -import Control.Monad (forM, liftM) +import Control.Monad (forM) hunk ./examples/ch07/glob-to-regexp/Glob.hs 9 +{-- /snippet import.rest --} +{-- snippet import.directory --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 13 +{-- /snippet import.directory --} +{-- snippet import.filepath --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 16 +{-- /snippet import.filepath --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 22 +{-- snippet mundane --} +isPattern :: String -> Bool +isPattern = any (`elem` "[*?") + hunk ./examples/ch07/glob-to-regexp/Glob.hs 27 - | isMagical pat = do + | not (isPattern pat) = do + exists <- doesNameExist pat + return (if exists then [pat] else []) +{-- /snippet mundane --} +{-- snippet otherwise --} + | otherwise = do +{-- /snippet otherwise --} +{-- snippet curdir --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 38 - globPattern curDir baseName + listMatches curDir baseName +{-- /snippet curdir --} +{-- snippet pats --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 42 - dirs <- if isMagical dirName + dirs <- if isPattern dirName hunk ./examples/ch07/glob-to-regexp/Glob.hs 45 - let glob = if isMagical baseName then globPattern else globPlain +{-- /snippet pats --} +{-- snippet glue --} + let listDir = if isPattern baseName + then listMatches + else listPlain hunk ./examples/ch07/glob-to-regexp/Glob.hs 51 - baseNames <- glob dir baseName + baseNames <- listDir dir baseName hunk ./examples/ch07/glob-to-regexp/Glob.hs 54 - | otherwise = do - exists <- doesNameExist pat - return (if exists then [pat] else []) +{-- /snippet glue --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 56 -{-- snippet type --} -isMagical :: String -> Bool -isMagical = any (`elem` "[*?") -{-- /snippet type --} - -globPlain :: FilePath -> String -> IO [String] -globPlain dirName baseName = do +{-- snippet listPlain --} +listPlain :: FilePath -> String -> IO [String] +listPlain dirName baseName = do hunk ./examples/ch07/glob-to-regexp/Glob.hs 63 +{-- /snippet listPlain --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 65 -globPattern :: FilePath -> String -> IO [String] -globPattern dirName pat = do +{-- snippet listMatches --} +listMatches :: FilePath -> String -> IO [String] +listMatches dirName pat = do hunk ./examples/ch07/glob-to-regexp/Glob.hs 73 - let names' = if notHidden pat - then filter notHidden names - else names + let names' = if isHidden pat + then filter isHidden names + else filter (not . isHidden) names hunk ./examples/ch07/glob-to-regexp/Glob.hs 77 +{-- /snippet listMatches --} hunk ./examples/ch07/glob-to-regexp/Glob.hs 80 -notHidden :: String -> Bool -notHidden name = take 1 name /= "." +isHidden :: String -> Bool +isHidden name = take 1 name == "." hunk ./examples/ch07/glob-to-regexp/Glob.hs 94 - addfile ./examples/ch07/glob-to-regexp/GlobRegexEither.hs hunk ./examples/ch07/glob-to-regexp/GlobRegexEither.hs 1 +module GlobRegexEither + ( + globToRegex + , matchesPattern + ) where + +import Text.Regex.Posix ((=~)) + +regexChars :: [Char] +regexChars = "+()^$.{}]|" + +{-- snippet type --} +type GlobError = String + +globToRegex :: String -> Either GlobError String +{-- /snippet type --} + +globToRegex = globToRegex' [] + +globToRegex' :: [Char] -> String -> Either GlobError [Char] + +globToRegex' acc ('*':cs) = globToRegex' ("*." ++ acc) cs +globToRegex' acc ('?':cs) = globToRegex' ('.':acc) cs +globToRegex' acc ('[':'!':c:cs) = charClass ("^[" ++ acc) cs +globToRegex' acc ('[':c:cs) = charClass ('[':acc) cs +globToRegex' acc ('[':_) = Left "unterminated character class" +globToRegex' acc (c:cs) + | c `elem` regexChars = globToRegex' (c:'\\':acc) cs + | otherwise = globToRegex' (c:acc) cs +globToRegex' acc "" = Right (reverse ('$':acc)) + +charClass :: [Char] -> String -> Either GlobError [Char] + +charClass acc (']':cs) = globToRegex' (']':acc) cs +charClass acc (c:cs) = charClass (c:acc) cs +charClass acc [] = Left "unterminated character class" + +matchesPattern :: String -> String -> Either GlobError Bool +name `matchesPattern` pat = case globToRegex pat of + Left err -> Left err + Right regex -> Right (name =~ regex) addfile ./examples/ch07/glob.ghci hunk ./examples/ch07/glob.ghci 1 +--# split +:module +System.FilePath +splitFileName "foo/bar" + +--# tricksy +splitFileName "foo/" }