[More piping John Goerzen **20071025104119] {
hunk ./en/ch20-systems.xml 483
- Piping
+ Extended Example: Piping
hunk ./en/ch20-systems.xml 588
+ There are a few other housekeeping things we must be careful about.
+ When you call forkProcess, just about everything
+ about your program is cloned (the main exception is
+ threads, which are not cloned). That includes
+ the set of open file descriptors (handles). Programs detect when
+ they're done receiving input from a pipe by checking the end-of-file
+ indicator. When the process at the writing end of a pipe closes the
+ pipe, the process at the reading end will receive an end-of-file
+ indication. However, if the writing file descriptor exists in more
+ than one process, the end-of-file indicator won't be sent until all
+ processes have closed that particular FD. Therefore, we must keep
+ track of which FDs are open so we can close them all in the child
+ processes. We must also close the child ends of the pipes in the
+ parent process as soon as possible.
+
+
+ Here is an initial implementation of a system of piping in Haskell.
+
+ &RunProcessSimple.hs:all;
+
+ Let's experiment with this in &ghci; a bit before looking at how it
+ works.
+
+ &rps.ghci:all;
+
+ We start by running a simple command, pwd, which
+ just prints the name of the current working directory. We pass
+ [] for the list of arguments, because
+ pwd doesn't need any arguments. Due to the
+ typeclasses used, Haskell can't infer the type of
+ [], so we specifically mention that it's a
+ &String;.
+
+
+ Then we get into more complex commands. We run
+ ls, sending it through grep.
+ At the end, we set up a pipe to run the exact same command that we
+ ran via a shell-built pipe at the start of this section. It's not
+ yet as pleasant as it was in the shell, but then again our program is
+ still relatively simple when compared to the shell.
+
+
+ Let's look at the program. The very first line has a special
+ OPTIONS_GHC clause.
+ This is the same as passing
+ -fglasgow-exts to &ghc; or &ghci;. We are using a
+ GHC extension that permits us to use a (String,
+ [String]) type as an instance of a
+ typeclass. (This extension is well supported in the
+ Haskell community; Hugs users can access the same thing with
+ hugs -98 +o.) By putting
+ it in the source file, we don't have to remember to specify it every
+ time we use this module.
+
+
+ After the import lines, we define a few types.
+ First, we define type SysCommand = (String,
+ [String]) as an alias. This is the type a command to be
+ executed by the system will take. We used data of this type for each
+ command in the example execution above. The
+ CommandResult type represents the result from
+ executing a given command, and the CloseFDs type
+ represents the list of FDs that we must close upon forking a new
+ child process.
+
+
+ Next, we define a class named CommandLike. This
+ class will be used to run "things", where a "thing" might be a
+ standalone program, a pipe set up between two or more programs, or in
+ the future, even pure Haskell functions. To be a member of this
+ class, only one function -- invoke -- needs to be
+ present for a given type. This will let us use
+ runIO to start either a standalone command or a
+ pipeline. It will also be useful for defining a pipeline, since we
+ may have a whole stack of commands on one or both sides of a given
+ command.
+
+
+ Our piping infrastructure is going to use strings as the way of
+ sending data from one process to another. We can take advantage of
+ Haskell's support for lazy reading via &hGetContents; while reading
+ data, and use forkIO to let writing occur in the
+ background.
+ This will work well, although not as fast as connecting
+ the endpoints of two processes directly together. (The
+ Haskell library HSH provides a similar API to that presented
+ here, but uses a more efficient (and much more complex) mechanism
+ of connecting pipes directly between external processes without
+ the data needing to pass through Haskell. This is the same
+ approach that the shell takes, and reduces the CPU load of
+ handling piping.) The string-based approach makes implementation quite
+ simple, however. We need only take care to do nothing that would
+ require the entire &String; to be buffered, and let Haskell's
+ laziness do the rest.
+
+
+ Next, we define an instance of
+ CommandLike for SysCommand. We
+ create two pipes: one to use for the new process's standard input,
+ and the other for its standard output. This creates four endpoints,
+ and thus four file descriptors. We add the parent file descriptors
+ to the list of those that must be closed in all children. These
+ would be the write end of the child's standard input, and the read
+ end of the child's standard output. Next, we fork the child process.
+ In the parent, we can then close the file descriptors that correspond
+ to the child. We can't do that before the fork, because then they
+ wouldn't be available to the child. We obtain a handle for the
+ stdinwrite file descriptor, and start a thread via
+ forkIO to write the input data to it. We then
+ define waitfunc, which is the action that the
+ caller will invoke when it is ready to wait for the called process to
+ terminate. Meanwhile, the child uses dupTo to attach its ends of
+ the pipes to its standard input and standard output,
+ closes the file descriptors it doesn't need, and executes the
+ command.
+
+
+ Next, we define some utility functions to manage the list of file
+ descriptors. After that, we define the tools that help set up
+ pipelines. First, we define a new type
+ PipeCommand that has a source and destination.
+ Both the source and destination must be members of
+ CommandLike.
+ We also define the
+ -|- convenience operator. Then, we make
+ PipeCommand an instance of
+ CommandLike. Its invoke
+ implementation starts the first command with the given input, obtains
+ its output, and passes that output to the invocation of the second
+ command. It then returns the output of the second command, and
+ causes the getExitStatus function to wait for and
+ check the exit statuses from both commands.
+
+
+ We finish by defining runIO. This function
+ establishes the list of FDs that must be closed in the child, starts
+ the command, displays its output, and checks its exit status.
+
+
+
hunk ./examples/ch20/RunProcess.hs 123
-'CommandLine'. -}
+'CommandLike'. -}
hunk ./examples/ch20/RunProcessSimple.hs 4
-module RunProcess where
+module RunProcessSimple where
hunk ./examples/ch20/RunProcessSimple.hs 116
-'CommandLine'. -}
+'CommandLike'. -}
}