Table of Contents
Topics: built-in types. Writing functions. Creating new types.
“Haskell has a strong, static type system with inference.” For a short sentence, this one gives us a lot for us to pore over. Because Haskell is quite different from mainstream programming languages in how it treats types, let's not assume too much shared understanding as we talk about types.
Every value and expression in Haskell has a
type. The type of a value indicates that
it shares certain properties with other values that have the
same type. (You'll see us refer to a value “having the
type X
”, or “being of type
X
”. The two phrases mean the same thing.)
All values that have the type Integer have the
ability to be added to other values of type
Integer, and so on.
Haskell has a “strong” type system, in which every expression and value has exactly one type. Another aspect of Haskell's view of strong typing is that it will not automatically convert values from one type to another, a feature present in some other languages.
Having a “static” type system means that the compiler knows the type of every value and expression at compile time, before any code is ever executed. A Haskell compiler or interpreter will detect when we try to use types inconsistently in our code, and reject the code with an error message.
ghci> 'j' :: Int <interactive>:1:0: Couldn't match expected type `Int' against inferred type `Char' In the expression: 'j' In the expression: 'j' :: Int In the definition of `it': it = 'j' :: Intghci>
'j' :: Int
<interactive>:1:0: Couldn't match expected type `Int' against inferred type `Char' In the expression: 'j' In the expression: 'j' :: Int In the definition of `it': it = 'j' :: Int
Haskell's combination of strong and static typing also makes it impossible for type errors to occur at runtime.
Finally, “type inference” means what it says: the compiler can automatically figure out the types of most values for us, so that we don't have to explicitly label them.
We've already briefly seen Haskell's notation for types in
the section called “First steps with types”. We write expression
:: MyType
to say that
expression
has the type MyType.
We've already encountered a few types in the section called “First steps with types”. Haskell has a number of built-in types that we'll use over and over. Here are some of the language's basic types, types that aren't composed of other types
The Char type represents a character. The values of Char are drawn from the Unicode character set, which covers most of the world's written languages.
The Bool type represents a value in Boolean
logic. The possible values of Bool are
True
and False
.
The Int type represents a signed, fixed-width integer. The exact range of values represented by Int depends on the system's longest “native” integer: on a 32-bit machine, an Int is usually 32 bits wide, while on a 64-bit machine, it is usually 64 bits wide.
The Integer type represents a signed integer of arbitrary size. Integers are not used as often as Ints, because they're a lot more expensive to work with.
The Double type is the type usually used to represent floating point numbers. It is typically 64 bits wide, and uses the machine's native floating point representation. (A narrower type, Float, also exists, but its use is discouraged; Haskell compiler writers concentrate more on making Double efficient.)
The most common composite data types in Haskell are the list and tuple.
We've already seen the list type mentioned in the section called “Text, strings and lists”, where we found that Haskell represents a text string as a list of Char values, and that the type “list of Char” is written [Char].
More generally, we can write the type “list of
a
” for any type a
by enclosing a
in square brackets, [a].
Lists are strongly typed: a list of one type has an identity of
its own, distinct from a list of another type. The type
[Int] is a list that can only contain values of
type Int, for example. We also call the list type
polymorphic, because it can contain any
type of value.
Lists are the “bread and butter” of Haskell collections. Whereas in an imperative language, we might repeat a task over many items by iterating through a loop, this is something that we tend to do in Haskell by recursing over a list. We'll be spending a lot more time discussing lists in Chapter 4, Functional programming.
A tuple is a fixed-size collection of values, each of which can be of any type. Unlike a list (the elements of which must all have the same type), there's no need for the elements of a tuple to have related types. We write a tuple by enclosing its elements in parentheses and separating them with commas. We use the same notation for writing its type.
ghci> (True, "hello") :: (Bool, String) (True,"hello") ghci> (4, ['a', 'm'], (16, True)) (4,"am",(16,True))ghci>
(True, "hello") :: (Bool, String)
(True,"hello")ghci>
(4, ['a', 'm'], (16, True))
(4,"am",(16,True))
We can construct a tuple with any number of elements (although in practice, a tuple with more than a handful becomes unwieldy). A tuple's type encodes the number and types of its elements in its own type. This means that tuples containing different numbers or types of elements have distinct types of their own.
ghci> :type (4 :: Int, ['a', 'm'], (16 :: Integer, True)) (4 :: Int, ['a', 'm'], (16 :: Integer, True)) :: (Int, [Char], (Integer, Bool))ghci>
:type (4 :: Int, ['a', 'm'], (16 :: Integer, True))
(4 :: Int, ['a', 'm'], (16 :: Integer, True)) :: (Int, [Char], (Integer, Bool))
A two-tuple of an Int and a String has a different type than a two-tuple of a Bool and a Bool, for example, and a three-tuple has a different type than a four-tuple.
Probably the most common use of tuples is to let us return multiple values from a function. We can also use them in other places where we want a fixed-size collection of values, for which the kind of container we're using isn't of much importance. An example of this might be to represent a row in the result of a a database query.
Now that we've had our fill of data types for a while, let's turn our attention to working with some of the types we've seen. We've already seen how to perform arithmetic in the section called “Simple arithmetic”, and it looks quite as it does in other languages.
However, our discussion of lists and tuples mentioned how we can construct them, but not how we do anything with them afterwards. Haskell defines a large library of functions for working with lists and (to a lesser extent) tuples, so let's find out how to use a few of those functions.
To call a function in Haskell, we provide the name of the
function followed by its arguments, all separated by white space.
As an example, let's call the head
function, which returns the first element of a list.
ghci> head [1,2,3] 1ghci>
head [1,2,3]
1
Its counterpart, tail
, returns all
but the head of a list.
ghci> tail [True, False] [False]ghci>
tail [True, False]
[False]
A related pair of functions, take
and
drop
, take two arguments: given a number
and a list, they return either the first, or all but the first,
number of elements of the list. (Here, remember that a
String is a list of Char.)
ghci> take 2 "abcdefg" "ab" ghci> drop 3 "abcdefg" "defg"ghci>
take 2 "abcdefg"
"ab"ghci>
drop 3 "abcdefg"
"defg"
If you're used to function call syntax in other languages, this notation can take a little getting used to, but it's undeniably simple and uniform. Here's what we mean by uniform.
ghci> snd (1,2) 2ghci>
snd (1,2)
2
In some other languages, the call to
snd
above might mean “call
snd
with two arguments,
1
and 2
”, but in
Haskell, it's a single-argument call, passing the tuple
(1,2)
.
By the way, snd
has a companion
function, fst
, which returns the first
element of a tuple.
ghci> fst ("you", True) "you"ghci>
fst ("you", True)
"you"
Haskell evaluates an expression from left to right. If we want to use the result of one expression as an argument to another, we have to keep this in mind and use parentheses to tell the parser what we really mean. Here's an example.
ghci> head (drop 4 "azerty") 't'ghci>
head (drop 4 "azerty")
't'
We can read this as “pass the result of the
expression drop 4 "azerty"
as the argument to
head
”. If we were to leave out
the parentheses, Haskell would instead interpret the
expression as “pass drop
as the
argument to head
”. Compilation
would fail with a type error, as drop
is
a function, not a list.
A consequence of Haskell's strong typing is that
fst
and snd
only
accept two-tuples as arguments. Let's see what ghci tells us
about their types.
ghci> :type fst fst :: (a, b) -> a ghci> :type snd snd :: (a, b) -> bghci>
:type fst
fst :: (a, b) -> aghci>
:type snd
snd :: (a, b) -> b
We can read the ->
above as
“returns”. The entire signature thus tells us that
fst
takes any two-tuple whose elements are
of types a
and b
(each of which can be of any type),
and returns a value with type a
.
Sure enough, if we try calling fst
with a
three-tuple, things don't go so well.
ghci> fst (1,True,"foo") <interactive>:1:4: Couldn't match expected type `(a, b)' against inferred type `(a1, b1, c)' In the first argument of `fst', namely `(1, True, "foo")' In the expression: fst (1, True, "foo") In the definition of `it': it = fst (1, True, "foo")ghci>
fst (1,True,"foo")
<interactive>:1:4: Couldn't match expected type `(a, b)' against inferred type `(a1, b1, c)' In the first argument of `fst', namely `(1, True, "foo")' In the expression: fst (1, True, "foo") In the definition of `it': it = fst (1, True, "foo")
Notice that the type signatures above give us a strong hint as to what these functions might actually do. This is an incredibly valuable property of types in a functional language. Since there aren't usually any side effects for us to worry about, figuring what a function does can often be a matter of reading its name and understanding its type signature, with no “regular documentation” required.
Just as we called the list type polymorphic because it can contain values of any type, when a function has type variables in its signature, indicating that some of its arguments can be of any type, we call the function polymorphic, too.
So far, we haven't seen a signature for a function that
takes more than one argument. We've
already encountered a few such functions; let's look at
take
.
ghci> :type take take :: Int -> [a] -> [a]ghci>
:type take
take :: Int -> [a] -> [a]
It's pretty clear that there's something going on with an
Int and some lists, but why are there two
->
symbols in the signature? Haskell
parses this chain of arrows from right to left. If we introduce
parentheses, it makes it clearer how Haskell is interpreting
this type signature.
take :: Int -> ([a] -> [a])take :: Int -> ([a] -> [a])
From this, it looks like we ought to read the type signature as a function that takes one argument, an Int, and returns another function. That other function also takes one argument, a list, and returns a list of the same type as its argument.
This is an intriguing idea, but it's not easy to see just yet what its consequences might be. We'll return to this topic in the section called “Partial function application and currying”, once we've spent a bit of time writing functions.
Now that we know how to call functions, it's time we turned our attention to writing them. While we can write functions in ghci, it's not a good environment for doing so, because it limits any expression or definition to one line in length. So instead, we'll finally break down and create a source file.
Haskell source files are usually identified with a suffix of
.hs
. Here's a simple function definition:
open up a file named add.hs
, and add these
contents to it.
add a b = a + badd a b = a + b
On the left hand side of the =
is the
name of the function, followed by the arguments to the function.
On the right hand side is the body of the function. With our
source file saved, we can load it into ghci, and use our new
add
function straight away.
ghci> :load add.hs [1 of 1] Compiling Main ( add.hs, interpreted ) Ok, modules loaded: Main. ghci> add 1 2 3ghci>
:load add.hs
[1 of 1] Compiling Main ( add.hs, interpreted ) Ok, modules loaded: Main.ghci>
add 1 2
3
When we call add
, the variables
a
and b
on the left hand
side of our definition are given (or “bound to”)
the values 1
and 2
, then
the right hand side is evaluated, and the result
returned.
Haskell doesn't need a return statement; the result of a function is the result of evaluating whatever expression is in the function's body.
If you're used to imperative programming languages, you're likely to think of a variable as a way of identifying a “box”. You can get the value that's currently in a box, and you can change the value in the box using assignment.
In Haskell, a variable is the name of a
value. This is critically different from
the name of a box: we're used to being able to change what's
inside a box from our dealings with other languages, but we
can't change the “twoness” of the value
2
.
Creating a variable in Haskell just gives a convenient name to a value; we can't change the value associated with a variable. This bears a strong similarity to the way we think about variables in mathematics.
Like other languages, Haskell has an if
expression. Let's see it in action, then we'll explain what's
going on. As an example, we'll write our own version of the
standard take
function. Before we begin,
let's probe a little bit of how take
behaves, so we can replicate its behaviour.
ghci> take 0 "foo" "" ghci> take 1 "foo" "f" ghci> take 4 [1,2] [1,2] ghci> take 7 [] [] ghci> take (-2) "foo" ""ghci>
take 0 "foo"
""ghci>
take 1 "foo"
"f"ghci>
take 4 [1,2]
[1,2]ghci>
take 7 []
[]ghci>
take (-2) "foo"
""
From the above, it seems that take
returns an empty list if the number to remove is greater than
the number of elements, and that it treats negative numbers as
zero. Here's a myTake
function that has
the same behaviour, and uses Haskell's if
expression to decide what to do.
myTake :: Int -> [a] -> [a] myTake n xs = if n <= 0 || null xs then xs else myTake (n - 1) (tail xs)myTake :: Int -> [a] -> [a] myTake n xs = if n <= 0 || null xs then xs else myTake (n - 1) (tail xs)
Let's save it in a file named
myTake.hs
, then load it into
ghci.
ghci> :load myTake.hs [1 of 1] Compiling Main ( myTake.hs, interpreted ) Ok, modules loaded: Main. ghci> myTake 0 "foo" "foo" ghci> myTake 1 "foo" "oo" ghci> myTake 4 [1,2] [] ghci> myTake 7 [] [] ghci> myTake (-2) "foo" "foo"ghci>
:load myTake.hs
[1 of 1] Compiling Main ( myTake.hs, interpreted ) Ok, modules loaded: Main.ghci>
myTake 0 "foo"
"foo"ghci>
myTake 1 "foo"
"oo"ghci>
myTake 4 [1,2]
[]ghci>
myTake 7 []
[]ghci>
myTake (-2) "foo"
"foo"
Now that we've seen myTake
in action,
let's return to the source code and look at a few of the
novelties we've introduced.
First is the if
keyword itself. It takes an
expression of type Bool. If that evaluates to
True
, it evaluates the expression on the
then
branch. Otherwise, it evaluates the
expression on the else
branch.
The combination of if
, then
,
else
and the expression after each keyword
combine to make up a single expression, as far as Haskell is
concerned. Whichever branch is evaluated is the result of the
if
. Because an expression can only have one
type, the expressions in the then
and
else
branches must have the same type. If they
don't, an if
expression won't typecheck.
Whereas it can make sense in an imperative language to
omit the else
branch from an if
,
this would be nonsensical in Haskell. An if
expression that was missing an else
wouldn't have
a value if the condition after the if
evaluated
to False
, so it couldn't typecheck.
The second novelty is almost trivial: the
null
function, which we use in the
Boolean portion of the if
, indicates whether a
list is empty.
ghci> :type null null :: [a] -> Boolghci>
:type null
null :: [a] -> Bool
Third is that our function calls itself recursively. This is an early example of how, in Haskell, we use recursion where in an imperative language we'd probably use a loop.
Finally, our if
expression spans several
lines. We line the then
and else
branches up under the if
for neatness, but this
is not mandatory. We could put all of them on a single line,
for example, but we'd end up with less readable code.
myTake2 n xs = if n <= 0 || null xs then xs else myTake (n - 1) (tail xs)myTake2 n xs = if n <= 0 || null xs then xs else myTake (n - 1) (tail xs)
Haskell provides a standard function, error ::
Char -> a
, that we can call when something has
gone terribly wrong in our code. We give it a string
parameter, which is the error message to display. Its type
signature looks peculiar; how can it produce a value of any
type a
given just a string? The
answer is that it doesn't, because error
is special.
It has that type so that we can call it anywhere and it
will always have the right type. However, instead of
returning a value like a normal function, it immediately
terminates evaluation, and prints the error message we give
it. Here's an example. The mySecond
function returns the second element of its input list, but
fails if its input list isn't long enough.
mySecond :: [a] -> a mySecond xs = if null (tail xs) then error "list too short" else head (tail xs)mySecond :: [a] -> a mySecond xs = if null (tail xs) then error "list too short" else head (tail xs)
As usual, we can see how this works in practice in ghci.
ghci> mySecond "xi" 'i' ghci> mySecond [2] *** Exception: list too short ghci> head (mySecond [[9]]) *** Exception: list too shortghci>
mySecond "xi"
'i'ghci>
mySecond [2]
*** Exception: list too shortghci>
head (mySecond [[9]])
*** Exception: list too short
Notice that in the third case above, where we're trying to
use the result of the call to mySecond
as
the argument to another function, evaluation still terminates
and drops us back to the ghci prompt. This is both the
strength and weakness calling error
: it
doesn't let us distinguish between recoverable errors and
problems so severe that they really should terminate a
program. We'll see one way around this problem in
the section called “How to represent a complicated result”.
Although lists and tuples are useful, we'll still often want
to construct new data types of our own. We define a new data
type using the data
keyword.
data MyType = MyConstructor Int String deriving (Show)data MyType = MyConstructor Int String deriving (Show)
The MyType after the data
keyword
is the name of our new type. (As we've already mentioned, a
type name must start with a capital letter.) The string
MyConstructor
is the name of the
constructor we'll call to create a value of
this type. (As with a type name, a constructor name must start
with a capital letter.) Finally, the Int and
String are the components of
the type. A component serves the same purpose in Haskell as a
field in a structure or class would in another language.
Note | |
---|---|
We'll explain the full meaning of |
We can create a new value of type MyType by
treating MyConstructor
as a function, and
calling it with arguments of types Int and
String.
myValue = MyConstructor 31337 "Creating a value!"myValue = MyConstructor 31337 "Creating a value!"
Once we've defined a type, we can experiment with it in ghci, starting by using the :load command to load our source file.
ghci> :load MyType.hs [1 of 1] Compiling Main ( MyType.hs, interpreted ) Ok, modules loaded: Main.ghci>
:load MyType.hs
[1 of 1] Compiling Main ( MyType.hs, interpreted ) Ok, modules loaded: Main.
Remember the myValue
variable we defined? Here
it is.
ghci> myValue MyConstructor 31337 "Creating a value!" ghci> :type myValue myValue :: MyTypeghci>
myValue
MyConstructor 31337 "Creating a value!"ghci>
:type myValue
myValue :: MyType
We can construct new values interactively in ghci, too.
ghci> MyConstructor 1 "foo" MyConstructor 1 "foo"ghci>
MyConstructor 1 "foo"
MyConstructor 1 "foo"
To find out more about a type, we can use some of ghci's browsing capabilities. The :info command gets ghci to tell us everything it knows about a type.
ghci> :info MyType data MyType = MyConstructor Int String -- Defined at MyType.hs:2:5 instance Show MyType -- Defined at MyType.hs:2:5ghci>
:info MyType
data MyType = MyConstructor Int String -- Defined at MyType.hs:2:5 instance Show MyType -- Defined at MyType.hs:2:5
We can also find out why we use
MyConstructor
to construct a new value of
type MyType.
ghci> :type MyConstructor MyConstructor :: Int -> String -> MyTypeghci>
:type MyConstructor
MyConstructor :: Int -> String -> MyType
From Haskell's perspective, then, a constructor is just another function, one that happens to return a value of the type we want to construct.
The Bool type that we introduced earlier is the simplest example of a sort of type called an algebraic data type. An algebraic data type has a fixed set of possible values, each of which is identified by a distinct constructor.
In the case of Bool, the type has two
constructors, True
and False
. Each
constructor is separated by a |
character,
which we can read as “or”. These are usually
referred to as alternatives or cases.
data Bool = False | Truedata Bool = False | True
Each constructor can take zero or more arguments; the numbers and types of the arguments accepted by each constructor are independent. For example, here's one way we might represent versions of the Windows operating system, where old releases were monolithic, and newer releases have “service pack levels” denoting major updates after their initial releases.
data WindowsVersion = Win95 | Win98 | WinME | WinNT Int | WinXP Int | WinVista Int deriving (Show)data WindowsVersion = Win95 | Win98 | WinME | WinNT Int | WinXP Int | WinVista Int deriving (Show)
The alternatives that represent older releases don't need arguments, but those for the newer releases need an Int to represent the patch level.
Algebraic data types provide a single structuring mechanism in instances where other languages have several different building blocks. Here are some analogues from C and C++, which might make it clearer what we can do with algebraic data types.
With just one constructor, an algebraic data type groups
related values into one, and gives that value an identity
distinct from other types. It corresponds to a
struct
in C or C++, and its components to the
fields of a struct.
If an algebraic data type has multiple alternatives, we
can think of it as similar to a union
in C or
C++. A big difference between the two is that a union doesn't
tell us which alternative is actually present; we have to
record which alternative we're using ourselves, usually in
another field of a struct. This means that unions can
sometimes be sources of bugs, where our notion of which
alternative we should be using is incorrect.
With an algebraic data type, Haskell stores the constructor that we use in the value that we create, so we don't need to manually sock it away somewhere else.
Algebraic data types also serve where we'd use an
enum
in C or C++, to represent a range of
discrete symbolic values.
data Roygbiv = Red | Orange | Green | Blue | Indigo | Violet deriving (Show) -- Equivalent in C or C++: -- -- enum roygbiv { red, orange, green, blue, indigo, violet }; data Roygbiv = Red | Orange | Green | Blue | Indigo | Violet deriving (Show) -- Equivalent in C or C++: -- -- enum roygbiv { red, orange, green, blue, indigo, violet };
From reading the preceding sections, it should be clear
that all of the data types that we define
with the data
keyword are algebraic data types.
Some may have just one alternative; others have several; but
they're all using the same organising machinery.
Another useful thing to know is that it's perfectly okay for the name of an algebraic type to have the same name as one of its constructors to be the same. It's always obvious from context whether we're using a name to refer to a type or a constructor, so this doesn't introduce any ambiguity.
Giving a type and its constructor the same name is in fact something that we do frequently when the type has just one constructor.
data PerfectlyNormal = PerfectlyNormal Intdata PerfectlyNormal = PerfectlyNormal Int
When a type has multiple constructors, it's still legal to give one of the constructors the same name as the type, but this is much less common.
data LegalButWeird = SomethingFishy | LegalButWeird Stringdata LegalButWeird = SomethingFishy | LegalButWeird String
In our discussion of lists, we mentioned that we can create a list of values of any type. We can define our own types that allow this, too. To do this, we introduce variables into a type declaration.
data Wrapper a = Wrapper a deriving (Show)data Wrapper a = Wrapper a deriving (Show)
Here, the variable a
is not a
regular variable; it's called a type
variable, because it indicates that our
Wrapper type takes another type as its parameter
(hence calling it a parameterised type). What this lets us do is
use Wrapper on values of any type.
wrappedInt :: Wrapper Int wrappedInt = Wrapper 42 wrappedString = Wrapper "foo"wrappedInt :: Wrapper Int wrappedInt = Wrapper 42 wrappedString = Wrapper "foo"
As usual, we can load our source file into ghci and experiment with it.
ghci> :load Wrapper.hs [1 of 1] Compiling Main ( Wrapper.hs, interpreted ) Ok, modules loaded: Main. ghci> :type wrappedString wrappedString :: Wrapper [Char] ghci> Wrapper [1,2,3] Wrapper [1,2,3] ghci> :type Wrapper Wrapper :: a -> Wrapper aghci>
:load Wrapper.hs
[1 of 1] Compiling Main ( Wrapper.hs, interpreted ) Ok, modules loaded: Main.ghci>
:type wrappedString
wrappedString :: Wrapper [Char]ghci>
Wrapper [1,2,3]
Wrapper [1,2,3]ghci>
:type Wrapper
Wrapper :: a -> Wrapper a
Wrapper is a “generic” container type (albeit a fairly useless one); we can construct a Wrapper from a value of any type. It is also strongly typed; the type of whatever it contains is encoded in its own type.
To once again extend an analogy to more familiar languages, this gives us a facility that bears some resemblance to templates in C++, and to generics in Java. (In fact, Java's generics facility was inspired by several aspects of Haskell's type system.)
We can nest uses of parameterised types inside each other, but when we do, we may need to use parentheses to tell the Haskell compiler what we mean.
multiplyWrapped :: Wrapper (Wrapper Int) multiplyWrapped = Wrapper (Wrapper 7)multiplyWrapped :: Wrapper (Wrapper Int) multiplyWrapped = Wrapper (Wrapper 7)
Let's take a break from writing about types for a few
moments. Within the body of a function, we can introduce new
local variables whenever we need them, using a let
expression. As an example, let's write a function that
calculates the real-valued roots of the quadratic equation
a * (x ** 2) + b * x + c == 0
.
realRoots :: Double -> Double -> Double -> Maybe (Double, Double) realRoots a b c = let n = b**2 - 4 * a * c a2 = 2 * a r1 = (-b + sqrt n) / a2 r2 = (-b - sqrt n) / a2 in if n >= 0 && a /= 0 then Just (r1, r2) else NothingrealRoots :: Double -> Double -> Double -> Maybe (Double, Double) realRoots a b c = let n = b**2 - 4 * a * c a2 = 2 * a r1 = (-b + sqrt n) / a2 r2 = (-b - sqrt n) / a2 in if n >= 0 && a /= 0 then Just (r1, r2) else Nothing
The keywords to look out for here are let
,
which starts a block of variable declarations, and
in
, which ends it. Each line introduces a new
variable. The name is on the left of the =
,
and its value on the right. We can use these variables both
within our block of variable declarations and in the expression
that follows the in
keyword.
There's no problem with a variable earlier in a
let
block referring to a later one, or even with
them referring to each other. (In some functional languages,
this sort of flexible let
is named
letrec
.)
We can have multiple let
blocks within an
expression. There's also another mechanism we can use to
introduce local variables, called a where
block.
The definitions in a where
block apply to the code
that precedes it. Let's illustrate what we
mean with another example.
import Data.Complex roots :: Double -> Double -> Double -> Either (Complex Double, Complex Double) (Double, Double) roots a b c = if n >= 0 then Right ((-b + sqrt n) / a2, (-b - sqrt n) / a2) else Left ((-b' + sqrt n') / a2', (-b' - sqrt n') / a2') where n = b**2 - 4 * a * c a2 = 2 * a n' = n :+ 0 b' = b :+ 0 a2' = a2 :+ 0import Data.Complex roots :: Double -> Double -> Double -> Either (Complex Double, Complex Double) (Double, Double) roots a b c = if n >= 0 then Right ((-b + sqrt n) / a2, (-b - sqrt n) / a2) else Left ((-b' + sqrt n') / a2', (-b' - sqrt n') / a2') where n = b**2 - 4 * a * c a2 = 2 * a n' = n :+ 0 b' = b :+ 0 a2' = a2 :+ 0
Here, the roots
function returns the
real roots when they're defined, and the complex roots
otherwise. (We left out the divide-by-zero case for simplicity.)
While a “where” clause initially looks very weird
to non-Haskell programmers, it's a great way to put the
“important” code early, followed by the auxiliary
definitions that support it. After a while, you'll find
yourself missing where
clauses in languages that
lack them!
The main difference between let
and
where
is one of scope. The scope of a
let
only extends to the expression after the
in
keyword, while the variables introduced by a
where
clause are visible upwards to the beginning
of the block that it “belongs” to. Also,
let
is always paired with an expression, but
where
is paired with a block of equations.
We'll be talking more about how to write let
expressions and where
clauses in the section called “The offside rule, and white space in a function
body”.
We sneaked two previously unseen standard types,
Maybe and Either, into our
root-finding examples. We use Maybe when it might
not make sense to return a normal result, for example because a
function's result is undefined for some inputs. We use
Just
to say “we have a
result”, and the argument to Just
is that result. When we can't give a result, we use
Nothing
, which takes no arguments.
ghci> Just "answer" Just "answer" ghci> Nothing Nothingghci>
Just "answer"
Just "answer"ghci>
Nothing
Nothing
Why do we need Maybe here? The real-valued
roots of a quadratic equation are infinity when
a
, the coefficient of x **
2
, is zero.
ghci> :load Roots.hs [1 of 1] Compiling Main ( Roots.hs, interpreted ) Roots.hs:56:30: Not in scope: `c' Failed, modules loaded: none. ghci> realRoots 0 1 2 <interactive>:1:0: Not in scope: `realRoots'ghci>
:load Roots.hs
[1 of 1] Compiling Main ( Roots.hs, interpreted ) Roots.hs:56:30: Not in scope: `c' Failed, modules loaded: none.ghci>
realRoots 0 1 2
<interactive>:1:0: Not in scope: `realRoots'
They're also not defined when b ** 2 - 4 * a *
c
is negative, because we would need to use complex
numbers to represent a negative square root.
ghci> realRoots 1 3 4 <interactive>:1:0: Not in scope: `realRoots'ghci>
realRoots 1 3 4
<interactive>:1:0: Not in scope: `realRoots'
Otherwise, we can return a normal result, wrapped in
Just
.
ghci> realRoots 1 3 2 <interactive>:1:0: Not in scope: `realRoots'ghci>
realRoots 1 3 2
<interactive>:1:0: Not in scope: `realRoots'
Compared to error
, which we saw in
the section called “Reporting errors”, Maybe has
the huge advantage that it's a normal value, which we return to
our caller to deal with. Calling error
, by
contrast, is more akin to pulling on the program's ejector seat
handle as it disintegrates around us; something terrible has
happened, and we need to give up right now.
The Either a b type gives us even more
flexibility than Maybe, as it's got two type
parameters. We can wrap a value of any type a
with Left, or a value of
an unrelated type b
with
Right. Our roots
function
uses this to return a Double when the real roots
are defined, and a Complex Double when they're
not.
In our definition of realRoots
, the
left margin of our text wandered around quite a bit. This was
not an accident: in Haskell, white space has meaning.
Haskell uses indentation as a cue to parse sections of code. This use of layout to convey structure is sometimes called the offside rule. At the top level, the first declaration or definition can start in any column, and the Haskell compiler or interpreter remembers that indentation level. Every subsequent top-level declaration must have the same indentation.
Here's an illustration of the top-level indentation rule.
Our first file, GoodIndent.hs
, is well
behaved.
-- This is the leftmost column. -- It's fine for top-level declarations to start in any column... firstGoodIndentation = 1 -- ...provided all subsequent declarations do, too! secondGoodIndentation = 2-- This is the leftmost column. -- It's fine for top-level declarations to start in any column... firstGoodIndentation = 1 -- ...provided all subsequent declarations do, too! secondGoodIndentation = 2
Our second, BadIndent.hs
, doesn't play
by the rules.
-- This is the leftmost column. -- Our first declaration is in column 4. firstBadIndentation = 1 -- Our second is left of the first, which is illegal! secondBadIndentation = 2-- This is the leftmost column. -- Our first declaration is in column 4. firstBadIndentation = 1 -- Our second is left of the first, which is illegal! secondBadIndentation = 2
Here's what happens when we try to load the two files into ghci.
ghci> :load GoodIndent.hs [1 of 1] Compiling Main ( GoodIndent.hs, interpreted ) Ok, modules loaded: Main. ghci> :load BadIndent.hs [1 of 1] Compiling Main ( BadIndent.hs, interpreted ) BadIndent.hs:8:2: parse error on input `secondBadIndentation' Failed, modules loaded: none.ghci>
:load GoodIndent.hs
[1 of 1] Compiling Main ( GoodIndent.hs, interpreted ) Ok, modules loaded: Main.ghci>
:load BadIndent.hs
[1 of 1] Compiling Main ( BadIndent.hs, interpreted ) BadIndent.hs:8:2: parse error on input `secondBadIndentation' Failed, modules loaded: none.
An empty line is treated as a continuation of the current item, as is a line indented to the right of the current current item.
The rules for let
expressions and
where
clauses are similar. After a
let
or where
keyword, the Haskell
compiler or interpreter remembers the indentation of the next
token it sees. If the next line is empty, or its indentation is
further to the right than the previous line, this counts as
continuing the previous line. On the other hand, if the
indentation is the same as the previous line, this is treated as
beginning a new item in the same block.
Here are nested uses of let
and
where
.
bar = let b = 2 in let a = b in abar = let b = 2 in let a = b in a
In the body of bar
, the variable
a
is only visible within the let
expression that defines it. It's not visible to the
let
expression that defines b
;
only the result of the inner
let
expression is visible.
foo = a where a = b where b = 2foo = a where a = b where b = 2
Similarly, the scope of the first where
clause
is the definition of foo
, but the scope of
the second is just the first where
clause.
The indentation we use for the let
and
where
clauses makes our intentions easy to figure
out.
If you are using a Haskell-aware text editor (e.g. Emacs), it is probably already configured to use space characters for all white space within a line. If your editor is not Haskell-aware, you should configure it to only use space characters.
The reason for this is simple portability. In an editor that uses a fixed-width font, tab stops are by default placed at different intervals on Unix-like systems (every eight characters) than on Windows (every four characters). This means that no matter what your personal beliefs are about where tabs belong, you can't rely on someone else's editor honouring your preferences. Any indentation that uses tabs is going to look broken under someone's configuration; this could lead to compilation problems, too. Using space characters instead avoids this problem entirely.
We can use explicit structuring instead of layout to
indicate what we mean. To do so, we start a block of
equations with an opening curly brace; separate each item with
a semicolon; and finish the block with a closing curly brace.
The following two uses of let
have the same
meanings.
bar = let a = 1 b = 2 c = 3 in a + b + c foo = let { a = 1; b = 2; c = 3 } in a + b + cbar = let a = 1 b = 2 c = 3 in a + b + c foo = let { a = 1; b = 2; c = 3 } in a + b + c
When we use explicit structuring, the normal layout rules
don't apply, which is why we can get away with unusual
indentation in the second let
expression.
We can use explicit structuring anywhere that we'd
normally use layout. It's valid for where
clauses, and even top-level declarations. Just remember that
although the facility exists, explicit structuring is hardly
ever actually used in Haskell
programs.
Here's a definition of a binary tree type.
data Tree a = Node (Tree a) (Tree a) | Leaf a deriving (Show)data Tree a = Node (Tree a) (Tree a) | Leaf a deriving (Show)
We call this a recursive type because Tree, the type we're defining, appears both on the left hand side and the right hand side of the definition: we define the type in terms of itself.
Now that we're getting familiar with some of the jargon
around types, we can revisit lists. Haskell's list type is a
parameterised type, because we can make lists of any other type.
It is also an algebraic data type, with two constructors. One
is the empty list, written []
(sometimes pronounced
“nil”, which is borrowed from Lisp).
ghci> [] []ghci>
[]
[]
The other is the (:)
operator, often
pronounced “cons” (this is short for
“construct”, and also borrowed from Lisp). The
(:)
operator takes an element and a list,
and constructs a new list.
ghci> 1 : [] [1] ghci> 1 : [2] [1,2]ghci>
1 : []
[1]ghci>
1 : [2]
[1,2]
We can use (:)
repeatedly to add new
elements to the front of a list.
ghci> "alpha" : "beta" : ["gamma", "delta"] ["alpha","beta","gamma","delta"]ghci>
"alpha" : "beta" : ["gamma", "delta"]
["alpha","beta","gamma","delta"]
The right hand side of (:)
must be a
list, and of the correct type. If it's not, we'll get an
error.
ghci> True : False <interactive>:1:7: Couldn't match expected type `[Bool]' against inferred type `Bool' In the second argument of `(:)', namely `False' In the expression: True : False In the definition of `it': it = True : False ghci> True : ["wrong"] <interactive>:1:8: Couldn't match expected type `Bool' against inferred type `[Char]' Expected type: Bool Inferred type: [Char] In the expression: "wrong" In the second argument of `(:)', namely `["wrong"]'ghci>
True : False
<interactive>:1:7: Couldn't match expected type `[Bool]' against inferred type `Bool' In the second argument of `(:)', namely `False' In the expression: True : False In the definition of `it': it = True : Falseghci>
True : ["wrong"]
<interactive>:1:8: Couldn't match expected type `Bool' against inferred type `[Char]' Expected type: Bool Inferred type: [Char] In the expression: "wrong" In the second argument of `(:)', namely `["wrong"]'
Because (:)
constructs a list from
another list, the list type is recursive. So here we have a
built-in type that's parameterised, recursive, and
algebraic.
One consequence of lists being generic is that lists of lists, for example, aren't special in any way.
ghci> [["foo", "bar"], ["x", "y"]] [["foo","bar"],["x","y"]]ghci>
[["foo", "bar"], ["x", "y"]]
[["foo","bar"],["x","y"]]
This has type [[String]], a list of lists of strings. But since String is just a synonym for [Char], it's really a list of lists of lists of Char. Whew!
We're not limited to building up lists one element at a
time. Haskell defines an inline function,
(++)
, that we can use to append one list
onto the end of another.
ghci> :type (++) (++) :: [a] -> [a] -> [a] ghci> [2,3] ++ [4,5] [2,3,4,5] ghci> "joining " ++ " lists " ++ " together" "joining lists together"ghci>
:type (++)
(++) :: [a] -> [a] -> [a]ghci>
[2,3] ++ [4,5]
[2,3,4,5]ghci>
"joining " ++ " lists " ++ " together"
"joining lists together"
The concat
function takes a list of
lists, and concatenates the whole lot into a single list.
ghci> :type concat concat :: [[a]] -> [a] ghci> concat [[1,1,2], [3,5,8], [11]] [1,1,2,3,5,8,11]ghci>
:type concat
concat :: [[a]] -> [a]ghci>
concat [[1,1,2], [3,5,8], [11]]
[1,1,2,3,5,8,11]
Haskell has a special tuple type with no elements, written (), and pronounced “unit”.
ghci> () () ghci> :type () () :: ()ghci>
()
()ghci>
:type ()
() :: ()
This type is only really used with parameterised data types,
to indicate that one of the type parameters isn't being used.
Since it doesn't encode any information, it's a rough equivalent
to void
in C.
Here's an example of () in use. We can
generalise our earlier Tree type a little, so that
internal nodes contain values of type a
, while leaves contain values of type
b
.
data ComplexTree a b = ComplexNode a (ComplexTree a b) (ComplexTree a b) | ComplexLeaf b deriving (Show)data ComplexTree a b = ComplexNode a (ComplexTree a b) (ComplexTree a b) | ComplexLeaf b deriving (Show)
If we wanted to create a ComplexTree where we wanted to store Ints on the leaves, but don't care about the internal nodes. We would write its type as ComplexTree () Int.
Although we introduced a handful of functions earlier that can operate on lists, we've yet to see how we might generally get values out of a constructed algebraic data type. Haskell has a simple pattern matching facility that we can use to this end.
A pattern lets us peer inside a compound value and bind variables to the values it contains. In fact, when we define a function, the parameters to that function are really patterns that bind our variables to an entire value.
Here's an example of pattern matching in action on a list; we're going to add all elements of the list together.
sumList (x:xs) = x + sumList xs sumList [] = 0sumList (x:xs) = x + sumList xs sumList [] = 0
See that (x:xs)
on the left of the first line?
The :
means “match the head of a
list”; that's the familiar list constructor,
(:)
, in action in a new way. The variables
x
and xs
are given the
values of (“bound to”) the head and tail of the
list, respectively. The whole pattern is wrapped in parentheses
so Haskell won't parse it as three separate arguments.
What effect does pattern matching have? Haskell will only
evaluate the right hand side of an equation if it can match all
of the patterns on the left hand side. In the definition of
sumList
above, the right hand side of the
first equation won't be evaluated if the input list is empty.
Instead, Haskell will “fall through” to the
equation on the following line, which does
have a pattern for the empty list, and it will evaluate
that.
It might initially look like we have two functions named
sumList
here, but Haskell lets us define a
function as a series of equations; so in fact these two clauses
are defining the behaviour of one function, for different
inputs. (By the way, there's a standard function,
sum
, that does this for us.)
The syntax for pattern matching on a tuple is similar to the syntax for constructing a tuple. Here's a function that returns the third element from a three-tuple.
third (a, b, c) = cthird (a, b, c) = c
There's no limit on how “deep” within a value a pattern can look. Here's a definition that looks both inside a tuple and inside a list within that tuple.
complicated (True, a, x:xs, 5) = (a, xs)complicated (True, a, x:xs, 5) = (a, xs)
We can try this out interactively.
ghci> :load Tuple.hs [1 of 1] Compiling Main ( Tuple.hs, interpreted ) Ok, modules loaded: Main. ghci> complicated (True, 1, [1,2,3], 5) (1,[2,3])ghci>
:load Tuple.hs
[1 of 1] Compiling Main ( Tuple.hs, interpreted ) Ok, modules loaded: Main.ghci>
complicated (True, 1, [1,2,3], 5)
(1,[2,3])
Wherever a literal value is present in a pattern, that value must match exactly for the pattern match to succeed. If every pattern within a series of equations fails to match, we get a runtime error.
ghci> complicated (False, 1, [1,2,3], 5) *** Exception: Tuple.hs:6:0-39: Non-exhaustive patterns in function complicatedghci>
complicated (False, 1, [1,2,3], 5)
*** Exception: Tuple.hs:6:0-39: Non-exhaustive patterns in function complicated
We can pattern match on algebraic data types using their constructors. Remember the Wrapper type we defined earlier? Here's how we can extract a wrapped value from a Wrapper.
unwrap (Wrapper x) = xunwrap (Wrapper x) = x
ghci> unwrap (Wrapper "foo") "foo" ghci> :type unwrap unwrap :: Wrapper t -> tghci>
unwrap (Wrapper "foo")
"foo"ghci>
:type unwrap
unwrap :: Wrapper t -> t
Notice that Haskell infers the type of the
unwrap
function based on the constructor
we're using in our pattern. If we're trying to match a value
whose constructor is Wrapper
, then the type
of that parameter must be Wrapper a.
When we're writing a pattern, we can specify that we don't
care what value a particular value within a structure has,
without actually binding that value to a name. The notation
for this is _
(called a wild card or “don't
care”), and we use it as follows. This function
tells us whether the result of the roots
function we defined earlier is real-valued or not.
isRealValued :: Either (Complex Double, Complex Double) (Double, Double) -> Bool isRealValued (Left _) = False isRealValued _ = TrueisRealValued :: Either (Complex Double, Complex Double) (Double, Double) -> Bool isRealValued (Left _) = False isRealValued _ = True
Here, we don't care about the value of the result, just
about which constructor was used to create it. If it was
Left
, the result must be a complex
number, otherwise it must be real. We can use a wild card for
the entire second pattern; there's no need to see if the
constructor is Right
, because it
must be; Either only has two
constructors.
We're not limited to using patterns in function
definitions. The case
expression lets us match
patterns at any time. Here's what it looks like.
hasRealRoots a b c = case realRoots a b c of Just _ -> True Nothing -> FalsehasRealRoots a b c = case realRoots a b c of Just _ -> True Nothing -> False
The case
keyword is followed by an arbitrary
expression; the result of this expression is what we're
pattern matching on. The of
keyword signifies
the end of the expression and the beginning of the block of
patterns and expressions.
Each item in the block consists of a pattern, followed by
an arrow ->
, followed by an expression to
evaluate if that pattern matches. The result of the
case
expression is the result of the expression
associated with the first pattern to match, taken from top to
bottom.
To express “here's the expression to evaluate if
none of the other patterns match”, we would just use
the wild card pattern _
as the last in our list
of patterns.
Now that we've seen that we can define a function as a
series of equations, the usefulness of the where
clause should be a bit more clear. Variables that we define
inside a where
clause are visible across all of
the equations that precede it in a single block.
Every variable that appears within a pattern is a new local variable that will be given a value if the pattern matches, not a reference to some variable on “outside” the pattern. Matching a pattern only lets us do exact comparisons against constructors and simple values. We can't use a pattern to match something like a function. To see what we mean, take a look at the following function.
isHead f = case f of head -> True _ -> FalseisHead f = case f of head -> True _ -> False
A naive glance suggests that it's trying to check the
value of f
to see if it's actually the
standard function head
, but such an
interpretation would be quite wrong. Here's what is
really doing.
Because the first pattern in the case
expression is just a variable, this pattern will
always match, no matter what the value of
f
is, and the value of f
will be given to the local variable head
when the right hand side is evaluated.
Finally, because the first pattern always matches, GHC
will always evaluate its right hand side and use that as the
result of the case
expression. The second
pattern (the one with the wild card) will never actually be
reached. Because the two patterns will match the same values,
they are said to overlap. If GHC ever
complains to you about overlapping patterns, it's telling you
that one of your patterns is the same as another, and so will
never actually be matched.
Another limitation of patterns is that a variable can only appear once in a pattern. For example, you can't put a variable in multiple places within a pattern to express the notion “this value and that should be identical”.
The way around these restrictions of Haskell's patterns is to use patterns in combination with a language facility called guards, which we'll talk about next.
We can further extend our expressive arsenal using
guards. A guard is an expression of type
Bool; if it evaluates to True
,
the equation that follows it is evaluated. Otherwise, the next
guard in the series is evaluated, and so on. Here's an example
of guards in action.
guardedRoots a b c | n >= 0 && a /= 0 = Just (r1, r2) | otherwise = Nothing where n = b**2 - 4 * a * c a2 = 2 * a r1 = (-b + sqrt n) / a2 r2 = (-b - sqrt n) / a2guardedRoots a b c | n >= 0 && a /= 0 = Just (r1, r2) | otherwise = Nothing where n = b**2 - 4 * a * c a2 = 2 * a r1 = (-b + sqrt n) / a2 r2 = (-b - sqrt n) / a2
Each guard is introduced by a |
symbol,
followed by the guard expression, then an =
symbol
(or ->
if within a case
expression), then the expression to evaluate if the guard
succeeds.
The otherwise
used in the second guard
has an obvious meaning: it's the expression to evaluate if
previous gaurds all evaluate to False
. It's
not a special piece of syntax, though; it's just a variable
whose value is True
.
We can use guards anywhere that we can use patterns. The combination of writing a function as a series of equations, pattern matching, and guards lets us write code that's clear and easy to understand.
Remember the myTake
function we defined
in the section called “Conditional evaluation”?
myTake :: Int -> [a] -> [a] myTake n xs = if n <= 0 || null xs then xs else myTake (n - 1) (tail xs)myTake :: Int -> [a] -> [a] myTake n xs = if n <= 0 || null xs then xs else myTake (n - 1) (tail xs)
Here's a reformulation of that function using patterns and
guards. Instead of reasoning about what an if
expression is doing and which branch will be evaluated, the code
uses a series of equations with simple patterns and guards. This
makes it easier to understand the behaviour of the function
under different circumstances.
niceTake n _ | n <= 0 = [] niceTake _ [] = [] niceTake n (_:xs) = niceTake (n - 1) xsniceTake n _ | n <= 0 = [] niceTake _ [] = [] niceTake n (_:xs) = niceTake (n - 1) xs
Let's return to one of the limitations of patterns that we mentioned in the previous section: the fact that we can't use a pattern to check two variables within the pattern for equality. We can express this quite easily using a guarded pattern.
secondEqualsThird x = case x of (True, a, b) | a == b -> "foo" | otherwise -> "bar"secondEqualsThird x = case x of (True, a, b) | a == b -> "foo" | otherwise -> "bar"
Here, for good measure, we've illustrated guard syntax in a
case
expression.
Usually, when we define or call a function in Haskell, we write the name of the function, followed by its arguments; this is called prefix notation, because the name of the function comes before its arguments. For a function that takes two arguments, we have the option of using it in infix form, between its first and second arguments. This allows us to write expressions using functions as if they were infix operators.
The syntax for defining or calling a function in infix form is to enclose the name of the function in backtick characters (sometimes known as backquotes). Here's a simple infix definition.
a `plus` b = a + ba `plus` b = a + b
Defining a function in infix form doesn't change anything about the behaviour of the function. We can call the function using infix or prefix notation, as we prefer.
ghci> 1 `plus` 2 3 ghci> plus 1 2 3ghci>
1 `plus` 2
3ghci>
plus 1 2
3
Infix notation is useful for more than just our own
functions. For example, Haskell's standard
Data.List
module defines a function,
isPrefixOf
, that indicates whether all
elements of its first argument are equal to the first elements
of its second argument.
ghci> :module +Data.List ghci> :type isPrefixOf isPrefixOf :: (Eq a) => [a] -> [a] -> Boolghci>
:module +Data.List
ghci>
:type isPrefixOf
isPrefixOf :: (Eq a) => [a] -> [a] -> Bool
Let's define a few variables in ghci.
ghci> let string = "foo" ghci> let other = "foobar"ghci>
let string = "foo"
ghci>
let other = "foobar"
If we call isPrefixOf
using prefix
notation, we can have a hard time remembering which argument
we're checking for as a prefix of the other.
ghci> isPrefixOf string other Trueghci>
isPrefixOf string other
True
But if we use infix notation, the code “reads” more naturally; it's now obvious that we're checking the variable on the left to see if it's a prefix of the variable on the right.
ghci> string `isPrefixOf` other Trueghci>
string `isPrefixOf` other
True
There's no hard-and-fast rule that dictates when you ought to use infix versus prefix notation, although prefix notation is far more common. It's best to choose whichever makes your code more readable in a specific situation.
In this chapter, we've had a whirlwind overview of Haskell's type system and much of its syntax. We've read about basic types, compound types, and how to write our own algebraic data types. We've seen how to write functions, and how to declare local variables within them. We've read about Haskell's offside rule for laying out functions, and how we can avoid it if we need to. We've seen conditional evaluation, pattern matching and guards. We've discussed error handling.
This all amounts to a lot of information to absorb. In Chapter 4, Functional programming, we'll build on this basic knowledge to understand how we can write, and think about, code in Haskell.