[Progress on the basics, and our first appendix. Bryan O'Sullivan **20070702063740] { hunk ./en/00book.xml 8 + + hunk ./en/00book.xml 18 + + + + hunk ./en/00book.xml 97 + + &appA; + hunk ./en/Makefile 8 - $(wildcard ch*.xml) + $(wildcard ch*.xml) \ + $(wildcard app*.xml) hunk ./en/Makefile 16 + $(wildcard ../examples/appA/*.ghci) \ addfile ./en/appA-escapes.xml hunk ./en/appA-escapes.xml 1 + + Characters, strings, and escaping rules + + This appendix covers the escaping rules used to represent + non-ASCII characters in Haskell character and string literals. + Haskell's escaping rules follow the pattern established by the C + programming language, but expand considerably upon them. + + + Writing character and string literals + + A single character is surrounded by ASCII single quotes, + ', and has type Char. + + &text.ghci:char; + + A string literal is surrounded by double quotes, + ", and has type [Char] (more + often written as String). + + &text.ghci:string; + + The double-quoted form of a string literal is just syntactic + sugar for list notation. + + &text.ghci:stringlist; + + + + + International language support + + Haskell uses Unicode internally for its Char + data type. Since String is just an alias for + [Char], a list of Chars, Unicode is + also used to represent strings. + + Different Haskell implementations place limitations on the + character sets they can accept in source files. &GHC; allows + source files to be written in the UTF-8 encoding of Unicode, so + in a source file, you can use UTF-8 literals inside a character + or string constant. Do be aware that if you use UTF-8, other + Haskell implementations may not be able to parse your source + files. + + When you run the &ghci; interpreter interactively, it may + not be able to deal with international characters in character + or string literals that you enter at the keyboard. 
+ + + Although Haskell represents characters and strings + internally using Unicode, there is no standardised way to do + I/O on files that contain Unicode data. Haskell's standard + text I/O functions treat text as a sequence of 8-bit + characters, and do not perform any character set + conversion. + + There exist third-party libraries that will convert + between the many different encodings used in files and + Haskell's internal Unicode representation. + + + + + + Escaping text + + Some characters must be escaped to be represented inside a + character or string literal. For example, a double quote + character inside a string literal must be escaped, or else it + will be treated as the end of the string. + + + Single-character escape codes + + Haskell uses essentially the same single-character escapes + as the C language and many other popular languages. + + + Single-character escape codes + + + + Escape + Value + Character + + + + + \0 + U+0000 + null character + + + \a + U+0007 + alert + + + \b + + + U+0008 + backspace + + + \f + U+000C + form feed + + + \n + U+000A + newline (line feed) + + + \r + U+000D + carriage return + + + \t + U+0009 + horizontal tab + + + \v + U+000B + vertical tab + + + \" + U+0022 + double quote + + + \& + n/a + empty string + + + \' + U+0027 + single quote + + + \\ + U+005C + backslash + + + +
+ +
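A minimal sketch of the single-character escapes in the table above, assuming GHC and only the standard `Data.Char` module. The `main` wrapper is our own scaffolding and is not part of the book's example files.

```haskell
import Data.Char (ord)

main :: IO ()
main = do
  -- Each escape stands for exactly one character; ord shows its code point.
  print (map ord "\a\b\t\n\v\f\r")   -- [7,8,9,10,11,12,13]
  -- An escaped double quote does not end the string literal.
  print (length "say \"hi\"")         -- 8
  -- Single quote and backslash, written as character literals.
  print (ord '\'', ord '\\')          -- (39,92)
  -- \0 is the null character.
  print (ord '\0')                    -- 0
```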
+ + + Multiline string literals + + To write a string literal that spans multiple lines, + terminate one line with a backslash, and resume the string + with another backslash. An arbitrary amount of whitespace (of + any kind) can fill the gap between the two backslashes. + + + + + + ASCII control codes + + Haskell recognises the escaped use of the standard two- + and three-letter abbreviations of ASCII control codes. + + + ASCII control code abbreviations + + + + Escape + Unicode + Meaning + + + + + \NUL + U+0000 + null character + + + \SOH + U+0001 + start of heading + + + \STX + U+0002 + start of text + + + \ETX + U+0003 + end of text + + + \EOT + U+0004 + end of transmission + + + \ENQ + U+0005 + enquiry + + + \ACK + U+0006 + acknowledge + + + \BEL + U+0007 + bell + + + \BS + U+0008 + backspace + + + \HT + U+0009 + horizontal tab + + + \LF + U+000A + line feed (newline) + + + \VT + U+000B + vertical tab + + + \FF + U+000C + form feed + + + \CR + U+000D + carriage return + + + \SO + U+000E + shift out + + + \SI + U+000F + shift in + + + \DLE + U+0010 + data link escape + + + \DC1 + U+0011 + device control 1 + + + \DC2 + U+0012 + device control 2 + + + \DC3 + U+0013 + device control 3 + + + \DC4 + U+0014 + device control 4 + + + \NAK + U+0015 + negative acknowledge + + + \SYN + U+0016 + synchronous idle + + + \ETB + U+0017 + end of transmission block + + + \CAN + U+0018 + cancel + + + \EM + U+0019 + end of medium + + + \SUB + U+001A + substitute + + + \ESC + U+001B + escape + + + \FS + U+001C + file separator + + + \GS + U+001D + group separator + + + \RS + U+001E + record separator + + + \US + U+001F + unit separator + + + \SP + U+0020 + space + + + \DEL + U+007F + delete + + + +
+
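The following sketch illustrates the multiline string syntax and a few of the control code abbreviations listed above. The name `gapped` is hypothetical, chosen only for this illustration. The last two lines rely on the Haskell report's rule that `\SOH` is read as a single escape, so `\&` is needed to write `\SO` followed by the letter `H`.

```haskell
-- A string literal split across two lines. The backslashes and the
-- whitespace between them contribute nothing to the string.
gapped :: String
gapped = "a string that spans \
         \two lines"

main :: IO ()
main = do
  putStrLn gapped                     -- a string that spans two lines
  -- Abbreviations name the same characters as the other notations.
  print ("\LF" == "\n")               -- True
  print ('\ESC' == '\x1b')            -- True
  print ('\SP' == ' ')                -- True
  print ('\DEL' == '\x7f')            -- True
  -- "\SOH" is one character (start of heading), not \SO followed by 'H'.
  print (length "\SOH")               -- 1
  print ("\SO\&H" == ['\SO', 'H'])    -- True
```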
+ + + Control-with-character escapes + + Haskell recognises an alternate notation for control + characters, which represents the archaic effect of pressing + the control key on a + keyboard and chording it with another key. These sequences + begin with the characters \^, followed by a + symbol or uppercase letter. + + + Control-with-character escapes + + + + Escape + Unicode + Meaning + + + + + \^@ + U+0000 + null character + + + \^A through \^Z + U+0001 through U+001A + control codes + + + \^[ + U+001B + escape + + + \^\ + U+001C + file separator + + + \^] + U+001D + group separator + + + \^^ + U+001E + record separator + + + \^_ + U+001F + unit separator + + + +
+
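As a rough illustration of the table above, this sketch checks a few control-with-character escapes against their code points. It assumes GHC and `Data.Char`; the `main` wrapper is ours.

```haskell
import Data.Char (ord)

main :: IO ()
main = do
  print (ord '\^@')          -- 0
  print (ord '\^A')          -- 1
  print (ord '\^Z')          -- 26
  print (ord '\^_')          -- 31
  -- Control-I is the horizontal tab; control-[ is escape.
  print ('\^I' == '\t')      -- True
  print ('\^[' == '\ESC')    -- True
```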
+ + + Numeric escapes + + Haskell allows Unicode characters to be written using + numeric escapes. A decimal escape begins with a digit, + e.g. \1234. A hexadecimal escape begins + with an x, e.g. \xbeef. + An octal escape begins with an o, + e.g. \o1234. + + The maximum value of a numeric escape is + \1114111, which may also be written + \x10ffff or + \o4177777. + + + + The zero-width escape sequence + + String literals can contain a zero-width escape sequence, + written \&. This is not a real + character, as it represents the empty string. + + &text.ghci:empty; + + The purpose of this escape sequence is to make it possible + to write a numeric escape followed immediately by a regular + ASCII digit. + + &text.ghci:empty.example; + + Because the empty escape sequence represents an empty + string, it is not legal in a character literal. + +
+
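To make the interaction between numeric escapes and the zero-width escape concrete, here is a small sketch. The names `oneChar` and `twoChars` are hypothetical and do not come from the book's example files.

```haskell
import Data.Char (ord)

-- One character: the escape consumes all four digits.
oneChar :: String
oneChar = "\1234"

-- Two characters: \& ends the escape after 123, so '4' is a plain digit.
twoChars :: String
twoChars = "\123\&4"

main :: IO ()
main = do
  print (map ord oneChar)     -- [1234]
  print (map ord twoChars)    -- [123,52]
  -- \& adds nothing to the string.
  print ("ab\&cd" == "abcd")  -- True
  -- Decimal, hexadecimal and octal spellings of the largest code point.
  print ('\1114111' == '\x10ffff' && '\x10ffff' == '\o4177777')  -- True
```

Without the `\&`, the digits would run together and the literal would denote the single character U+04D2 rather than `{` followed by `4`.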
+ + hunk ./en/ch02-starting.xml 202 + + Command line editing + + On most systems, &ghci; has some amount of command line + editing ability. On Unix-like systems, it uses the GNU + readline library, which is powerful and customisable. On + Windows, &ghci;'s command line editing capabilities are + provided by the doskey command. + + If you haven't used command line editing before, it's a + huge time saver. The basics are common to both Unix-like and + Windows systems. Pressing the up arrow key on your keyboard recalls + the last line of input you entered; pressing up repeatedly cycles through earlier + lines of input. You can use the left and right arrow keys to move around + inside a line of input. + + Just knowing this much will save you a lot of repeated + typing. If you want to learn more about command line editing + on your system, consult the readline or + doskey documentation. + + hunk ./en/ch02-starting.xml 356 + + Comparison and boolean operators + + Haskell gives us the usual operators for working with + boolean values. + + &basics.ghci:boolean; + + Unlike some other languages, Haskell does not treat the + number zero as synonymous with False, nor + does it treat a non-zero value as True. + + &basics.ghci:boolean.bad; + + Comparison operators are mostly familiar from other + languages. + + &basics.ghci:comparison; + + There's one exception: the "is not equal" + operator is (/=). + + &basics.ghci:neq; + + + hunk ./en/ch02-starting.xml 433 - hunk ./en/ch02-starting.xml 534 + + + + Text, strings and lists + + If you are familiar with a language like Python, you'll find + Haskell's notations for strings, lists and tuples + familiar. + + A text string is surrounded by double quotes. + + &basics.ghci:string; + + As in many languages, we can represent hard-to-print + characters by escaping them. Haskell's escape + characters and escaping rules expand on the conventions + established by the C language (for details, see + ). 
+ + &basics.ghci:newline; + + A list is surrounded by square brackets, with elements + separated by commas. + + &basics.ghci:list; + + A list can be any length. + + &basics.ghci:list.shortlong; + + All elements of a list must have the same type. + + &basics.ghci:list.bad; + + Once again, &ghci;'s error message is verbose, but it's + simply telling us that it can't figure out how to turn the + string into a number, so the list isn't properly typed. + + When it makes sense to do so, we can write a range of + elements, and Haskell will fill in the contents of the list for + us. + + &basics.ghci:range; + + We can specify the size of the step to use by giving the + first two elements, followed by the value at which to stop + generating the range. + + &basics.ghci:range.step; + + Haskell makes a distinction between single characters and + text strings. A single character is enclosed in single + quotes. + + &basics.ghci:char; + + In fact, a text string is simply a list of individual + characters. Here's a painful way to write a short + string. + + &basics.ghci:work; + hunk ./examples/ch02/basics.ghci 45 +--# boolean + +True +False +True && False +False || True + +--# boolean.bad + +True && 1 + +--# comparison + +1 == 1 +2 < 3 +4 >= 3.99 + +--# neq + +2 /= 3 + hunk ./examples/ch02/basics.ghci 119 +--# string + +"This is a string." + +--# newline + +"Here's a newline -->\n<-- See?" + +--# list + +[1, 2, 3] + +--# list.shortlong + +[] +["foo", "bar", "baz", "quux", "fnord", "xyzzy"] + +--# list.bad + +[1, 2, "buckle my shoe"] + +--# range + +[1..10] + +--# range.step + +[1.0,1.25..2.0] +[1,4..15] + +--# char + +'a' + +--# range.char + +['a','e'..'z'] + +--# work + +let a = ['l', 'o', 't', 's', ' ', 'o', 'f', ' ', 'w', 'o', 'r', 'k'] +a +a == "lots of work" + }
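For readers without &ghci; to hand, the following sketch shows what the range and string expressions from the basics.ghci hunk above evaluate to when compiled as an ordinary program. The type annotations and the `main` wrapper are our additions; in &ghci; the numeric types would be defaulted instead.

```haskell
main :: IO ()
main = do
  -- The first two elements fix the step; the last value is the limit.
  print [1, 4 .. 15 :: Int]             -- [1,4,7,10,13]
  print [1.0, 1.25 .. 2.0 :: Double]    -- [1.0,1.25,1.5,1.75,2.0]
  -- Ranges work over characters too.
  print ['a', 'e' .. 'z']               -- "aeimquy"
  -- A string is just a list of characters.
  let a = ['l', 'o', 't', 's', ' ', 'o', 'f', ' ', 'w', 'o', 'r', 'k']
  print (a == "lots of work")           -- True
```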