poke has built-in support for ASCII and its simple encoding, in which each ASCII code is encoded using one byte. Let's see:
(poke) 'a'
0x61UB
We presented poke with the character a, and it answered with its corresponding code in the ASCII character set, which is 0x61. In fact, 'a' and 0x61UB are just two ways to write exactly the same byte value in poke:
(poke) 'a' == 0x61UB
1
(poke) 'a' + 1
0x62U
To make this more explicit, poke provides yet another synonym for the type specifier uint<8>: char.
When working with characters it is very useful to be acquainted with the ASCII character set, which is cleverly designed to make certain calculations on character codes easy. See Table of ASCII Codes for a table of ASCII codes in different numerical bases.
Consider, for example, the problem of determining whether a byte we map from an IO space is a digit. Looking at the ASCII table, we observe that the digits all have consecutive codes, so we can do:
(poke) var b = byte @ 23#B
(poke) b >= '0' && b <= '9'
1
Now that we know that b is a digit, how could we calculate its digit value? If we look at the ASCII table again, we will find that the character codes for digits are not only consecutive: they are also in ascending order, 0, 1, … Therefore, we can do:
(poke) b
0x37UB
(poke) b - '0'
7UB
b contains the ASCII code 0x37UB, which corresponds to the character 7, which is a digit.
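The two checks above can be packaged as small functions. This is only a sketch: the names ascii_is_digit and ascii_digit_value are made up for this illustration and are not poke built-ins.

```poke
/* Hypothetical helpers, not part of poke itself.  */

/* Return 1 if C is the ASCII code of a decimal digit, 0 otherwise.
   This relies on the digits having consecutive codes.  */
fun ascii_is_digit = (char c) int:
{
  return c >= '0' && c <= '9';
}

/* Return the numeric value of the digit C.  The caller is assumed
   to have checked ascii_is_digit (c) first.  */
fun ascii_digit_value = (char c) uint<8>:
{
  return c - '0';
}
```

With these definitions, ascii_is_digit ('7') evaluates to 1 and ascii_digit_value ('7') evaluates to 7UB, matching the results obtained with b above.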
How would we check whether b is a letter? Looking at the ASCII table, we find that lower-case letters are encoded consecutively, and the same applies to upper-case letters. This lets us repeat the trick:
(poke) (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
0
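The same test can be wrapped in a function so it does not have to be retyped for every byte. Again, this is a sketch, and ascii_is_letter is a name invented for this example.

```poke
/* Hypothetical helper, not part of poke itself.

   Return 1 if C is the ASCII code of a lower-case or upper-case
   letter, 0 otherwise.  Two ranges are needed because the two
   alphabets are not adjacent in the ASCII table.  */
fun ascii_is_letter = (char c) int:
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```

For the byte used above, ascii_is_letter (b) evaluates to 0, since 0x37UB is the code of the digit 7.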
Not all ASCII codes are printable using the glyphs that terminals usually support. If you look at the table in Table of ASCII Codes, you will find codes for characters described as “start of text”, “vertical tab”, and so on.
These character codes, which are commonly known as non-printable characters, can be represented in poke using their octal codes:
(poke) '\002'
0x2UB
This is of course no different from using 2UB directly, but in some contexts the “character-like” notation may be desirable, to stress the fact that the byte value is used as an ASCII character.
Some of the non-printable characters also have alternative notations. These include new-line and horizontal tab:
(poke) '\n'
0xaUB
(poke) '\t'
0x9UB
These \ constructions in character literals are called escape sequences. See Characters for a complete list of allowed escapes in character literals.
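As an illustration of how printable and non-printable codes can be told apart, the range trick used for digits and letters applies here too. The sketch below assumes the usual convention that the printable ASCII characters are the ones from space (0x20) to tilde (0x7e); ascii_is_printable is a name invented for this example.

```poke
/* Hypothetical helper, not part of poke itself.

   Return 1 if C is a printable ASCII character, 0 otherwise.
   Assumption: "printable" means the range from ' ' (0x20) to
   '~' (0x7e), both included.  */
fun ascii_is_printable = (char c) int:
{
  return c >= ' ' && c <= '~';
}
```

Under that assumption, ascii_is_printable ('a') evaluates to 1, while ascii_is_printable ('\n') and ascii_is_printable ('\002') both evaluate to 0.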