Applied Pokology
_____
---' __\_______
______) GNU poke development news
__)
__)
---._______)
Jose E. Marchesi
November 15, 2020
The development of GNU poke is progressing well, and we hope to have a
first release before the end of the year: we are determined that
something good will happen in 2020! ;)
This article briefly reviews the latest news in the development of the
program: syntax changes that make the language more compact, support
for lambda expressions, support for stream-like IO spaces and how they
can be used to write filters, support for using assignments to poke
complex data structures, improvements in data integrity, annoying bugs
fixed, and more.
Make the language a bit more compact
====================================
Since Poke is a domain-specific language for a tool, it is to be
expected that it will often be written interactively. It follows that
compactness (while maintaining good readability) is important, as it
reduces the number of keys the user must press to achieve a given
effect. In this spirit, we have recently changed two aspects of the
syntax of the language.
First, we have renamed the keywords `defunit', `defvar', `defun' and
`deftype' to `unit', `var', `fun' and `type' respectively. Therefore,
where we would previously write:
,----
| defun rtrim = (string s, string cs = " \t") string:
| {
| defvar cs_length = cs'length;
| defvar result = "";
| defvar i = s'length;
| ...
| }
`----
Now we write:
,----
| fun rtrim = (string s, string cs = " \t") string:
| {
| var cs_length = cs'length;
| var result = "";
| var i = s'length;
| ...
| }
`----
Second, we now support "chaining" several declarations of the same
kind, separated by commas. So, where we would previously write:
,----
| defvar STB_LOCAL = 0;
| defvar STB_GLOBAL = 1;
| defvar STB_WEAK = 2;
| defvar STB_LOOS = 10;
| defvar STB_HIOS = 12;
| defvar STB_LOPROC = 13;
| defvar STB_HIPROC = 15;
`----
Now we write:
,----
| var STB_LOCAL = 0,
| STB_GLOBAL = 1,
| STB_WEAK = 2,
| STB_LOOS = 10,
| STB_HIOS = 12,
| STB_LOPROC = 13,
| STB_HIPROC = 15;
`----
Chaining declarations like that works for units, variables and types,
but not for functions or methods. This is due both to a technicality
(function specifiers are not terminated with a semicolon) and to the
fact that it would be quite unusual and confusing.
Support for lambdas
===================
Yes, it was definitely about time... being a proper lexically scoped
language with closures, perfectly capable of handling funargs in both
directions (passing them to functions and returning them from
functions), it would be a real indecency not to support lambda
expressions!
So we just added them, and we are much happier now :)
For once the language syntax proved to be sane enough to be on our
side, allowing us to use a nice and orthogonal construct:
,----
| lambda FUNCTION_SPECIFIER
`----
where a FUNCTION_SPECIFIER is the same notation that one would use
when defining a function in a `fun' construction. Examples:
,----
| (poke) lambda void: {}
| #<closure>
| (poke) lambda (int i) int: { return i + 2; }
| #<closure>
`----
Once created, lambdas can be invoked like any other function value:
,----
| (poke) lambda void: {} ()
| (poke) lambda (int i) int: { return i + 2; } (10)
| 12
`----
And of course they can be stored in variables, passed around in
function calls, and the like:
,----
| (poke) var la = lambda (int i) int: { return i + 2; }
| (poke) la (10)
| 12
`----
This is the classic closure-based way of implementing generators of
number sequences:
,----
| type Generator = ()int;
| fun new_generator = Generator:
| {
| var i = 0;
| return lambda int: { i = i + 1; return i; };
| }
`----
and then:
,----
| (poke) var g1 = new_generator
| (poke) var g2 = new_generator
| (poke) g1
| 1
| (poke) g1
| 2
| (poke) g2
| 1
| (poke) g1
| 3
`----
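Each call to `new_generator' creates a fresh environment holding its
own `i', and the returned lambda keeps that environment alive. For
comparison, here is a rough C sketch of the same idea, where the
captured environment has to be spelled out by hand as a struct that
travels together with the function pointer. The names `Generator',
`new_generator', `generator_body' and `generator_call' are made up for
this illustration; this is not how the PVM implements closures.

```c
/* The lambda's captured environment (the variable `i') is stored
   together with a pointer to the lambda's code.  */
typedef struct generator
{
  int i;                                  /* captured counter */
  int (*code) (struct generator *self);   /* body of the lambda */
} Generator;

/* Body of  lambda int: { i = i + 1; return i; }  */
static int
generator_body (Generator *self)
{
  self->i = self->i + 1;
  return self->i;
}

/* Like new_generator: every call creates a fresh environment with i = 0.  */
Generator
new_generator (void)
{
  Generator g = { 0, generator_body };
  return g;
}

/* Calling the closure: the code runs against its own environment.  */
int
generator_call (Generator *g)
{
  return g->code (g);
}
```

Calling `generator_call' on two generators obtained from
`new_generator' reproduces the session above: 1 and 2 from the first
one, 1 from the second one, and then 3 from the first one again.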
Support for stream-like IO spaces
=================================
Back in January, during the first Pokeconf, held in Switzerland, we
had a long discussion about how we could support stream-like IO spaces
in poke.
If that could be achieved, it would allow us to access devices like IO
ports and pipes, and poke them at pleasure. It would also make it
possible to write filter-like programs in Poke, where the standard
input is processed one entity at a time and some output is written to
the standard output.
Wrapping sequential-access devices in a random-access abstraction like
the poke IO spaces, without introducing special cases in the handling
of the latter, wasn't easy, but we finally figured out a good design
for it. We already published a little post in Applied Pokology
describing it (<http://jemarch.net/pokology-20200113.html>).
Right, but this had to be implemented. Recently our resident IO
expert Egeyar Bagcioglu did just that, adding support for a new kind
of IO device (IOD) to poke: the stream.
The result is awesome. Now we can write filters like this simplified
implementation of the `strings' command:
,----
| #!/usr/local/bin/poke -L
| !#
|
| /* Printable ASCII characters: 0x20..0x7e */
|
| var stdin = open ("<stdin>");
| var stdout = open ("<stdout>");
|
| var offset = 0#B;
|
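| try
| {
|   /* Forget the stdin bytes before OFFSET: they won't be read again.  */
|   flush (stdin, offset);
|
|   var b = byte @ stdin : offset;
|
|   /* Copy printable bytes to the end of stdout.  */
|   if (b >= 0x20 && b <= 0x7e)
|     byte @ stdout : iosize (stdout) = b;
|
|   offset = offset + 1#B;
| }
| until E_eof;  /* Repeat until reading past the end of stdin.  */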
|
| close (stdin);
| close (stdout);
`----
Note how `open' recognizes the handlers "<stdin>" and "<stdout>" and
uses the stream IOD for them, and how `flush' makes the stream IOD
forget the parts of stdin that precede the given offset, so they no
longer have to be remembered.
Thanks, Ege!
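To get a feeling for the idea, here is a minimal C sketch of a
random-access view over a sequential byte source: bytes are pulled
from the source on demand and remembered, so they can be read again at
any offset, until `stream_flush' discards the ones before a given
offset. All the names (`Stream', `stream_read_at', `stream_flush',
`MemSource', `mem_next') are hypothetical, and this is not the actual
stream IOD code; the Pokology post mentioned above describes the real
design.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sequential source: returns the next byte (0..255), or
   -1 at end of input.  Bytes cannot be read twice from it.  */
typedef int (*next_byte_fn) (void *ctx);

enum { STREAM_EOF = -1, STREAM_FORGOTTEN = -2, STREAM_NOMEM = -3 };

typedef struct
{
  next_byte_fn next;
  void *ctx;
  unsigned char *buf;   /* bytes read so far and not yet flushed */
  size_t base;          /* stream offset of buf[0] */
  size_t len;           /* number of remembered bytes */
  size_t cap;           /* allocated size of buf */
} Stream;

void
stream_init (Stream *s, next_byte_fn next, void *ctx)
{
  s->next = next;
  s->ctx = ctx;
  s->buf = NULL;
  s->base = s->len = s->cap = 0;
}

/* Read the byte at stream offset OFF, pulling (and remembering) bytes
   from the source as needed.  */
int
stream_read_at (Stream *s, size_t off)
{
  if (off < s->base)
    return STREAM_FORGOTTEN;            /* flushed earlier: gone for good */
  while (off >= s->base + s->len)
    {
      int c = s->next (s->ctx);

      if (c < 0)
        return STREAM_EOF;
      if (s->len == s->cap)
        {
          size_t ncap = s->cap ? 2 * s->cap : 64;
          unsigned char *nbuf = realloc (s->buf, ncap);

          if (nbuf == NULL)
            return STREAM_NOMEM;
          s->buf = nbuf;
          s->cap = ncap;
        }
      s->buf[s->len++] = (unsigned char) c;
    }
  return s->buf[off - s->base];
}

/* Forget the remembered bytes before OFF.  Bytes that have not been
   read yet are left in the source.  */
void
stream_flush (Stream *s, size_t off)
{
  size_t drop;

  if (off <= s->base)
    return;
  drop = off - s->base;
  if (drop > s->len)
    drop = s->len;
  if (drop == 0)
    return;
  memmove (s->buf, s->buf + drop, s->len - drop);
  s->base += drop;
  s->len -= drop;
}

void
stream_free (Stream *s)
{
  free (s->buf);
}

/* An in-memory source, handy for trying the above.  */
typedef struct
{
  const char *data;
  size_t pos, size;
} MemSource;

int
mem_next (void *ctx)
{
  MemSource *m = ctx;

  return m->pos < m->size ? (unsigned char) m->data[m->pos++] : -1;
}
```

With a source over "hello", reading offset 1 and then offset 0 both
work, because the bytes are remembered; after `stream_flush (&s, 1)',
offset 0 is reported as forgotten while offset 4 still yields `o'.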
Maps of complex values in l-values
==================================
In Poke it is possible to specify maps in the left side of assignment
statements, like this:
,----
| (poke) int @ 23#B = 666
`----
The above statement will poke the value 666 at offset 23 bytes from
the beginning of the IO space. In principle, values of any type can
be written to the IO space this way, even complex ones like arrays,
structs and unions.
However, until now such constructions were limited to simple types,
i.e. integers, offsets and strings. Any attempt to poke complex values
such as structs was rejected with a compile-time error.
Well... not anymore. We recently (finally!) added support for poking
complex values using maps in the l-value of assignments. This is how
you would create an empty ELF file:
,----
| (poke) .mem tmp
| The current IOS is now `*tmp*'.
| (poke) dump
| 76543210 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789ABCDEF
| 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| (poke) load elf
| (poke) Elf64_Ehdr @ 0#B = Elf64_Ehdr {}
| (poke) dump
| 76543210 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789ABCDEF
| 00000000: 7f45 4c46 0000 0000 0000 0000 0000 0000 .ELF............
| 00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| 00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
| (poke) save
| (poke) save :size iosize :file "foo.elf"
`----
It is important to note that when a value is assigned to an l-value
map, neither its mapped/non-mapped status nor any of its other
properties change. For example:
,----
| (poke) var a = [1,2,3]
| (poke) a'mapped
| 0
| (poke) int[3] @ 10#B = a
| (poke) a'mapped
| 0
`----
The attributes of the array stored in the variable `a' don't change.
In this example it is not mapped, but it could have been mapped at
some other (or the same) offset in the current IO space, or even at
another IO space, and its properties would have still been preserved.
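In C terms, poking through an l-value map amounts to copying bytes
into the IO space: the source value is only read. The following sketch
models an IO space as a plain byte buffer and writes `int' values
(32-bit, and big-endian in this sketch; that byte order is an
assumption here, since poke's endianness is configurable) element by
element. `MemIOS', `poke_int32' and `poke_int32_array' are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* A toy in-memory IO space, similar in spirit to `.mem tmp'.  */
typedef struct
{
  unsigned char bytes[128];
} MemIOS;

/* int @ OFF#B = V : write a 32-bit signed integer, big-endian in this
   sketch.  Returns 0 on success, -1 if it doesn't fit.  */
int
poke_int32 (MemIOS *ios, size_t off, int32_t v)
{
  uint32_t u = (uint32_t) v;

  if (off > sizeof ios->bytes - 4)
    return -1;
  for (int k = 0; k < 4; k++)
    ios->bytes[off + k] = (unsigned char) (u >> (24 - 8 * k));
  return 0;
}

/* int[N] @ OFF#B = ARR : write the elements at consecutive offsets.
   ARR is only read, so the source value is left exactly as it was.  */
int
poke_int32_array (MemIOS *ios, size_t off, const int32_t *arr, size_t n)
{
  for (size_t j = 0; j < n; j++)
    if (poke_int32 (ios, off + 4 * j, arr[j]) != 0)
      return -1;
  return 0;
}
```

Here `poke_int32 (&ios, 23, 666)' plays the role of `int @ 23#B =
666', and `poke_int32_array (&ios, 10, a, 3)' the role of `int[3] @
10#B = a'; afterwards `a' still holds 1, 2 and 3, untouched.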
Assignment to structs with data integrity
=========================================
Data integrity is a fundamental matter in poke. The various kinds of
constraints specified by the user in type descriptions, like an array
bounded by size, or a struct or union whose fields must have a certain
form, shall be preserved at all times.
Now, there are three ways to generate values in Poke:
1) From a literal, like `[1,2,3]'.
2) From a constructor, like `Foo { a = 10, b = 20 }'.
3) From a mapping, like `int[3] @ 0#B' or `Foo @ 0#B'.
In all cases, the constraints are checked, and any violation reported.
This means that the integrity of all the data in poke values, as
defined by the types, is guaranteed by both compile-time and run-time
checks.
However, until now there was a little caveat: it was possible to break
that data integrity when assigning new values to struct fields:
,----
| (poke) type Foo = struct { int a = 0xff; int b; }
| (poke) Foo { }
| Foo {
| a=0xff,
| b=0x0
| }
| (poke) var f = Foo {}
| (poke) f.a
| 0xff
| (poke) f.a = 0
| (poke) f
| Foo {
| a=0x0,
| b=0x0
| }
`----
We just fixed that (supporting it required implementing other, rather
complex, machinery), and now poke aborts the assignment if it would
break the integrity of the data:
,----
| (poke) f.a = 20
| unhandled constraint violation exception
`----
Much better this way :)
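One simple way to obtain this behaviour, sketched here in C with
made-up names (`Foo', `foo_valid', `foo_new', `foo_set_a'), is to apply
the assignment to a copy of the struct, check all the constraints on
the copy, and commit only if they still hold. Checking the whole copy,
rather than just the assigned field, matters because a constraint may
involve several fields. This illustrates the principle only; it is not
poke's actual implementation.

```c
#include <stdbool.h>

/* Model of  type Foo = struct { int a = 0xff; int b; }
   where `int a = 0xff' constrains A to be equal to 0xff.  */
typedef struct
{
  int a;
  int b;
} Foo;

/* Check every constraint of Foo.  Only A is constrained here, but a
   real struct may have constraints involving several fields.  */
static bool
foo_valid (const Foo *f)
{
  return f->a == 0xff;
}

/* Equivalent of Foo {}.  */
Foo
foo_new (void)
{
  Foo f = { 0xff, 0 };
  return f;
}

/* Equivalent of f.a = V with integrity checking.  The assignment is
   tried on a copy, and F is modified only if the copy is still valid.
   Returns 0 on success and -1 on a constraint violation, which is
   where poke raises its constraint violation exception.  */
int
foo_set_a (Foo *f, int v)
{
  Foo tmp = *f;

  tmp.a = v;
  if (!foo_valid (&tmp))
    return -1;
  *f = tmp;
  return 0;
}
```

A rejected `foo_set_a (&f, 20)' leaves `f.a' at 0xff, just like the
aborted `f.a = 20' in the session above.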
New rules for union constructors
================================
Unlike struct constructors, union constructors with more than one
field initializer don't really make much sense, like in:
,----
| (poke) type Foo = union { byte b : b > 0; int i; }
| (poke) Foo {b = 2, i = 12}
`----
What kind of Foo are we asking for? It is not clear. Until now, the
same rules used for constructed structs applied, so the result would
be:
,----
| Foo {
| b=2UB
| }
`----
Now, if we do:
,----
| (poke) var f = Foo {b = 2}
| (poke) f.b = 0
`----
What would we expect? For the value in `f' to change nature and
become a Foo of kind `i'? Or for this to be considered as a
constraint error?
In the first case (the nature of the union value changes) we could
expect to get something like:
,----
| (poke) f
| Foo {
| i = 12
| }
`----
But should that really be 12, i.e. the value previously specified in
the initializer, or 0? Wouldn't we expect this value to be affected by
the overlapping bits of the two fields, given that both fields "start"
at the beginning of the union?
This quickly degenerates into something very complicated and obscure,
and almost impossible to implement properly and to understand.
However, this is because we are thinking about unions as if they were
regular structs: they are not. This was simply the wrong way of
thinking.
A better approach, which we just implemented, is the following: a
union constructor admits either zero or one field initializer.
Specifying more than one field initializer is a compile-time error:
,----
| (poke) Foo {b = 2, i = 12}
| <stdin>:1:1: error: union constructors require exactly one field initializer
| Foo {b = 2, i = 12};
| ^~~~~~~~~~~~~~~~~~~
`----
When we specify a field initializer, we are also declaring the kind of
Foo we want to construct, i.e. its "nature":
,----
| (poke) Foo {b = 2}
| Foo {
| b=2UB
| }
`----
Therefore, if we provide an invalid initial value for `b', then we get
a constraint-violation exception:
,----
| (poke) Foo {b = 0}
| unhandled constraint violation exception
`----
The alternative `i' is not considered in this case: we asked for a Foo
of kind `b' and provided a wrong value for it, so we want the
exception.
Once constructed, union values do not change their "nature" due to
assignments:
,----
| (poke) var f = Foo {b = 2}
| (poke) f.b = 0
| unhandled constraint violation exception
`----
Note that mapped unions are different in this sense. When we map a
union on some IO space:
,----
| (poke) var m = Foo @ 0#B
`----
we are not specifying what nature of Foo we want: it all depends on
the contents of the IO space. Therefore, we could get either a `b' or
an `i', and if the underlying data in the IO space changes, the nature
of the value in `m' may change as well.
Something similar happens when we don't specify a field initializer in
a union constructor:
,----
| (poke) Foo {}
| Foo {
| i=0
| }
`----
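Since no initializer was provided, we didn't indicate what kind of Foo
we wanted, and therefore each alternative is tried assuming all fields
have default values, which in the case of integral types is zero. The
first alternative, `b', gets the value 0, which violates its
constraint `b > 0', so the next alternative, `i', is chosen.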
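These rules map quite naturally onto a tagged union. In the following
C sketch (hypothetical names, not poke's implementation) the tag
records the "nature" chosen when the value is constructed: an explicit
initializer fixes it, a bad value is an error rather than a reason to
try `i', an assignment never changes it, and an empty constructor
tries the alternatives in order with their default values. Mapped
unions, whose nature depends on the contents of the IO space, are not
modelled. Rejecting an assignment to `b' when the value is of kind `i'
is a choice of this sketch.

```c
#include <stdint.h>

/* Model of  union { byte b : b > 0; int i; }  as a tagged union.  */
typedef enum { FOO_B, FOO_I } FooKind;

typedef struct
{
  FooKind kind;                         /* the "nature" of the value */
  union { uint8_t b; int32_t i; } u;
} Foo;

/* Foo {b = V}: the initializer fixes the nature.  A value violating
   `b > 0' is an error, and alternative `i' is NOT tried.  */
int
foo_make_b (Foo *out, uint8_t v)
{
  if (!(v > 0))
    return -1;
  out->kind = FOO_B;
  out->u.b = v;
  return 0;
}

/* Foo {i = V}: alternative `i' has no constraint.  */
int
foo_make_i (Foo *out, int32_t v)
{
  out->kind = FOO_I;
  out->u.i = v;
  return 0;
}

/* Foo {}: no nature requested, so try each alternative in order with
   its default value (zero for integral types).  */
int
foo_make_default (Foo *out)
{
  if (foo_make_b (out, 0) == 0)
    return 0;
  return foo_make_i (out, 0);
}

/* f.b = V: the nature never changes, so a violation is an error.  */
int
foo_set_b (Foo *f, uint8_t v)
{
  if (f->kind != FOO_B || !(v > 0))
    return -1;
  f->u.b = v;
  return 0;
}
```

As in the sessions above, `foo_make_b' with 0 fails, `foo_set_b' with
0 leaves a kind-`b' value untouched, and `foo_make_default' ends up
with kind `i' and value 0.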
Hope all this makes sense XD
The infamous big array bug is now fixed!
========================================
Ok, this is a bit of an embarrassing one. When I first added support
for arrays and struct values to the Poke virtual machine, I designed
the corresponding `mka' and `mksct' PVM instructions so that they
would get their contents from the run-time stack.
So, if we wanted to build a struct value with three fields `f1', `f2'
and `f3' we would generate PVM code similar to this pseudo-code (the
actual PVM assembly is more complicated):
,----
| push "f3"
| push 30
| push "f2"
| push 20
| push "f1"
| push 10
| mksct
`----
where `mksct' pops the field names and the field values from the stack
and creates the value `struct {f1 = 10, f2 = 20, f3 = 30}'. Yes, the
stuff should be pushed in reverse order to the stack, for obvious
reasons :)
Of course, stupid me, I made the mistake of applying the same
strategy to arrays. So, for example, to build an array [10,20,30], the
compiler would generate PVM code like:
,----
| push 30
| push 20
| push 10
| mka
`----
That works very well when compiling array literals like `[1,2,3]', but
what happens when you map, say, all the bytes in a given tar file that
is, say, 53 MB long?
,----
| (poke) .file some.tar.gz
| (poke) byte[] @ 0#B
| Segmentation fault
`----
Yeah... it turns out that the Jitter stacks are not only bounded, but
also not that big. Which is ok, of course, since we were clearly
misusing them. This approach also has the additional disadvantage
(promptly pointed out by Luca Saiu) of copying the elements around
twice for no good reason.
The solution is obvious: the `mka' instruction should work
differently. Instead of getting its elements from the stack, it
should create an empty array. Then the elements should be added to
the array in a sequence or in a loop, using an element insertion
instruction.
Compiling `[1,2,3]' then becomes:
,----
| mka
| push 1
| ains
| push 2
| ains
| push 3
| ains
`----
Whereas the array map should use a loop instead, since the length of
the resulting array is not known at compile-time.
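The difference between the two strategies can be sketched in C. This
is not PVM code: the fixed-size local array `stack' stands in for the
bounded Jitter stack, and `Array', `array_mka', `array_ains',
`map_bytes' and `old_map_bytes' are hypothetical names. The old scheme
pushes every element first and then copies them into the array,
failing once the stack is full; the new scheme creates an empty array
and appends the elements in a loop, so the only limit is heap memory.

```c
#include <stddef.h>
#include <stdlib.h>

#define STACK_MAX 1024   /* stand-in for the bounded Jitter stack */

typedef struct
{
  long *elems;
  size_t len, cap;
} Array;

/* New-style mka: just create an empty array.  */
void
array_mka (Array *a)
{
  a->elems = NULL;
  a->len = a->cap = 0;
}

/* ains: append one element, growing the heap storage when needed.
   Returns 0 on success, -1 if out of memory.  */
int
array_ains (Array *a, long v)
{
  if (a->len == a->cap)
    {
      size_t ncap = a->cap ? 2 * a->cap : 16;
      long *nelems = realloc (a->elems, ncap * sizeof *nelems);

      if (nelems == NULL)
        return -1;
      a->elems = nelems;
      a->cap = ncap;
    }
  a->elems[a->len++] = v;
  return 0;
}

/* New scheme for  byte[] @ 0#B : the length is only known at run
   time, so the elements are inserted in a loop.  */
int
map_bytes (Array *a, const unsigned char *ios, size_t n)
{
  array_mka (a);
  for (size_t k = 0; k < n; k++)
    if (array_ains (a, ios[k]) != 0)
      return -1;
  return 0;
}

/* Old scheme: push every element (last one first), then let mka pop
   them into a freshly allocated array.  Each element is copied twice,
   and big arrays overflow the stack.  */
int
old_map_bytes (Array *a, const unsigned char *ios, size_t n)
{
  long stack[STACK_MAX];
  size_t sp = 0;

  for (size_t k = n; k > 0; k--)
    {
      if (sp == STACK_MAX)
        return -1;                      /* where the old poke segfaulted */
      stack[sp++] = ios[k - 1];
    }

  a->elems = malloc ((n ? n : 1) * sizeof *a->elems);
  if (a->elems == NULL)
    return -1;
  a->len = a->cap = n;
  for (size_t k = 0; k < n; k++)
    a->elems[k] = stack[--sp];          /* top of the stack is ios[0] */
  return 0;
}
```

Both functions produce the same array for small inputs, but only
`map_bytes' copes with inputs larger than `STACK_MAX' elements.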
Thing is, I had been aware of both the problem and its solution for a
long time, but I kept procrastinating on it, turning my attention to
more difficult and important issues. This despite people's complaints.
However, while implementing the support for complex maps in l-values,
I realized I needed the elements of array literals to be pushed onto
the stack in the right order, and therefore I had no choice but to
bite the bullet and implement the new `mka'.
So now poke supports big arrays without segfaulting, and John
Darrington is happy. Hurrah!
New built-in function gettime
=============================
We added a new built-in function `gettime', which returns an array of
two signed 64-bit integers: the number of seconds elapsed since the
Epoch (1 January 1970) and the additional nanoseconds, respectively.
Additionally, we added a `Timespec' struct and an accompanying
`gettimeofday' function to the `time' pickle:
,----
| type Timespec = struct
| {
| int<64> sec;
| int<64> nsec;
| };
|
| fun gettimeofday = Timespec:
| {
| var time = gettime;
| return Timespec {sec = time[0], nsec = time[1]};
| }
`----
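For comparison, this is how the same pair of numbers can be obtained
in C11 with the standard `timespec_get' function. Whether poke's
`gettime' uses this call, POSIX `clock_gettime' or something else is
not stated here; the sketch only illustrates the seconds-plus-
nanoseconds representation. The C `Timespec' type and `timespec_now'
are hypothetical names that mirror the pickle.

```c
#include <stdint.h>
#include <time.h>

/* Mirror of the Timespec struct in the `time' pickle.  */
typedef struct
{
  int64_t sec;    /* seconds since the Epoch */
  int64_t nsec;   /* additional nanoseconds, 0..999999999 */
} Timespec;

/* Fill OUT with the current time.  Returns 0 on success, -1 on
   failure.  */
int
timespec_now (Timespec *out)
{
  struct timespec ts;

  if (timespec_get (&ts, TIME_UTC) != TIME_UTC)
    return -1;
  out->sec = (int64_t) ts.tv_sec;
  out->nsec = (int64_t) ts.tv_nsec;
  return 0;
}
```

Note that `nsec' is only the sub-second part: the total number of
nanoseconds is `sec * 1000000000 + nsec'.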
Support for octal and hexadecimal codes in strings
==================================================
The Poke strings are slowly maturing into grown-up strings... this
week Mohammad-Reza Nabipoor (whom we warmly welcome to the poke gang!)
added support for specifying character codes in strings using octal
and hexadecimal escapes:
,----
| (poke) print "foo\xa"
| foo
| (poke) print "fo\157\n"
| foo
`----
Thank you Mohammad!
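As an illustration of what has to happen when such a string literal is
read, here is a small C sketch that decodes `\x' hexadecimal and
`\ooo' octal escapes (plus `\n') in a string body. The digit limits
(at most two hexadecimal digits, at most three octal digits) are
assumptions of this sketch and may not match poke's exact rules;
`decode_escapes' and `hex_value' are hypothetical names.

```c
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if C is not one.  */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Decode the escapes in IN into OUT (which must be at least as large
   as IN) and NUL-terminate it.  Returns the number of decoded chars.  */
size_t
decode_escapes (const char *in, char *out)
{
  size_t n = 0;

  while (*in != '\0')
    {
      if (*in != '\\' || in[1] == '\0')
        {
          out[n++] = *in++;               /* ordinary character */
          continue;
        }
      in++;                               /* skip the backslash */
      if (*in == 'x' && hex_value (in[1]) >= 0)
        {
          int v = 0, digits = 0;

          in++;                           /* skip the `x' */
          while (digits < 2 && hex_value (*in) >= 0)
            {
              v = v * 16 + hex_value (*in++);
              digits++;
            }
          out[n++] = (char) v;
        }
      else if (*in >= '0' && *in <= '7')
        {
          int v = 0, digits = 0;

          while (digits < 3 && *in >= '0' && *in <= '7')
            {
              v = v * 8 + (*in++ - '0');
              digits++;
            }
          out[n++] = (char) v;
        }
      else if (*in == 'n')
        {
          out[n++] = '\n';
          in++;
        }
      else
        out[n++] = *in++;                 /* e.g. \\ and \": keep the char */
    }
  out[n] = '\0';
  return n;
}
```

Both examples from the session decode to "foo" followed by a newline:
`\xa' is character 10, and `\157' is 1*64 + 5*8 + 7 = 111, an `o'.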
Support for `continue' in loops
===============================
Not much to say about this one... we have added `continue' statements
to the language, with the usual semantics: it skips the rest of the
current iteration and starts the next iteration of the innermost
enclosing loop:
,----
| for (packet in packets)
| {
| if (packet_is_not_valid (packet))
| continue;
| ...
| }
`----
poke.rec database
=================
Lastly... the boring stuff :)
GNU poke is getting big and complex, encompassing many components,
and happily more people are joining the development. We really need
to organize ourselves better, or we will go nuts.
Therefore, in what proved to be a quite painful exercise, I gathered
all my dispersed notes, TODO lists, ideas and bug reports,
prioritized them, and documented and organized them in a recutils
database (http://www.gnu.org/s/recutils). The database can be found
in the file `etc/poke.rec' in the source tree.
This database currently contains record sets for tasks, releases and
hackers. It makes it possible to generate reports and to get clear
answers to questions like "what is pending before we can release 1.0?"
or "I would like to work on the compiler, what can I do?".
The fact that `rec-mode.el', the Emacs interface to recutils, is now
very actively maintained by Antoine Kalmbach really helps us. We have
got to send him a Jamón de Bellota as a token of our esteem!
And that's all for now. Happy poking! :)