Applied Pokology - GNU poke development news

 

     _____
 ---'   __\_______
            ______)         GNU poke development news
            __)             
           __)
 ---._______)

                                                          Jose E. Marchesi
                                                          November 15, 2020


  The development of GNU poke is progressing well, and we hold hopes
  for a first release before the end of the year: we are determined
  for something good to happen in 2020! ;)

  This article briefly reviews the latest news in the development of the
  program, like changes in certain syntax to make the language more
  compact, support for lambda expressions, support for stream-like IO
  spaces and how they can be used to write filters, support for using
  assignments to poke complex data structures, improvements in data
  integrity, annoying bugs fixed, and more.

Make the language a bit more compact
====================================

  Since Poke is the domain-specific language of an interactive tool, it
  is often written interactively.  It follows that compactness (while
  maintaining good readability) is important, as it reduces the number
  of keys the user must press to achieve a given effect.  In this
  spirit, we have recently changed two aspects of the syntax of the
  language.

  First, we have renamed the keywords `defunit', `defvar', `defun' and
  `deftype' to `unit', `var', `fun' and `type' respectively.  Therefore,
  where we would previously write:

  ,----
  | defun rtrim = (string s, string cs = " \t") string:
  | {
  |   defvar cs_length = cs'length;
  |   defvar result = "";
  |   defvar i = s'length;
  |   ...
  | }
  `----


  Now we write:

  ,----
  | fun rtrim = (string s, string cs = " \t") string:
  | {
  |   var cs_length = cs'length;
  |   var result = "";
  |   var i = s'length;
  |   ...
  | }
  `----


  Second, we now support "chaining" several declarations of the same
  kind, separated by commas.  So, where we would previously write:

  ,----
  | defvar STB_LOCAL = 0;
  | defvar STB_GLOBAL = 1;
  | defvar STB_WEAK = 2;
  | defvar STB_LOOS = 10;
  | defvar STB_HIOS = 12;
  | defvar STB_LOPROC = 13;
  | defvar STB_HIPROC = 15;
  `----


  Now we write:

  ,----
  | var STB_LOCAL = 0,
  |     STB_GLOBAL = 1,
  |     STB_WEAK = 2,
  |     STB_LOOS = 10,
  |     STB_HIOS = 12,
  |     STB_LOPROC = 13,
  |     STB_HIPROC = 15;
  `----


  Chaining declarations like that works for units, variables and types,
  but not for functions or methods.  This is due both to a technicality
  (function specifiers are not terminated with a semicolon) and to the
  fact that it would be quite unusual and confusing.


Support for lambdas
===================

  Yes, it was definitely about time... being a proper lexically scoped
  language with closures, perfectly capable of handling funargs in both
  directions (passing them to functions and returning them from
  functions), it would be a real indecency not to support lambda
  expressions!

  So we just added them, and we are much happier now :)

  For once the language syntax proved to be sane enough to be on our
  side, allowing us to use a nice and orthogonal construct:

  ,----
  | lambda FUNCTION_SPECIFIER
  `----


  where a FUNCTION_SPECIFIER is the same notation that one would use
  when defining a function in a `fun' construction.  Examples:

  ,----
  | (poke) lambda void: {}
  | #<closure>
  | (poke) lambda (int i) int: { return i + 2; }
  | #<closure>
  `----


  Once created, lambdas can be invoked like any other function value:

  ,----
  | (poke) lambda void: {} ()
  | (poke) lambda (int i) int: { return i + 2; } (10)
  | 12
  `----


  And of course they can be stored in variables, passed around in
  function calls, and the like:

  ,----
  | (poke) var la = lambda (int i) int: { return i + 2; }
  | (poke) la (10)
  | 12
  `----


  This is the classic closure-oriented way of supporting generators of
  number sequences:

  ,----
  | type Generator = ()int;
  | fun new_generator = Generator:
  | {
  |   var i = 0;
  |   return lambda int: { i = i + 1; return i; };
  | }
  `----


  and then:

  ,----
  | (poke) var g1 = new_generator
  | (poke) var g2 = new_generator
  | (poke) g1
  | 1
  | (poke) g1
  | 2
  | (poke) g2
  | 1
  | (poke) g1
  | 3
  `----


Support for stream-like IO spaces
=================================

  Back in January, during the first Pokeconf held in Switzerland, we
  had a long discussion about how we could support stream-like IO
  spaces in poke.

  Could that be achieved, it would allow us to access devices like IO
  ports and pipes, and poke them at pleasure.  Also, it would make it
  possible to write filter-like programs in Poke, where the standard
  input is processed an entity at a time, and some output generated in
  the standard output.

  Wrapping sequential access devices in a random access abstraction
  like the poke IO spaces, without introducing special cases in the
  handling of the latter, wasn't easy, but we finally figured out a good
  design for it.  We already published a little post in Applied Pokology
  describing it (<http://jemarch.net/pokology-20200113.html>).

  Right, but this had to be implemented.  Recently our resident IO
  expert Egeyar Bagcioglu did just that, adding support for a new kind
  of IO device (IOD) to poke: the stream.

  The result is awesome.  Now we can write filters like this
  implementation of the `strings' command:

  ,----
  | #!/usr/local/bin/poke -L
  | !#
  | 
  | /* Printable ASCII characters: 0x20..0x7e */
  | 
  | var stdin = open ("<stdin>");
  | var stdout = open ("<stdout>");
  | 
  | var offset = 0#B;
  | 
  | try
  | {
  |   flush (stdin, offset);
  |   
  |   var b = byte @ stdin : offset;
  |   if (b >= 0x20 && b <= 0x7e)
  |     byte @ stdout : iosize (stdout) = b;
  |   
  |   offset = offset + 1#B;
  | }
  | until E_eof;
  | 
  | close (stdin);
  | close (stdout);
  `----


  Note how `open' recognizes the handlers "<stdin>" and "<stdout>" and
  uses the stream IOD for them, and how `flush' causes remembered parts
  of the stdin to be forgotten.
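
  The interplay between random access and `flush' can be modelled
  outside of poke.  The following Python sketch is not poke's
  implementation: the class `StreamBuffer' and the function
  `printable_filter' are hypothetical names made up for illustration.
  It wraps a sequential reader, remembers the bytes read so far so they
  can be addressed by offset, and forgets everything located before a
  given offset when flushed:

```python
import io


class StreamBuffer:
    """Random-access view over a sequential byte source (illustrative only)."""

    def __init__(self, source):
        self.source = source      # any object with a read(n) method
        self.base = 0             # offset of the first remembered byte
        self.buf = bytearray()    # bytes remembered so far

    def read_byte(self, offset):
        if offset < self.base:
            raise IndexError("offset was flushed and is no longer available")
        # Pull more data from the source until the offset is covered.
        while offset >= self.base + len(self.buf):
            chunk = self.source.read(1)
            if not chunk:
                raise EOFError("end of stream")
            self.buf += chunk
        return self.buf[offset - self.base]

    def flush(self, offset):
        """Forget every remembered byte located before `offset`."""
        drop = min(max(offset - self.base, 0), len(self.buf))
        del self.buf[:drop]
        self.base += drop


def printable_filter(source):
    """Keep only printable ASCII bytes, like the Poke filter above."""
    stream, out, offset = StreamBuffer(source), bytearray(), 0
    while True:
        stream.flush(offset)
        try:
            b = stream.read_byte(offset)
        except EOFError:
            break
        if 0x20 <= b <= 0x7E:
            out.append(b)
        offset += 1
    return bytes(out)
```

  Because the filter flushes before every read, the buffer never holds
  more than one byte, which is what makes this style usable on
  arbitrarily long inputs.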

  Thanks, Ege!


Maps of complex values in l-values
==================================

  In Poke it is possible to specify maps on the left side of assignment
  statements, like this:

  ,----
  | (poke) int @ 23#B = 666
  `----


  The above statement will poke the value 666 at offset 23 bytes from
  the beginning of the IO space.  In principle, values of any type can
  be written to the IO space this way, even complex ones like arrays,
  structs and unions.

  However, until now such constructions were limited to simple types,
  i.e. integers, offsets and strings.  Any attempt at poking complex
  values like structs was rejected with a compile-time error.

  Well... not anymore.  We recently (finally!) added support for poking
  complex values using maps in the l-value of assignments.  This is how
  you would create an empty ELF file:

  ,----
  | (poke) .mem tmp
  | The current IOS is now `*tmp*'.
  | (poke) dump
  | 76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
  | 00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | (poke) load elf
  | (poke) Elf64_Ehdr @ 0#B = Elf64_Ehdr {}
  | (poke) dump
  | 76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
  | 00000000: 7f45 4c46 0000 0000 0000 0000 0000 0000  .ELF............
  | 00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | 00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  | (poke) save
  | (poke) save :size iosize :file "foo.elf"
  `----


  It is important to note that when a value is assigned to an l-value
  map, its mapped/non-mapped properties, or any other properties, are
  not changed at all.  For example:

  ,----
  | (poke) var a = [1,2,3]
  | (poke) a'mapped
  | 0
  | (poke) int[3] @ 10#B = a
  | (poke) a'mapped
  | 0
  `----


  The attributes of the array stored in the variable `a' don't change.
  In this example it is not mapped, but it could have been mapped at
  some other (or the same) offset in the current IO space, or even at
  another IO space, and its properties would have still been preserved.


Assignment to structs with data integrity
=========================================

  Data integrity is a fundamental matter in poke.  The several kinds of
  constraints specified by the user in type descriptions, like an array
  bounded by size, or a struct or union in which fields should have a
  certain form, shall be preserved at all times.

  Now, there are three ways to generate values in Poke:

  1) From a literal, like `[1,2,3]'.
  2) From a constructor, like `Foo { a = 10, b = 20 }'.
  3) From a mapping, like `int[3] @ 0#B' or `Foo @ 0#B'.

  In all cases, the constraints are checked, and any violation reported.
  This means that the integrity of all the data in poke values, as
  defined by the types, is guaranteed by both compile-time and run-time
  checks.

  However, until now there was a little caveat: it was possible to break
  that data integrity when assigning new values to struct fields:

  ,----
  | (poke) type Foo = struct { int a = 0xff; int b; }
  | (poke) Foo { }
  | Foo {
  |   a=0xff,
  |   b=0x0
  | }
  | (poke) var f = Foo {}
  | (poke) f.a
  | 0xff
  | (poke) f.a = 0
  | (poke) f
  | Foo {
  |   a=0x0,
  |   b=0x0
  | }
  `----


  We just fixed that (support for this required support for other rather
  complex stuff) and now poke aborts the assignment if it would break
  the integrity of the data:

  ,----
  | (poke) f.a = 20
  | unhandled constraint violation exception
  `----
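
  One way to picture "abort the assignment if it would break the
  integrity of the data" is to validate a candidate copy before
  committing it.  The Python sketch below is only a model, not how poke
  implements it: `CheckedStruct' and `ConstraintViolation' are
  hypothetical names, and it assumes that the initializer in `int a =
  0xff' acts as the constraint `a == 0xff', as the session above
  suggests.

```python
class ConstraintViolation(Exception):
    """Raised when a value would not satisfy its type's constraints."""


class CheckedStruct:
    def __init__(self, constraints, **fields):
        self.constraints = constraints   # field name -> predicate on all fields
        self.fields = dict(fields)
        self._check(self.fields)

    def _check(self, fields):
        for name, predicate in self.constraints.items():
            if not predicate(fields):
                raise ConstraintViolation(name)

    def set(self, name, value):
        # Validate a candidate copy first, so that a failed assignment
        # leaves the original value untouched.
        candidate = dict(self.fields)
        candidate[name] = value
        self._check(candidate)
        self.fields = candidate


# Counterpart of: type Foo = struct { int a = 0xff; int b; }
foo = CheckedStruct({"a": lambda f: f["a"] == 0xFF}, a=0xFF, b=0)
foo.set("b", 7)                 # fine: `b` is unconstrained
try:
    foo.set("a", 20)            # rejected: `a` must stay 0xff
except ConstraintViolation:
    pass                        # foo.fields still holds a=0xff, b=7
```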


  Much better this way :)


New rules for union constructors
================================

  Unlike struct constructors, union constructors with more than one
  field initializer really don't make much sense, like in:

  ,----
  | (poke) type Foo = union { byte b : b > 0; int i; }
  | (poke) Foo {b = 2, i = 12}
  `----


  What kind of Foo are we asking for?  It is not clear.  Until this
  patch, the same rules used for constructed structs applied, so the
  result would be:

  ,----
  | Foo {
  |   b=2UB
  | }
  `----


  Now, if we do:

  ,----
  | (poke) var f = Foo {b = 2}
  | (poke) f.b = 0
  `----


  What would we expect?  For the value in `f' to change nature and
  become a Foo of kind `i'?  Or for this to be considered as a
  constraint error?

  In the first case (the nature of the union value changes) we could
  expect to get something like:

  ,----
  | (poke) f
  | Foo {
  |   i=12
  | }
  `----


  But should that really be 12, i.e. the value previously specified in
  the initializer, or 0?  Wouldn't we expect this value to be impacted
  by the coupling in bits of the two fields, if we consider both
  fields "start" at the beginning of the union?

  This quickly degenerates into something very complicated and obscure,
  and almost impossible to implement properly and to understand.
  However, this is because we are thinking about unions as if they
  were regular structs: they are not.  This was simply the wrong way of
  thinking.

  A better approach, which we just implemented, is the following: a
  union constructor admits either one or zero field initializers.
  Specifying more than one field initializer is a compile-time error:

  ,----
  | (poke) Foo {b = 2, i = 12}
  | <stdin>:1:1: error: union constructors require exactly one field initializer
  | Foo {b = 2, i = 12};
  | ^~~~~~~~~~~~~~~~~~~
  `----


  When we specify a field initializer, we are also declaring the kind of
  Foo we want to construct, i.e. its "nature":

  ,----
  | (poke) Foo {b = 2}
  | Foo {
  |   b=2UB
  | }
  `----


  Therefore, if we provide an invalid initial value for `b', then we get
  a constraint-violation exception:

  ,----
  | (poke) Foo {b = 0}
  | unhandled constraint violation exception
  `----


  The alternative `i' is not considered in this case: we asked for a Foo
  of kind `b', and we provided the wrong values for it: we want the
  exception.

  Once constructed, union values do not change their "nature" due to
  assignments:

  ,----
  | (poke) var f = Foo {b = 2}
  | (poke) f.b = 0
  | unhandled constraint violation exception
  `----


  Note that mapped unions are different in this sense.  When we map a
  union on some IO space:

  ,----
  | (poke) var m = Foo @ 0#B
  `----


  we are not specifying what nature of Foo we want: it all depends on
  the contents of the IO space.  Therefore, we could get either a `b' or
  an `i', and if the underlying data in the IO space changes, the nature
  of the value in `m' may change as well.

  Something similar happens when we don't specify a field initializer in
  a union constructor:

  ,----
  | (poke) Foo {}
  | Foo {
  |   i=0
  | }
  `----


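  Since no initializer was provided, we didn't indicate what kind of Foo
  we wanted, and therefore each alternative is tried in order, assuming
  all fields have default values, which in the case of integral types is
  zero.  Here `b' would be 0, which violates `b > 0', so that
  alternative is discarded and `i' is selected.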
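
  The rules for union constructors described in this section can be
  summarized in a few lines.  The following Python sketch is a model
  under stated assumptions, not the poke compiler's logic: the function
  `construct_union' and the table `FOO_ALTERNATIVES' are hypothetical,
  and the "too many initializers" case is reported at run-time here,
  whereas poke rejects it at compile-time.

```python
class ConstraintViolation(Exception):
    """Raised when a field value does not satisfy its constraint."""


# Counterpart of: type Foo = union { byte b : b > 0; int i; }
# Each alternative is (field name, default value, constraint predicate).
FOO_ALTERNATIVES = [
    ("b", 0, lambda v: v > 0),
    ("i", 0, lambda v: True),
]


def construct_union(alternatives, **init):
    if len(init) > 1:
        raise TypeError("union constructors accept at most one field initializer")
    if init:
        # One initializer: it selects the alternative, no fallback is tried.
        (name, value), = init.items()
        for alt_name, _default, ok in alternatives:
            if alt_name == name:
                if not ok(value):
                    raise ConstraintViolation(name)
                return (name, value)
        raise KeyError(name)
    # No initializer: try each alternative in order with its default value.
    for name, default, ok in alternatives:
        if ok(default):
            return (name, default)
    raise ConstraintViolation("no alternative accepts its default value")
```

  With this model, `construct_union (FOO_ALTERNATIVES)' selects `i',
  while passing `b=0' raises instead of silently falling back to `i'.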

  Hope all this makes sense XD


The infamous big array bug is now fixed!
========================================

  Ok, this is a bit of an embarrassing one.  When I first added support
  for arrays and struct values to the Poke virtual machine, I designed
  the corresponding `mka' and `mksct' PVM instructions so that they
  would get their contents from the run-time stack.

  So, if we wanted to build a struct value with three fields `f1', `f2'
  and `f3' we would generate PVM code similar to this pseudo-code (the
  actual PVM assembly is more complicated):

  ,----
  | push "f3"
  | push 30
  | push "f2"
  | push 20
  | push "f1"
  | push 10
  | mksct
  `----


  where `mksct' pops the field names and the field values from the stack
  and creates the value `struct {f1 = 10, f2 = 20, f3 = 30}'.  Yes, the
  stuff should be pushed in reverse order to the stack, for obvious
  reasons :)

  Of course, being stupid me, I made the mistake of applying the same
  strategy to arrays.  So for example to build an array [10,20,30], the
  compiler would generate PVM code like:

  ,----
  | push 30
  | push 20
  | push 10
  | mka
  `----


  That works very well when compiling array literals like `[1,2,3]', but
  what happens when you map all the bytes of, say, a tar file that is
  53MB long?

  ,----
  | (poke) .file some.tar.gz
  | (poke) byte[] @ 0#B
  | Segmentation fault
  `----


  Yeah... turns out that the Jitter stacks are not only limited, but
  also not that big.  Which is ok, of course, since we were clearly
  misusing them.  Also, this approach has the additional disadvantage
  (promptly pointed out by Luca Saiu) of requiring copying the elements
  around twice for no good reason.

  The solution is obvious: the `mka' instruction should work
  differently.  Instead of getting its elements from the stack, it
  should create an empty array.  Then the elements should be added to
  the array in a sequence or in a loop, using an element insertion
  instruction.

  Compiling `[1,2,3]' then becomes:

  ,----
  | mka
  | push 1
  | ains
  | push 2
  | ains
  | push 3
  | ains
  `----


  Whereas the array map should use a loop instead, since the length of
  the resulting array is not known at compile-time.
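
  The difference between the two schemes is easy to see in a toy stack
  machine.  The Python sketch below only borrows the instruction names
  `mka' and `ains' from the text; the class `TinyVM', the stack limit
  and the helper functions are hypothetical, and the real PVM is of
  course more involved.  Where poke used to crash, the toy raises an
  exception.

```python
class StackOverflow(Exception):
    """Stands in for the crash caused by exhausting the VM stack."""


class TinyVM:
    """Toy stack machine with a bounded stack (illustrative only)."""

    def __init__(self, stack_limit=1024):
        self.stack = []
        self.stack_limit = stack_limit

    def push(self, value):
        if len(self.stack) >= self.stack_limit:
            raise StackOverflow()
        self.stack.append(value)

    def mka_old(self, nelems):
        # Old scheme: every element already sits on the stack, pushed in
        # reverse order, so popping yields them first to last.
        self.push([self.stack.pop() for _ in range(nelems)])

    def mka(self):
        # New scheme: start from an empty array on the stack.
        self.push([])

    def ains(self):
        # Pop one element and append it to the array just below it.
        value = self.stack.pop()
        self.stack[-1].append(value)


def map_bytes_old(vm, data):
    for b in reversed(data):
        vm.push(b)              # stack depth grows with len(data)
    vm.mka_old(len(data))
    return vm.stack.pop()


def map_bytes_new(vm, data):
    vm.mka()
    for b in data:              # length only known at run-time: use a loop
        vm.push(b)
        vm.ains()               # stack depth never exceeds two
    return vm.stack.pop()
```

  With a small `stack_limit', `map_bytes_old' fails on any input longer
  than the limit, while `map_bytes_new' handles inputs of any length.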

  Thing is, I have been aware of both the problem and its solution for
  a long time, but I kept putting off the fix, turning my attention to
  more difficult and important issues.  This despite people's
  complaints.

  However, while implementing the support for complex maps in l-values,
  I realized I needed the elements of array literals to be pushed in the
  right order onto the stack, and therefore I had no choice but to bite
  the bullet and implement the new `mka'.

  So now poke supports big arrays without segfaulting, and John
  Darrington is happy.  Hurrah!


New built-in function get_time
==============================

  We added a new built-in function `get_time', which returns an array
  of two signed 64-bit numbers denoting the number of seconds and
  nanoseconds since the Epoch (1-1-1970) respectively.

  Additionally, we added a `Timespec' struct and an accompanying
  `gettimeofday' function to the `time' pickle:

  ,----
  | type Timespec = struct
  | {
  |   int<64> sec;
  |   int<64> nsec;
  | };
  | 
  | fun gettimeofday = Timespec:
  | {
  |   var time = get_time;
  |   return Timespec {sec = time[0], nsec = time[1]};
  | }
  `----
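
  For readers who want to compare the shape of the result with
  something familiar, here is a rough Python counterpart.  It is a
  sketch under an assumption the text does not spell out: that the
  second number holds the sub-second remainder, as in a POSIX timespec,
  rather than the total number of nanoseconds since the Epoch.  The
  Python names simply mirror the Poke ones.

```python
import time
from typing import NamedTuple


class Timespec(NamedTuple):
    sec: int    # whole seconds since the Epoch
    nsec: int   # remaining nanoseconds, assumed to be in [0, 10**9)


def get_time():
    """Return [seconds, nanoseconds], mimicking the array described above."""
    sec, nsec = divmod(time.time_ns(), 1_000_000_000)
    return [sec, nsec]


def gettimeofday():
    t = get_time()
    return Timespec(sec=t[0], nsec=t[1])
```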


Support for octal and hexadecimal codes in strings
==================================================

  The Poke strings are slowly maturing into grown-up strings... this
  week Mohammad-Reza Nabipoor (whom we warmly welcome to the poke gang!)
  added support for specifying character codes in strings using octal
  and hexadecimal escapes:

  ,----
  | (poke) print "foo\xa"
  | foo
  | (poke) print "fo\157\n"
  | foo
  `----
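
  To make the decoding explicit, the following Python sketch expands
  the two escapes shown above.  The exact rules are assumptions made
  for illustration (one or two hexadecimal digits after `\x', one to
  three octal digits, plus a handful of simple escapes), and
  `decode_escapes' is a hypothetical helper, not part of poke.

```python
import re

# A backslash followed by a hex escape, an octal escape, or one character.
_ESCAPE = re.compile(r"\\(x[0-9a-fA-F]{1,2}|[0-7]{1,3}|.)")
_SIMPLE = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def decode_escapes(text):
    """Expand backslash escapes in `text` (rules assumed, see above)."""
    def expand(match):
        body = match.group(1)
        if body[0] == "x":
            return chr(int(body[1:], 16))   # \xa  -> character code 0x0a
        if body[0] in "01234567":
            return chr(int(body, 8))        # \157 -> character code 0o157
        return _SIMPLE[body]                # KeyError on an unknown escape
    return _ESCAPE.sub(expand, text)
```

  Under these assumptions both `foo\xa' and `fo\157\n' decode to the
  same four characters: `f', `o', `o' and a newline.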


  Thank you Mohammad!


Support for `continue' in loops
===============================

  Not much to say about this one... we have added `continue' statements
  to the language, with the usual semantics: it skips the rest of the
  current iteration and starts the next one in the enclosing loop:

  ,----
  | for (packet in packets)
  | {
  |   if (packet_is_not_valid (packet))
  |     continue;
  |   ...
  | }
  `----


poke.rec database
=================

  Lastly... the boring stuff :)


  GNU poke is getting big and complex, encompassing many components,
  and happily more people are joining the development.  We are really
  starting to be in need of organizing better, or we will go nuts.

  Therefore, in what proved to be a quite painful exercise, I gathered
  all my dispersed notes, TODO lists, ideas and bug reports,
  prioritized them, and documented and organized them in a recutils
  database (http://www.gnu.org/s/recutils).  The database can be found
  in the file `etc/poke.rec' in the source tree.

  This database currently contains record sets for tasks, releases and
  hackers.  It provides a way to generate reports, and clear answers
  to questions like "what is pending before we can release 1.0?"  or,
  "I would like to work on the compiler, what can I do?".

  The fact that `rec-mode.el', the Emacs interface to recutils, is now
  very actively maintained by Antoine Kalmbach really helps us.  We
  have got to send him a Jamón de Bellota as a token of our
  esteem!

  And that's all for now.  Happy poking! :)