Array boundaries and closures in Poke
[03-10-2019]

by Jose E. Marchesi

Poke arrays are rather peculiar. One of their seemingly bizarre characteristics is the fact that the expressions calculating their boundaries (when they are bounded) evaluate in their own lexical environment, which is captured. In other words: the expressions denoting the boundaries of Poke arrays form closures. Also, the way they evaluate may be surprising. This is not capricious.

There are three different kinds of array types in Poke.

Unbounded arrays have no explicit boundaries. Examples are int[] or Elf64_Shdr[]. Arrays can be bounded by number of elements, by specifying a Poke expression that evaluates to an integer value. For example, int[2]. Finally, arrays can be bounded by size, by specifying a Poke expression that evaluates to an offset value. For example, int[8#B].

When an array type is bounded, be it by number of elements or by size, the expression indicating the boundary doesn't need to be constant and it can involve variables. For example, consider the following type definition:

var N = 2
type List = int[N*2]

Let's map a List at some offset:

(poke) List @ 0#B
[0x746f6f72,0x303a783a,0x723a303a,0x3a746f6f]

As expected, we get an array of four integers. Very good, obviously the boundary expression N*2 got evaluated when defining the type List, and the result of the evaluation was 4, right? Typical semantics like in my garden variety programming language... right? Right?!?

Well, not really. Let's modify the value of N and map a List again...

(poke) N = 1
(poke) List @ 0#B
[0x746f6f72,0x303a783a]

Yes, the boundary of the array type changed... come on, this is Poke, were you really expecting something typical? :)

What happens is that at type definition time the lexical environment is captured and a closure is created. The body of the closure is the boundary expression. Every time the type is referred to, the closure is re-evaluated and a new value is computed.
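The mechanism can be modelled outside Poke. The following is a rough Python sketch, not Poke code and not how Poke is implemented: the names ArrayType, env and words are made up for the illustration, and a plain dictionary stands in for the captured lexical environment.

```python
class ArrayType:
    """Toy model of a bounded array type whose bound is a closure."""

    def __init__(self, bound):
        # `bound` is a zero-argument callable; it is NOT evaluated here.
        self.bound = bound

    def map(self, words):
        # The bound is re-evaluated every time the type is used.
        return words[: self.bound()]


env = {"N": 2}                           # stands in for the captured environment
List = ArrayType(lambda: env["N"] * 2)   # models: type List = int[N*2]

words = [0x746f6f72, 0x303a783a, 0x723a303a, 0x3a746f6f]
first = List.map(words)    # N is 2, so the bound evaluates to 4
env["N"] = 1
second = List.map(words)   # N is 1, so the bound now evaluates to 2
```

The type object is created once, yet the two uses yield arrays of different lengths, because only the expression was stored and not its value.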

Consequently, if the value of a variable referred to in the expression changes, like in our example, the type itself gets updated automagically. Very nice, but why is Poke designed like this? Just to impress the cat? Nope.

In binary formats, and also in protocols, the size of some given data is often defined in terms of some other data that should be decoded first. Consider for example the following definition of a Packet:

type Packet =
  struct
  {
    byte size;
    byte[size] payload;
  };

Each packet contains an 8-bit integer specifying the size of the payload transported in the packet. The payload, a sequence of size bytes, follows.

In struct types like the above, the boundaries of arrays depend on fields that have been decoded before and that exist, like variables, in the lexical scope captured by the struct type definition (yes, these are also closures, but that's for another article.) This absolutely depends on having the array types evaluate their bounding expressions when the type is used, and not at type definition time.
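For readers more used to hand-written decoders, the dependency between the two fields can be sketched in Python. This is only an illustration under assumptions: decode_packet is a hypothetical helper, a bytearray stands in for the IO space, and running past the end of the data is reported with EOFError, which is a choice of the sketch rather than Poke's behavior.

```python
def decode_packet(data, offset):
    """Decode a Packet (one size byte, then `size` payload bytes) at `offset`."""
    size = data[offset]                        # the field that must be decoded first
    start = offset + 1
    payload = list(data[start:start + size])   # the bound uses the decoded size
    if len(payload) != size:
        raise EOFError("payload runs past the end of the data")
    return {"size": size, "payload": payload}


data = bytearray(32)               # 32 zero bytes, standing in for the IO space
data[0:4] = bytes([2, 3, 4, 5])

p1 = decode_packet(data, 0)   # size 2, payload [3, 4]
p2 = decode_packet(data, 1)   # size 3, payload [4, 5, 0]
```

Here the length of the payload cannot be known until the size byte has been read, which is why the bound has to be evaluated at decoding time. Unlike a mapped Poke value, the dictionaries returned by this sketch are snapshots: after changing data, decode_packet has to be called again to see the new payload.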

To show this property in action, let's play a bit:

(poke) var data = byte[4] @ 0#B
(poke) data[0] = 2
(poke) data[1] = 3
(poke) data[2] = 4
(poke) data[3] = 5
(poke) dump
76543210  0011 2233 4455 6677 8899 aabb ccdd eeff
00000000: 0203 0405 0000 0000 0000 0000 0000 0000
00000010: 0000 0000 0000 0000 0000 0000 0000 0000
(poke) var p1 = Packet @ 0#B
(poke) var p2 = Packet @ 1#B
(poke) p1
Packet {size=0x2UB,payload=[0x3UB,0x4UB]}
(poke) p2
Packet {size=0x3UB,payload=[0x4UB,0x5UB,0x0UB]}

Now, let's change the data and see how the sizes of the payloads are adjusted accordingly:

(poke) data[0] = 1
(poke) data[1] = 0
(poke) p1
Packet {size=0x1UB,payload=[0x0UB]}
(poke) p2
Packet {size=0x0UB,payload=[]}

So, as we have seen, Poke's way of handling boundaries in array types allows data structures to adjust to the particular data they contain, as is so usual in binary formats. This is an important feature that gives Poke part of its feel and magic.

Happy poking! :)
