Applied Pokology


Understanding Poke methods

Jose E. Marchesi
May 4, 2020

Poke struct types can be a bit daunting at first sight. You can find
all sorts of things inside them: fields, variables and functions,
constraint expressions, initialization expressions, labels, other
type definitions, and methods.

Struct methods can be particularly confusing for the novice poker. In
particular, it is important to understand the difference between
methods and regular functions defined inside struct types. This
article will hopefully clear up the confusion, and will also give the
reader a better understanding of how poke works internally.

The Packet
==========

First we need to define some structure to use as an example. Let's
say we are interested in poking Packets, as defined by the Packet
Specification 1.2 published by the Packet Foundation (no less).

In a nutshell, each Packet starts with a byte whose value is always
0xab, followed by a byte that defines the size of the payload. A
stream of bytes making up the payload follows, which is itself
followed by another stream of the same number of bytes holding
"control" values.

We could translate this description into the following Poke struct
type definition:

,----
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|     byte[size] payload;
|     byte[size] control;
|   };
`----

See the Poke manual for details on types, initialization values,
constraint expressions, etc.

Some details described in the Packet Specification 1.2 are not
covered by this simple definition, but we will attend to them later
in this article.

The process of building structs
===============================

Given the definition of a struct type like Packet, there are only two
ways to build a struct value in Poke. One is to map it from some IO
space.
This is achieved using the map operator:

,----
| (poke) Packet @ 12#B
| Packet {
|   magic = 0xab,
|   size = 2,
|   payload = [0x12UB,0x30UB],
|   control = [0x1UB,0x1UB]
| }
`----

The expression above maps a Packet starting at offset 12 bytes in the
current IO space. See the Poke manual for more details on using the
map operator.

The second way to build a struct value is to _construct_ one,
specifying the values of some, all, or none of its fields. It looks
like this:

,----
| (poke) Packet {size = 2, payload = [1UB,2UB]}
| Packet {
|   magic = 0xab,
|   size = 2,
|   payload = [0x1UB,0x2UB],
|   control = [0x0UB,0x0UB]
| }
`----

In either case, building a struct involves determining the values of
all its fields, one by one. The order in which the fields are built
is the order in which they appear in the type description. In our
example, the value of `magic' is determined first, then `size',
`payload' and finally `control'.

This is why we can refer to the values of previous fields when
defining fields, such as in the size of the `payload' array above,
but not the other way around: by the time `payload' is mapped or
constructed, the value of `size' has already been mapped or
constructed.

What happens behind the curtains is that when poke finds the
definition of a struct type like Packet, it compiles two functions
from it: a mapper function and a constructor function. The mapper
function gets as arguments the IO space and the offset from which to
map the struct value, whereas the constructor function gets a
template specifying the initial values for some, or all, of the
fields; reasonable default values (like zeroes) are used for the
fields for which no initial value has been specified.

These functions, mapper and constructor, are invoked to create fresh
values whenever a map operator @ or a struct constructor is used in a
Poke program, or at the poke prompt.
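The ordering can be observed without any IO space at all. The following is a minimal sketch, assuming a reasonably recent GNU poke; the variable `p' is just an illustrative name. Only `size' is given to the constructor, and the other fields are built in their order of appearance:

,----
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|     byte[size] payload;
|     byte[size] control;
|   };
|
| /* Only `size' is specified.  `magic' gets its initialization
|    value and, since `size' has already been constructed by the
|    time `payload' and `control' are built, each of them gets
|    `size' zero bytes.  */
| var p = Packet {size = 3};
`----

With this, `p.magic' should be 0xab, and `p.payload' and `p.control' should both be arrays of three zero bytes. Swapping the declarations of `size' and `payload' would not work, since `payload' would then refer to a field that has not been built yet.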

Variables in struct types
=========================

Fields are not the only entities that can appear in the definition of
a struct type.

Suppose that, after reading the Packet Specification 1.2 more
carefully (it spans several thousand pages), we realize that the field
`size' doesn't really store the number of bytes of the payload and
control arrays, as we initially thought. Or not exactly: the Packet
Foundation says that if `size' has the special value 0xff, then the
size is zero.

We could of course do something like this:

,----
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     byte[size == 0xff ? 0 : size] payload;
|     byte[size == 0xff ? 0 : size] control;
|   };
`----

However, we can avoid replicating code by using a variable instead:

,----
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     var real_size = (size == 0xff ? 0 : size);
|
|     byte[real_size] payload;
|     byte[real_size] control;
|   };
`----

Note how the variable can be used after it gets defined. In the
underlying process of mapping or constructing the struct, the variable
is incorporated into the lexical environment. Once defined, it can be
used in constraint expressions, array sizes, etc. We will see more
about this later.

Incidentally, it is of course possible to use global variables as
well. For example:

,----
| var packet_special = 0xff;
|
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     var real_size = (size == packet_special ? 0 : size);
|
|     byte[real_size] payload;
|     byte[real_size] control;
|   };
`----

In this case, the global `packet_special' gets captured in the lexical
environment of the struct type (in reality, in the lexical environment
of the implicitly created mapper and constructor functions) in such a
way that, if you later modify `packet_special', the new value will be
used when mapping or constructing _new_ values of type Packet. Which
is really cool, but let's not get distracted from the main topic...
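The capture of `packet_special' can be observed with two constructors and an assignment in between. This is a minimal sketch assuming a reasonably recent GNU poke; the names `p1' and `p2' are illustrative only, and constructors are used instead of maps so that no IO space is needed:

,----
| var packet_special = 0xff;
|
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     var real_size = (size == packet_special ? 0 : size);
|
|     byte[real_size] payload;
|     byte[real_size] control;
|   };
|
| /* 0xfe is not special yet, so `payload' and `control' get 0xfe
|    bytes each.  */
| var p1 = Packet {size = 0xfe};
|
| /* Modify the global captured by the struct type.  */
| packet_special = 0xfe;
|
| /* Same template, but now 0xfe is the special value: `real_size'
|    is zero and both arrays are empty.  */
| var p2 = Packet {size = 0xfe};
`----

`p1' is not rebuilt by the assignment; only values constructed afterwards, like `p2', see the new value of `packet_special'.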

Functions in struct types
=========================

Further reading of the Packet Specification 1.2 reveals that each
Packet has an additional `crc' field. The content of this field is
derived from both the payload bytes and the control bytes.

But this is no vulgar CRC we are talking about. On the contrary, it is
a special function developed by the CRC Foundation in partnership with
the Packet Foundation, called superCRC (patented, TM).

Fortunately, the CRC Foundation distributes a pickle, `supercrc.pk',
that provides a `calculate_crc' function with the following spec:

,----
| fun calculate_crc = (byte[] data, byte[] control) int:
`----

So let's use the function like this in our type, after loading the
supercrc pickle:

,----
| load supercrc;
|
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     var real_size = (size == 0xff ? 0 : size);
|
|     byte[real_size] payload;
|     byte[real_size] control;
|
|     int crc = calculate_crc (payload, control);
|   };
`----

However, there is a caveat: the calculation of the CRC may involve
arithmetic division, so the CRC Foundation warns us that the
`calculate_crc' function may raise E_div_by_zero. The Packet
Specification 1.2, in turn, tells us that in these situations the
`crc' field of the packet should contain zero. If we used the type
above, any exception raised by `calculate_crc' would be propagated by
the mapper/constructor:

,----
| (poke) Packet @ 12#B
| unhandled division by zero exception
`----

A solution is to wrap `calculate_crc' in a function that takes care of
the extra logic:

,----
| load supercrc;
|
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     var real_size = (size == 0xff ? 0 : size);
|
|     byte[real_size] payload;
|     byte[real_size] control;
|
|     fun corrected_crc = int:
|     {
|       try return calculate_crc (payload, control);
|       catch if E_div_by_zero { return 0; }
|     }
|
|     int crc = corrected_crc;
|   };
`----

Again, note how the function is accessible after its definition. Note
as well how fields, variables and other functions can all be used in
the function body. Defining variables and functions in struct types is
no different from defining them inside other functions or at the top
level: the same lexical rules apply.

Methods
=======

At this point you may be thinking something along the lines of "hey,
since variables and functions are also members of the struct, I should
be able to access them the same way as fields, right?". So you will
want to do:

,----
| (poke) var p = Packet @ 12#B
| (poke) p.real_size
| (poke) p.corrected_crc
`----

Sorry, but this won't work. To understand why, think about the struct
building process we sketched above. The mapper and constructor
functions are derived/compiled from the struct type. You can imagine
them having prototypes like:

,----
| Packet_mapper (IOspace, offset) -> Packet value
| Packet_constructor (template) -> Packet value
`----

You can also picture the fields, variables and functions in the struct
type specification as being defined inside the bodies of Packet_mapper
and Packet_constructor, as their contents get mapped/constructed. For
example, let's see what the mapper does:

,----
| Packet_mapper:
|
| . Map a byte, put it in a local `magic'.
| . Map a byte, put it in a local `size'.
| . Calculate the real size, put it in a local `real_size'.
| . Map an array of real_size bytes, put it in a local `payload'.
| . Map an array of real_size bytes, put it in a local `control'.
| . Compile a function, put it in a local `corrected_crc'.
| . Map an int, call the function in the local `corrected_crc',
|   complain if the values are not the same, otherwise put the
|   mapped int in a local `crc'.
| . Build a struct value with the values from the locals `magic',
|   `size', `payload', `control' and `crc', and return it.
`----

The pseudo-code for the constructor would be almost identical: just
replace "map" with "construct".

So you see, both the values of the mapped fields and the values of the
variables and functions defined inside the struct type end up as
locals of the mapping process, but only the values of the fields are
actually put in the struct value that is returned in the last step.

This is where methods come into the picture. A method looks very
similar to a function, but it is not quite the same thing. Let me show
you an example:

,----
| load supercrc;
|
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     var real_size = (size == 0xff ? 0 : size);
|
|     byte[real_size] payload;
|     byte[real_size] control;
|
|     fun corrected_crc = int:
|     {
|       try return calculate_crc (payload, control);
|       catch if E_div_by_zero { return 0; }
|     }
|
|     int crc = corrected_crc;
|
|     method c_crc = int:
|     {
|       return corrected_crc;
|     }
|   };
`----

We have added a method `c_crc' to our Packet struct type, which just
returns the corrected superCRC (patented, TM) of a packet. It can be
invoked using dot-notation once a Packet value has been mapped or
constructed:

,----
| (poke) var p = Packet @ 12#B
| (poke) p.c_crc
| 0xdeadbeef
`----

Now, the important bit here is that the method returns the corrected
CRC _of a Packet_. That is, it actually operates on a Packet value.
This Packet value gets implicitly passed as an argument whenever a
method is invoked.
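Since the `c_crc' example depends on the supercrc pickle, here is a self-contained sketch that can be tried without it. The method `get_real_size' and the variables `p' and `q' are hypothetical names made up for this illustration, and constructors are used so that no IO space is needed. The method only reads the `size' field of whatever Packet it is invoked on:

,----
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|
|     var real_size = (size == 0xff ? 0 : size);
|
|     byte[real_size] payload;
|     byte[real_size] control;
|
|     /* Hypothetical method for this example.  `size' here is the
|        field of the Packet value the method is invoked on.  */
|     method get_real_size = byte:
|     {
|       if (size == 0xff)
|         return 0UB;
|       return size;
|     }
|   };
|
| var p = Packet {size = 0xff};
| var q = Packet {size = 2};
`----

With these definitions, `p.get_real_size' should evaluate to 0 and `q.get_real_size' to 2, whereas `p.real_size' still doesn't work, for the reasons given above. In both calls, the Packet value to the left of the dot is what gets implicitly passed to the method.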

We can visualize this implicit argument with the following "pseudo
Poke":

,----
| method c_crc = (Packet SELF) int:
| {
|   return SELF.corrected_crc;
| }
`----

Fortunately, poke takes care of recognizing when you are referring to
fields of this implicit struct value, and does The Right Thing(TM) for
you. This includes calling other methods:

,----
| method foo = void: { ... }
| method bar = void:
| {
|   [...]
|   foo;
| }
`----

The corresponding "pseudo Poke" being:

,----
| method bar = (Packet SELF) void:
| {
|   [...]
|   SELF.foo ();
| }
`----

It is also possible to define methods that modify the contents of
struct fields, no problem:

,----
| var packet_special = 0xff;
|
| type Packet =
|   struct
|   {
|     byte magic = 0xab;
|     byte size;
|     [...]
|
|     method set_size = (byte s) void:
|     {
|       if (s == 0)
|         size = packet_special;
|       else
|         size = s;
|     }
|   };
`----

This is what is commonly known as a "setter". Note, incidentally, how
a method can also use regular variables. The Poke compiler knows when
to generate a store in a normal variable such as `packet_special', and
when to generate an assignment to a SELF field.

A few restrictions
==================

Given the different nature of variables, functions and methods, there
are a couple of restrictions:

- Functions can't set fields defined in the struct type. This will be
  rejected by the compiler:

  ,----
  | type Foo =
  |   struct
  |   {
  |     int field;
  |     fun wrong = void: { field = 10; }
  |   };
  `----

  Remember the construction/mapping process. When a function accesses
  a field of the struct type, like in the example above, it is not
  doing something like the pseudo `SELF.field = 10'. Instead, it is
  simply updating the value of the local created in this step of
  Foo_mapper:

  ,----
  | Foo_mapper:
  |
  | . Map an int, put it in a local `field'.
  | . [...]
  `----

  Setting that local would impact the mapping of the subsequent fields
  if they refer to `field' (for example, in their constraint
  expressions), but it wouldn't actually alter the value of the field
  `field' in the struct value that is created and returned by the
  mapper! This is very confusing, so we simply disallow it with the
  compiler error "invalid assignment to struct field", for your own
  sanity 8-)

- Methods can't be used in field constraint expressions, nor in
  variables or functions defined in a struct type. How could they be?
  Field constraint expressions, the initialization expressions of
  variables, and the functions defined in struct types are all
  executed as part of the mapper/constructor and, at that time, there
  is no struct value yet to pass to the method. If you try to do this,
  the compiler will greet you with an "invalid reference to struct
  method" message.

Happy poking! :)