          _____
 ---'   __\_______
            ______)       Endianness in Poke - And a little nice hack
            __)
           __)
 ---._______)

                                                      Jose E. Marchesi
                                                      October 10, 2019


Byte endianness  is an important aspect  of encoding data.  As  a good
binary editor  poke provides support  for both little and  big endian,
and will soon acquire the ability to encode exotic endianness like PDP
endian.  Endianness control is integrated in the Poke language, and is
designed to be easily used in type descriptions.  Let's see how.

GNU  poke  maintains   a  global  variable  that   holds  the  current
endianness.  This  is the  endianness that will  be used  when mapping
integers whose types do not specify an explicit endianness.

Like other  poke global  state, this global  variable can  be modified
using the '.set' dot-command:

.set endian little
.set endian big
.set endian host

We can easily see how changing the current endianness indeed impacts
the way integers are mapped:

(poke) dump :from 0#B :size 4#B :ruler 0 :ascii 0
00000000: 8845 4c46
(poke) .set endian little
(poke) int @ 0#B
0x464c4588
(poke) .set endian big
(poke) int @ 0#B
0x88454c46
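
For readers who want to check these values outside poke, the same four
bytes can be decoded in Python.  This is only an illustrative sketch
and not part of poke: the variable names are made up for the example,
and the bytes are decoded as unsigned so that the hexadecimal digits
match the session above (poke's 'int' is a signed 32-bit integer).

```python
# The four bytes shown by the dump command above.
data = bytes([0x88, 0x45, 0x4c, 0x46])

# Interpret them as an unsigned 32-bit integer in each byte order.
as_little = int.from_bytes(data, byteorder="little")
as_big = int.from_bytes(data, byteorder="big")

print(hex(as_little))  # 0x464c4588
print(hex(as_big))     # 0x88454c46
```

The bytes in the file never change; only the order in which they are
combined into an integer does.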

However, as handy as this dot-command may be, it is also important to
be able to change the current endianness programmatically from a Poke
program.  For that purpose, the PKL compiler provides a couple of
built-in functions: 'get_endian' and 'set_endian'.  Their definitions,
along with the specific supported values, look like:

var ENDIAN_LITTLE = 0;
var ENDIAN_BIG = 1;

fun get_endian = int: { ... }
fun set_endian = (int endian) int: { ... }

Accessing the current endianness programmatically is especially
useful in situations where the data being poked features a different
structure, depending on the endianness.

A good (or bad) example of this is the way registers are encoded in
eBPF instructions.  eBPF is the in-kernel virtual machine of Linux,
and features an ISA with ten general-purpose registers.  eBPF
instructions generally use two registers, namely the source register
and the destination register.  Each register is encoded using 4 bits,
and the fields encoding registers are consecutive in the
instructions.  Typical.  However, for reasons I won't be discussing
right now (because I'm having a nice night and don't want to ruin it)
the order of the source and destination register fields is switched
depending on the endianness.  In big-endian systems the order is:

dst:4 src:4

Whereas in little-endian systems the order is:

src:4 dst:4
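
The two layouts can be modelled with a few lines of Python.  This is a
sketch under one stated assumption: the field listed first occupies
the most significant four bits of the byte.  The function name
'decode_regs' is hypothetical and exists only for this illustration.

```python
def decode_regs(byte, little_endian):
    """Return (src, dst) register numbers packed in one byte."""
    high = (byte >> 4) & 0xF  # first-listed field (assumed most significant)
    low = byte & 0xF          # second-listed field
    if little_endian:
        return high, low      # layout src:4 dst:4
    return low, high          # layout dst:4 src:4


print(decode_regs(0x45, little_endian=True))   # (4, 5)
print(decode_regs(0x45, little_endian=False))  # (5, 4)
```

The same byte yields swapped register numbers depending on the
endianness, which is exactly the situation a type description has to
cope with.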

In Poke, the obvious way of representing data whose structure depends
on some condition is using a union.  In this case, it could read like
this:

type BPF_Insn_Regs =
  union
  {
    struct
    {
      BPF_Reg src;
      BPF_Reg dst;
    } le : get_endian == ENDIAN_LITTLE;

    struct
    {
      BPF_Reg dst;
      BPF_Reg src;
    } be;
  };

Note the call to the 'get_endian' function (which takes no arguments
and thus can be called Algol68-style, without specifying an empty
argument list) in the constraint of the union alternative.  This way,
the register fields will have the right order corresponding to the
current endianness.  Nifty.

However, there is an even better way to denote the structure of these
fields.  This is it:

type BPF_Insn_Regs =
  struct
  {
    var little_p = (get_endian == ENDIAN_LITTLE);

    BPF_Reg src @ !little_p * 4#b;
    BPF_Reg dst @ little_p * 4#b;
  };

This version, where the ordering of the fields is implemented using
field labels, is not only more compact, but also has the virtue of not
requiring additional "intermediate" fields like 'le' and 'be' above.
It also shows how convenient it can be to declare variables inside
structs.  Let's see it in action:

(poke) BPF_Insn_Regs @ 1#B
BPF_Insn_Regs {src=%r4,dst=%r5}
(poke) .set endian big
(poke) BPF_Insn_Regs @ 1#B
BPF_Insn_Regs {src=%r5,dst=%r4}

Note the pretty printing of registers.  This is achieved by having a
pretty-printer method in the definition of 'BPF_Reg':

type BPF_Reg =
  struct
  {
   uint<4> code;

   fun _print = void:
   {
    print "%";
    if (code < BPF_R9)
      printf "r%i32d", code;
    else
      print "fp";
   }
  };

Changing the current endianness in constraint expressions is useful
when dealing with binary formats that specify the endianness of the
data that follows using some sort of tag.  This is the case of ELF,
for example.  The first few bytes in an ELF header make up what is
known as the 'e_ident'.  One of these bytes is called 'ei_data' and
its value specifies the endianness of the data stored in the ELF
file.  This is how we handle this in Poke:

/* Map the value of 'ei_data' to the corresponding endianness.  */
fun elf_endian = (byte data) int:
 {
   if (data == ELFDATA2LSB)
     return ENDIAN_LITTLE;
   else
     return ENDIAN_BIG;
 }

[...]

type Elf64_Ehdr =
  struct
  {
    struct
    {
      byte[4] ei_mag : ei_mag[0] == 0x7fUB
                       && ei_mag[1] == 'E'
                       && ei_mag[2] == 'L'
                       && ei_mag[3] == 'F';
      byte ei_class;
      byte ei_data : (ei_data != ELFDATANONE
                      && set_endian (elf_endian (ei_data)));
      byte ei_version;
      byte ei_osabi;
      byte ei_abiversion;
      byte[6] ei_pad;
      offset<byte,B> ei_nident;
    } e_ident;

    [...]
  };
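
The idea of switching endianness from inside a constraint can be
mimicked in Python.  The sketch below is not poke code: the global
'current_endian' and the helpers 'check_ei_data' and 'read_u16' are
hypothetical names for this illustration.  The ENDIAN_* values are the
ones given earlier, and the ELFDATA* values come from the ELF
specification.

```python
ENDIAN_LITTLE, ENDIAN_BIG = 0, 1                 # values from the article
ELFDATANONE, ELFDATA2LSB, ELFDATA2MSB = 0, 1, 2  # values from the ELF spec

current_endian = ENDIAN_BIG  # hypothetical stand-in for poke's global setting


def set_endian(endian):
    """Set the global endianness; always return 1 so it fits in a condition."""
    global current_endian
    current_endian = endian
    return 1


def check_ei_data(ei_data):
    """Mimic the constraint on ei_data: validate the tag, then switch."""
    endian = ENDIAN_LITTLE if ei_data == ELFDATA2LSB else ENDIAN_BIG
    # 'and' short-circuits, so an invalid tag leaves the setting untouched.
    return bool(ei_data != ELFDATANONE and set_endian(endian))


def read_u16(data, offset):
    """Decode a 16-bit unsigned integer using the current endianness."""
    order = "little" if current_endian == ENDIAN_LITTLE else "big"
    return int.from_bytes(data[offset:offset + 2], byteorder=order)


# A 16-byte little-endian e_ident followed by a 16-bit field holding 2.
header = b"\x7fELF" + bytes([2, ELFDATA2LSB, 1, 0, 0]) + bytes(7) + b"\x02\x00"
assert check_ei_data(header[5])
print(read_u16(header, 16))  # 2
```

Everything decoded after the check uses the byte order announced by
the file itself, without the reader having to set it by hand.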

Note how 'set_endian' returns an integer value... it is always 1.
This is to facilitate its usage in field constraint expressions.

Happy poking! :)