Applied Pokology
Writing binary utilities with GNU poke
Jose E. Marchesi
July 16, 2020
GNU poke is, first and foremost, intended to be used as an interactive
editor, either directly on the command line or through a graphical user
interface built on it. However, since its inception poke was also
intended to provide a suitable and useful foundation on which other
programs, the so-called binary utilities, could be written. At last,
the development of poke has progressed to a point where we can start
writing such utilities, and the purpose of this article is to show a
small yet working and useful example of what can be achieved by
writing a few lines of Poke: an extractor for ELF sections.
elfextractor
============
We will be hacking a very simple utility called elfextractor, which
extracts the contents of the sections of an ELF file, whose name is
provided as an argument on the command line, into several output
files. This is the synopsis of the program:
,----
| elfextractor FILE [SECTION_NAME]
`----
Where `FILE' is the name of the ELF file from which to extract
sections, and an optional `SECTION_NAME' specifies the name of the
section to extract.
Say we have a file `foo.o' and we would like to extract its text
section. We would use elfextractor like:
,----
| $ elfextractor foo.o .text
`----
Provided `foo.o' indeed has a section named `.text', the utility will
create a file `foo.o.text' with the section's contents. Note how the
names of the output files are derived by concatenating the name of the
input ELF file and the name of the extracted section.
If no section name is specified, then all sections are extracted. For
example:
,----
| $ elfextractor foo.o
| $ ls foo.o*
| foo.o          foo.o.eh_frame       foo.o.shstrtab  foo.o.symtab
| foo.o.comment  foo.o.rela.eh_frame  foo.o.strtab    foo.o.text
`----
Before writing elfextractor, however, we must first learn a few things
about writing Poke scripts...
Poke scripts
============
In interactive usage, there are two main ways to execute Poke code: at
the interactive prompt (or REPL), and by loading "pickles".
Executing Poke code at the REPL is as easy as entering a statement
or expression:
,----
| (poke) print "Hello\n"
| Hello
`----
Executing Poke code in a pickle is performed by loading the file
containing the code:
,----
| (poke) .load say-hello.pk
| Hello
`----
Where `say-hello.pk' contains simply:
,----
| print "Hello\n";
`----
However, we would like to have Poke scripts, i.e. to be able to
execute Poke programs as standalone programs, from the shell. In
other words, we want to use GNU poke as an interpreter. This is
achieved by using a shebang, which should appear at the top of the
script file. The poke shebang looks like this:
,----
| #!/usr/bin/poke -L
| !#
`----
The `-L' command line option tells poke that it is being used as an
interpreter. Additional arguments for poke can be specified before
`-L' (but not after). The `#! ... !#' is an alternative syntax for
multi-line comments, which makes it possible to have the shebang at the
top of a Poke program without causing a syntax error. This nice trick
has been borrowed from Guile.
Therefore, we could write say-hello as a Poke script like this:
,----
| #!/usr/bin/poke -L
| !#
|
| print "Hello\n";
`----
And then execute it like any other program or script:
,----
| $ ./say-hello
`----
Handling command-line arguments
===============================
When a Poke script is executed, the command line arguments passed to
the script become available in the array argv. Example:
,----
| #!/usr/bin/poke -L
| !#
|
| for (arg in argv)
|   print "Argument: " + arg + "\n";
`----
Executing this script results in:
,----
| $ ./printargs foo bar 'baz quux'
| Argument: foo
| Argument: bar
| Argument: baz quux
`----
Note that there is no need for an argc variable, since the number of
elements stored in a Poke array can be queried using an attribute:
`argv'length'.
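As a minimal sketch of this, a hypothetical script `countargs' (the
name and the messages are made up for illustration) could report the
number of arguments it received using only `argv'length':
,----
| #!/usr/bin/poke -L
| !#
|
| /* countargs - hypothetical example, not part of poke.  */
|
| if (argv'length == 0)
|   print "No arguments given\n";
| else
|   /* argv'length is an unsigned 64-bit integer, hence %u64d.  */
|   printf ("%u64d argument(s), the first one is `%s'\n",
|           argv'length, argv[0]);
`----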
Note also that argv is only defined when poke runs as an interpreter:
,----
| $ poke
| [...]
| (poke) argv
| <stdin>:1:1: error: undefined variable 'argv'
| argv;
| ^~~~
`----
Exiting from scripts
====================
By default a Poke script will communicate a successful status to the
environment, upon exiting:
,----
| $ cat hello
| #!/usr/bin/poke -L
| !#
|
| print "hello\n";
| $ ./hello; echo $?
| hello
| 0
`----
In order to exit with some other status code, most typically to signal
an erroneous situation, the Pokeish way is to raise an `E_exit'
exception with the desired exit status code:
,----
| raise Exception { code = EC_exit, exit_status = 1 };
`----
This can be a bit cumbersome to write, so poke provides a more
conventional syntax in the form of an `exit' function:
,----
| fun exit = (int<32> exit_code = 0) void:
| {
|   raise Exception { code = EC_exit, exit_status = exit_code };
| }
`----
Using `exit', the above raise statement becomes the much simpler:
,----
| exit (1);
`----
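For instance, a script could combine `argv' and `exit' to reject a
wrong invocation. This is only a sketch, and the script name
`needs-arg' is hypothetical:
,----
| #!/usr/bin/poke -L
| !#
|
| /* needs-arg - hypothetical example: expects exactly one argument.  */
|
| if (argv'length != 1)
|   {
|     print "Usage: needs-arg ARG\n";
|     /* Report failure to the calling shell.  */
|     exit (1);
|   }
|
| print "ok\n";
`----
Invoked without arguments, such a script should print the usage line
and leave a status of 1 in the shell's `$?'.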
Loading pickles as modules
==========================
elfextractor deals with ELF object files. Extracting sections
requires dealing with several data structures encoded in the ELF file,
such as the header, the section header table, the string table (that
contains the names of the sections) and so on. It would be of course
possible to define Poke types for these structures in the script
itself but, as it happens, GNU poke ships with an already written
pickle that describes the ELF structures. It is called `elf.pk'.
So a script needing to mess with ELF data structures can just make use
of `elf.pk' using the load construct:
,----
| load elf;
`----
This looks for a file called `elf.pk' in a set of directories, which
are predefined by poke, and loads it. The list of directories where
poke looks for pickles is stored in the load_path variable as a
colon-separated list of directory names, and can be customized:
,----
| $ poke
| [...]
| (poke) load_path
| "/home/jemarch/.poke.d:.:/home/jemarch/.local/share/poke:..."
`----
The default value of `load_path' contains both user-specific
directories and system-wide directories. This ensures that all the
pickles installed by poke are available, and that the user can load
her own pickles in her scripts.
Once a pickle is loaded in a script the types, functions and variables
defined in it (either directly or indirectly by loading its own
pickles) become available.
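As a small sketch of what this makes possible, the following
hypothetical script relies on the `Elf64_File' type defined by
`elf.pk' to count the section headers of the file given as its first
argument. It performs no error handling, and the `open' function and
the map operator it uses are explained further down in this article:
,----
| #!/usr/bin/poke -L
| !#
|
| /* Hypothetical example: count the section headers of an ELF64 file.  */
|
| load elf;
|
| var fd = open (argv[0], IOS_M_RDONLY);
|
| /* Elf64_File is a type made available by the loaded pickle.  */
| var elf = Elf64_File @ fd : 0#B;
|
| printf ("%u64d section headers\n", elf.shdr'length);
| close (fd);
`----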
Back to elfextractor
====================
All right, now that we know more about writing Poke scripts, let's go
back to our original task: to write elfextractor. This is an
implementation:
,----
| #!/usr/bin/poke -L
| !#
|
| /* elfextractor - Extract sections from ELF64 files.  */
|
| load elf;
|
| if (!(argv'length in [1,2]))
|   {
|     print "Usage: elfextractor FILE [SECTION_NAME]\n";
|     exit (1);
|   }
|
| var file_name = argv[0];
| var section_name = (argv'length > 1) ? argv[1] : "";
|
| try
|   {
|     var fd = open (file_name, IOS_M_RDONLY);
|     var elf = Elf64_File @ fd : 0#B;
|
|     for (shdr in elf.shdr where shdr.sh_type != 0x0)
|       {
|         var sname = elf.get_string (shdr.sh_name);
|
|         if (section_name == "" || sname == section_name)
|           save :ios elf'ios :file file_name + sname
|                :from shdr.sh_offset :size shdr.sh_size;
|       }
|
|     close (fd);
|   }
| catch (Exception e)
|   {
|     /* Compare exception codes rather than whole Exception values.  */
|     if (e.code == EC_constraint)
|       printf ("error: `%s' is not a valid ELF64 file\n", file_name);
|     else if (e.code == EC_io)
|       printf ("error: couldn't open file `%s'\n", file_name);
|     else
|       raise e;
|
|     exit (1);
|   }
`----
First the command line arguments are handled. The script checks
whether the right number of arguments has been passed (either 1 or 2),
exiting with an error code otherwise. The file name and the section
name are then extracted from the `argv' array.
Once we have the file name and the optional desired section name, it
is time to do the real work. The code is enclosed in a try-catch
block statement, because some of the operations may result in
exceptions being raised.
First, the ELF file whose name is specified in the command line is
opened for reading:
,----
| var fd = open (file_name, IOS_M_RDONLY);
`----
The built-in function `open' returns a file descriptor that can be
subsequently used in mapping operations. If the provided file name
doesn't identify a file, or if the file can't be read for whatever
reason, an `E_io' exception is raised. Note how the exception is
handled in the `catch' block, emitting an appropriate diagnostic
message and exiting with an error status.
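When only one kind of exception is of interest, the `catch if' form
of the try statement is a more compact alternative. The following
sketch, which is not part of elfextractor, handles only `E_io' and
lets any other exception propagate:
,----
| try
|   {
|     var fd = open (argv[0], IOS_M_RDONLY);
|     close (fd);
|   }
| catch if E_io
|   {
|     printf ("error: couldn't open file `%s'\n", argv[0]);
|     exit (1);
|   }
`----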
Once the ELF file is open for reading, we map an `Elf64_File' on it,
at the expected offset (zero bytes from the beginning of the file):
,----
| var elf = Elf64_File @ fd : 0#B;
`----
If the file doesn't contain valid ELF data, this map will fail and
raise an `E_constraint' exception. Again, the `catch' block handles
this situation.
At this point the variable `elf' contains an `Elf64_File'. Since we
want to extract the sections contained in the file, we need to somehow
iterate on them. The section header table is available in `elf.shdr'.
A for-in-where loop is used to iterate on all the headers, skipping
the "null" ELF sections which are always empty, and are characterized
by a `shdr.sh_type' of 0. An inner conditional filters out sections
whose names do not match the name provided on the command line, if one
was specified at all.
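The same loop can be tried in isolation. As a sketch, a hypothetical
companion script that only lists the names of the non-null sections,
without extracting anything and without any error handling, could look
like this:
,----
| #!/usr/bin/poke -L
| !#
|
| /* Hypothetical example: list the section names of an ELF64 file.  */
|
| load elf;
|
| var fd = open (argv[0], IOS_M_RDONLY);
| var elf = Elf64_File @ fd : 0#B;
|
| /* Skip the null sections, like elfextractor does.  */
| for (shdr in elf.shdr where shdr.sh_type != 0x0)
|   printf ("%s\n", elf.get_string (shdr.sh_name));
|
| close (fd);
`----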
For each matching section we then save its contents in a file named
after the input ELF file, by calling a function `save', which is
provided by poke:
,----
| save :ios elf'ios :file file_name + sname
|      :from shdr.sh_offset :size shdr.sh_size;
`----
The above is exactly what we would have written at the poke REPL
(modulo the trailing semicolon)! How is this supposed to work? The
thing is, GNU poke commands are implemented as Poke functions. Let's
consider `save', for example. It is defined as a function having the
following prototype:
,----
| fun save = (int ios = get_ios,
|             string file = "",
|             off64 from = 0#B,
|             off64 size = 0#B,
|             int append = 0,
|             int verbose = 0) void:
| { ... }
`----
Once a Poke function is defined in the environment, it becomes
available as such. Therefore, in a poke session we could call it
like:
,----
| (poke) save (get_ios, "filename", 0#B, 12#B, 0, 1)
`----
However, this is cumbersome and error prone. To begin with, we would
have to remember the name, position and nature of each argument
accepted by the command. What is even more annoying, we are forced to
provide explicit values for all of them: in the example above we have
to pass the current IOS (the default) and 0 for `append' (the default)
just to be able to set `verbose'. Too bad.
To ease commanding poke, the Poke language supports an alternative
syntax to call functions, in which the function arguments are referred
to by name, can be given in any order, and can be omitted. The command
above can thus be written like:
,----
| (poke) save :from 0#B :size 12#B :verbose 1
`----
This syntax is mostly intended to be used interactively, but nothing
prevents its use in Poke programs and scripts whenever it is deemed
appropriate, like we did in elfextractor. We could of course have
used the more conventional syntax:
,----
| if (section_name == "" || sname == section_name)
|   save (elf'ios, file_name + sname,
|         shdr.sh_offset, shdr.sh_size, 0, 0);
`----
Which style to use is certainly a matter of taste.
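Both calling styles work with user-defined functions as well. As a
sketch, with a hypothetical function `greet' whose name and arguments
are invented for illustration:
,----
| fun greet = (string who = "world", int shout = 0) void:
| {
|   if (shout)
|     printf ("HELLO %s!\n", who);
|   else
|     printf ("Hello %s\n", who);
| }
|
| /* Conventional call: the arguments are positional.  */
| greet ("poke", 1);
|
| /* Named arguments: any order, and omitted ones keep their defaults.  */
| greet :shout 1;
| greet :shout 0 :who "poke";
`----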
Anyhow, once the sections have been written out, the file descriptor
is closed and the program exits with the default status, which is
success. Should the `save' function find any problem saving the data,
such as a full disk, not enough permissions or the like, exceptions
will be raised, caught and maybe handled by our `catch' block.
And this is it! The complete program is little more than 40 lines
long. This is a good example of how, given a pickle providing a
reasonable description of some binary-oriented format (ELF in this
case), poke can be leveraged to achieve a lot in a very concise way,
free from the many details involved in the encoding, reading and
writing of binary data.
Happy poking! :)