Author: Nalen98

Good day!

The topic of my research during the “Summer of Hack 2019” summer internship at Digital Security was “Decompiling eBPF in Ghidra”. The task was to write a Sleigh specification that translates eBPF bytecode into Ghidra P-Code, making it possible to disassemble and decompile eBPF programs. The result of the research is a Ghidra extension that adds support for the eBPF processor. Like the work of the other interns, it can fairly be called pioneering, since no other reverse engineering tool could previously decompile eBPF.

Background

I ended up with this topic by a twist of irony: I was not previously familiar with eBPF, and I had never used Ghidra, because of the prevailing dogma that “IDA Pro is better”. As it turned out, that is not the case.

Getting to know Ghidra went quickly, because its developers have written very competent and accessible documentation. I also had to learn Sleigh, the processor specification language in which the development was done. The developers did a great job and wrote very detailed documentation for both the tool and Sleigh, for which many thanks.

On the other side of the fence was the extended Berkeley Packet Filter. eBPF is a virtual machine in the Linux kernel that allows loading arbitrary user code, which can be used for tracing processes and filtering packets in kernel space. The architecture is a register-based RISC machine with 11 64-bit registers, a program counter, and a 512-byte stack. eBPF imposes a number of restrictions:

  • loops are prohibited;
  • memory access is possible only through the stack (more on that later);
  • kernel functions are available only through special wrapper functions (eBPF helpers).


Structure of the eBPF technology. Image source: www.brendangregg.com/ebpf.html

This technology is used mainly for networking tasks at the kernel level: debugging, packet filtering, and so on. eBPF support was added in kernel 3.15, and quite a few talks at the “Linux Plumbers Conference 2019” were devoted to it. But unlike Ghidra, eBPF has incomplete documentation that leaves a lot out. Clarifications and missing information therefore had to be found on the Internet. Searching for answers took a lot of time, and one can only hope that the technology will be refined and properly documented.

Bad documentation

To develop a Sleigh specification, one must first understand how the architecture of the target processor works. So we turn to the official documentation.

It has a number of shortcomings:

  • The structure of eBPF instructions is incompletely described.

Most specifications, such as the one for Intel x86, state what each bit of an instruction is for and which field it belongs to. Unfortunately, in the eBPF specification these details are either scattered throughout the document or absent altogether, so the missing pieces had to be dug out of the implementation in the Linux kernel.

For example, for the instruction structure op:8, dst_reg:4, src_reg:4, off:16, imm:32 there is not a word about the offset (off) and the immediate (imm) being signed, and this is extremely important, because it affects everything from arithmetic instructions to jumps. The Linux kernel source code helped.
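The layout described above can be sketched in a few lines of Java. This is only an illustration, not code from the extension: the class `EbpfInsn` and its method are hypothetical, and little-endian byte order is assumed. The point is that off and imm must be read as signed values.

```java
// Hypothetical helper (not part of the extension): decodes one 8-byte
// eBPF instruction slot, assuming little-endian byte order.
final class EbpfInsn {
    final int opcode;  // op:8, unsigned
    final int dst;     // dst_reg:4, low nibble of byte 1
    final int src;     // src_reg:4, high nibble of byte 1
    final short off;   // off:16, signed
    final int imm;     // imm:32, signed

    private EbpfInsn(int opcode, int dst, int src, short off, int imm) {
        this.opcode = opcode;
        this.dst = dst;
        this.src = src;
        this.off = off;
        this.imm = imm;
    }

    static EbpfInsn decode(byte[] code, int pos) {
        int opcode = code[pos] & 0xFF;
        int dst = code[pos + 1] & 0x0F;
        int src = (code[pos + 1] >> 4) & 0x0F;
        // Casting to short keeps the sign of the 16-bit offset.
        short off = (short) ((code[pos + 2] & 0xFF) | ((code[pos + 3] & 0xFF) << 8));
        // A Java int is already a signed 32-bit value.
        int imm = (code[pos + 4] & 0xFF)
                | ((code[pos + 5] & 0xFF) << 8)
                | ((code[pos + 6] & 0xFF) << 16)
                | ((code[pos + 7] & 0xFF) << 24);
        return new EbpfInsn(opcode, dst, src, off, imm);
    }
}
```

For instance, decoding the bytes `15 01 FE FF 00 00 00 00` yields an off of -2 rather than 65534, which is exactly the difference that matters for a backward jump.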

  • There is no complete picture of all the mnemonics of the architecture.

Some documentation lists not only all instructions and their operands, but also their semantics in C, use cases, operand characteristics, and so on. The eBPF documentation lists the instruction classes, but this is not enough for a developer. Let’s consider them in more detail.

All eBPF instructions are 64 bits wide, except for LDDW (load double word), which is 128 bits wide and concatenates two 32-bit imm fields. eBPF instructions have the following structure.


eBPF instruction encoding
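For the 128-bit LDDW case mentioned above, the concatenation of the two imm fields can be sketched as follows. The class and method names are hypothetical; the opcode value 0x18 and the order of the halves (low 32 bits in the first slot, high 32 bits in the second) are taken from the Linux kernel, not from the text above.

```java
// Hypothetical sketch (not part of the extension) of how the two 32-bit
// imm fields of an LDDW instruction form one 64-bit constant.
final class Lddw {
    static final int OPCODE_LDDW = 0x18; // BPF_LD | BPF_DW | BPF_IMM in the kernel headers

    // immLow comes from the first 8-byte slot, immHigh from the second one.
    static long imm64(int immLow, int immHigh) {
        // Mask the low half so that its sign bit does not smear into the high half.
        return (immLow & 0xFFFFFFFFL) | ((long) immHigh << 32);
    }

    // Size in bytes of the instruction that starts with the given opcode byte.
    static int sizeInBytes(int opcode) {
        return opcode == OPCODE_LDDW ? 16 : 8;
    }
}
```

The mask on the low half is the part that is easy to get wrong: without it, a low half with the top bit set would turn the whole result negative.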

The structure of the OPCODE field depends on the instruction class (ALU/JMP or Load/Store).

For example, instructions of the ALU class:

ALU instructions encoding

and of the JMP class have their own field structure:

Branch instructions encoding

For Load/Store instructions, the structure is different:

Load/Store instructions encoding
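As a rough illustration of these layouts, the opcode byte can be split with a few shifts and masks. The class below is a hypothetical sketch, not code from the extension; its bit ranges follow the token definition in eBPF.sinc shown later (class in bits 0..2, source in bit 3 and operation in bits 4..7 for ALU/JMP, size in bits 3..4 and mode in bits 5..7 for Load/Store).

```java
// Hypothetical sketch (not part of the extension): splits the 8-bit
// opcode field according to the instruction class.
final class EbpfOpcode {
    // Instruction class: the low 3 bits of the opcode byte.
    static int insnClass(int opcode) {
        return opcode & 0x07;
    }

    // ALU/JMP: source bit (bit 3), 0 = immediate operand, 1 = register operand.
    static int aluJmpSource(int opcode) {
        return (opcode >> 3) & 0x01;
    }

    // ALU/JMP: operation code (bits 4..7).
    static int aluJmpOp(int opcode) {
        return (opcode >> 4) & 0x0F;
    }

    // Load/Store: access size (bits 3..4).
    static int ldStSize(int opcode) {
        return (opcode >> 3) & 0x03;
    }

    // Load/Store: addressing mode (bits 5..7).
    static int ldStMode(int opcode) {
        return (opcode >> 5) & 0x07;
    }
}
```

With this split, the opcode byte 0x0F gives class 0x7, source 1 and operation 0x0, which is the pattern matched by the register form of ADD in the eBPF.sinc snippet.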

Unofficial eBPF documentation helped to figure this out.

  • There is no information about helper calls, on which most of the logic of eBPF programs for the Linux kernel is built.

This is very strange, because helpers are the most important thing in eBPF programs: they perform the tasks the technology was designed for.


eBPF interaction with kernel functions

The program calls functions from the kernel, and they work with processes, manipulate network packets, work with eBPF maps, access sockets, and interact with userspace. There are about 110 such helpers, and in the developed specification (the extension) they are all modeled as black boxes, i.e. we do not care about the algorithm of these functions, only about their input arguments and whether they return a value. Even though these are kernel functions, the official documentation could have described them in more detail. Full information was found in the Linux source code.
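To give an idea of what treating helpers as black boxes means for a disassembler, here is a small sketch that turns the imm field of a call instruction into a helper name. It is not how the extension does it (the extension uses a separate address space and an analyzer, described later); the class is hypothetical, and the call opcode 0x85 and the three helper IDs are taken from the Linux kernel headers.

```java
// Hypothetical sketch (not part of the extension): names a helper call
// by the helper ID stored in the imm field of the call instruction.
final class HelperNames {
    static final int OPCODE_CALL = 0x85; // BPF_JMP | BPF_CALL in the kernel headers

    // Only the first few IDs from the kernel's helper list are shown here.
    static String nameOf(int helperId) {
        switch (helperId) {
            case 1: return "bpf_map_lookup_elem";
            case 2: return "bpf_map_update_elem";
            case 3: return "bpf_map_delete_elem";
            default: return "helper_" + helperId; // placeholder name used by this sketch only
        }
    }

    // Returns a printable form of a helper call, or null for any other instruction.
    static String describeCall(int opcode, int imm) {
        return opcode == OPCODE_CALL ? "CALL " + nameOf(imm) : null;
    }
}
```

Nothing about the body of the helper is needed here: the ID identifies it, and the calling convention determines where its arguments and return value live.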

  • Not a word about “tail calls”.


eBPF tail calls. Image source: cilium.readthedocs.io/en/latest/bpf/#tail-calls.

Tail calls are a mechanism that allows one eBPF program to call another without returning to the previous one, i.e. to jump between different eBPF programs. They are not implemented in the developed extension; more information can be found in the Cilium documentation.

Bad documentation and a number of architectural features of eBPF were the main thorns in development, because they gave rise to other problems. Fortunately, most of them were successfully resolved.

About the development environment

Not all developers know that there is quite a handy tool for creating and editing Sleigh code and all the files of Ghidra extensions and plugins: the Eclipse IDE with the GhidraDev and GhidraSleighEditor plugins. A newly created extension is immediately laid out as a working project skeleton, and there is handy syntax highlighting for Sleigh code and a checker for basic syntax errors.

From Eclipse you can run Ghidra (with the extension already enabled) and debug it, which is very convenient. But perhaps the coolest feature is support for the “Ghidra Headless” mode: there is no need to restart the Ghidra GUI over and over to find an error in the code, as all the processing is done in the background.

Notepad can be closed! Download Eclipse from the official website. To install a plugin in Eclipse, select Help → Install New Software…, click Add, and select the zip archive of the plugin.

Extension development

The extension consists of the processor specification files; a loader, which inherits from the basic ELF loader and extends it to recognize eBPF programs; a relocation handler, which implements eBPF maps in the Ghidra disassembler and decompiler; and an analyzer, which determines the signatures of eBPF helpers.


The extension files in the project view in Eclipse IDE

Now about the main files:

.cspec specifies which data types are used and how much memory is allocated for them in eBPF, sets the stack size, attaches the “stackpointer” label to register R10, and describes the calling convention. The convention (like everything else) was implemented according to the documentation:

> Therefore, eBPF calling convention is defined as:
> * R0 — return value from in-kernel function, and exit value for eBPF program
> * R1 — R5 — arguments from eBPF program to in-kernel function
> * R6 — R9 — callee saved registers that in-kernel function will preserve
> * R10 — read-only frame pointer to access stack

eBPF.cspec

<?xml version="1.0" encoding="UTF-8"?>
<compiler_spec>
  <data_organization>
    <absolute_max_alignment value="0" />
    <machine_alignment value="2" />
    <default_alignment value="1" />
    <default_pointer_alignment value="4" />
    <pointer_size value="4" />
    <wchar_size value="4" />
    <short_size value="2" />
    <integer_size value="4" />
    <long_size value="4" />
    <long_long_size value="8" />
    <float_size value="4" />
    <double_size value="8" />
    <long_double_size value="8" />
    <size_alignment_map>
      <!-- entries omitted -->
    </size_alignment_map>
  </data_organization>
  <!-- other top-level elements omitted -->
  <stackpointer register="R10" space="ram"/>
  <default_proto>
    <!-- contents omitted -->
  </default_proto>
</compiler_spec>

Before continuing with the description of the files, let us dwell on one line of the .cspec file.

<stackpointer register="R10" space="ram"/>

It is the main source of evil in decompiling eBPF in Ghidra. With it began a fascinating journey into the eBPF stack, which has a number of unpleasant aspects and caused the most pain during development.

All we need is… Stack

Let us turn to the official kernel documentation:

> Q: Can BPF programs access instruction pointer or return address?
>
> A: NO.
>
> Q: Can BPF programs access the stack pointer?
>
> A: NO. Only the frame pointer (register R10) is accessible. From compiler point of view it’s necessary to have a stack pointer. For example, LLVM defines register R11 as the stack pointer in its BPF backend, but it makes sure that generated code never uses it.

The processor exposes neither an instruction pointer (IP) nor a stack pointer (SP), and the latter is extremely important for Ghidra: the quality of decompilation depends on it. In the cspec file you must specify which register is the stack pointer (as shown above). R10 is the only eBPF register that gives access to the program stack; it is a frame pointer, it is static, and its value is always zero. Attaching the “stackpointer” label to R10 in the cspec file is wrong, but there is no other option, because otherwise Ghidra will not work with the program stack. So a genuine SP is missing, and nothing in the eBPF architecture replaces it.

This leads to several problems:

1. The “Stack Depth” field in Ghidra is guaranteed to be zero, since under this architecture we have to designate R10 as the stack pointer, and it is, as noted above, always zero. “Stack Depth” reflects the register labeled “stackpointer”. We have to accept this; such are the features of the architecture.

2. Instructions that operate on R10 (that is, those that work with the stack) often do not get decompiled. Ghidra usually does not decompile what it considers dead code (i.e. code fragments that will never be executed). And since R10 is immutable, many store/load instructions are recognized by Ghidra as dead code and disappear from the decompiler output.

Fortunately, this problem was solved by writing a custom analyzer and by declaring an additional address space with eBPF helpers in the `pspec` file, as suggested by one of the Ghidra developers in an issue of the project.
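The addressing model behind these problems can be illustrated with a small check, given here as a sketch only. It assumes the kernel convention that the stack grows downward from the frame pointer, so a valid access is R10 + off with a negative off that stays inside the 512-byte stack. That convention comes from the kernel verifier rather than from the text above, and the class and method names are made up for illustration.

```java
// Hypothetical sketch (not part of the extension): checks whether a
// load/store relative to the frame pointer R10 stays inside the stack.
final class StackAccess {
    static final int STACK_SIZE = 512; // eBPF stack size in bytes

    // off is the signed 16-bit offset of the instruction, size is the access width in bytes.
    static boolean isValid(int off, int size) {
        // The access covers [R10 + off, R10 + off + size) and must lie in [R10 - 512, R10).
        return size > 0 && off >= -STACK_SIZE && off + size <= 0;
    }
}
```

Under these assumptions `isValid(-8, 8)` holds, while `isValid(0, 4)` does not, because R10 itself points just past the top of the stack. Every stack slot is therefore addressed through a register that never changes, which is what confuses the dead code analysis.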

Extension development (continued)

.ldefs describes the features of the processor and defines the specification files.

eBPF.ldefs

<?xml version="1.0" encoding="UTF-8"?>
<language_definitions>
  <language processor="eBPF"
            endian="little"
            size="64"
            variant="default"
            version="1.0"
            slafile="eBPF.sla"
            processorspec="eBPF.pspec"
            id="eBPF:LE:64:default">
    <description>eBPF processor 64-bit little-endian</description>
    <!-- compiler element omitted -->
    <external_name tool="DWARF.register.mapping.file" name="eBPF.dwarf"/>
  </language>
</language_definitions>

The .opinion file establishes the correspondence between the loader and the processor.

eBPF.opinion

The .pspec file declares the program counter, but in eBPF it is implicit and is not used in the specification, so it is there only for form’s sake. By the way, the eBPF PC is arithmetic rather than address-based (it points to an instruction, not to a specific byte of the program); keep this in mind when implementing jumps.
The file also specifies an additional address space for eBPF helpers; here they are declared as symbols.
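Because the PC counts instructions rather than bytes, a jump offset has to be scaled before it can be used as an address. A minimal sketch, assuming the usual eBPF rule that off is relative to the instruction following the jump and is measured in 8-byte slots (the class name is hypothetical and the code is not taken from the extension):

```java
// Hypothetical sketch (not part of the extension): converts the off field
// of a jump into the byte address of the jump target.
final class JumpTarget {
    static final int SLOT_SIZE = 8; // one eBPF instruction slot in bytes

    // insnAddress is the byte address of the jump instruction itself.
    static long targetAddress(long insnAddress, short off) {
        // Step over the jump itself, then move off slots forward or backward.
        return insnAddress + SLOT_SIZE + (long) off * SLOT_SIZE;
    }
}
```

A jump at byte address 0x10 with off = 2 therefore lands at 0x28, not at 0x13. Note that an LDDW occupies two slots, so it counts twice when offsets are computed.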

eBPF.pspec

<?xml version="1.0" encoding="UTF-8"?>
<processor_spec>
  <!-- program counter declaration omitted -->
  <default_symbols>
    <!-- helper symbol declarations omitted -->
  </default_symbols>
  <default_memory_blocks>
    <memory_block name="eBPFHelper_functions" start_address="syscall:0" length="0x200" initialized="true"/>
  </default_memory_blocks>
</processor_spec>

The .sinc file is the largest file of the extension: it defines all the registers, the structure of eBPF instructions, the tokens, the black-box helpers, the mnemonics, and the semantics of the instructions in Sleigh.

A small snippet of eBPF.sinc

define space ram type=ram_space size=8 default;
define space register type=register_space size=4;
define space syscall type=ram_space size=2;

define register offset=0 size=8 [ R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 PC ];

define token instr(64)
  imm=(32, 63) signed
  off=(16, 31) signed
  src=(12, 15)
  dst=(8, 11)
  op_alu_jmp_opcode=(4, 7)
  op_alu_jmp_source=(3, 3)
  op_ld_st_mode=(5, 7)
  op_ld_st_size=(3, 4)
  op_insn_class=(0, 2)
;
# We'll need this token to operate with the LDDW instruction, which has a 64-bit imm value
define token immtoken(64)
  imm2=(32, 63)
;
# To operate with registers
attach variables [ src dst ] [ R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 _ _ _ _ _ ];

:ADD dst, src is src & dst & op_alu_jmp_opcode=0x0 & op_alu_jmp_source=1 & op_insn_class=0x7 { dst=dst + src; }
:ADD dst, imm is imm & dst & op_alu_jmp_opcode=0x0 & op_alu_jmp_source=0 & op_insn_class=0x7 { dst=dst + imm; }

The eBPF loader extends the basic capabilities of the ELF loader so that it can recognize that the program loaded into Ghidra targets the eBPF processor. Ghidra’s ElfConstants defines a constant for BPF (EM_BPF), which the loader uses to identify the eBPF processor.

eBPF_ElfExtension.java

package ghidra.app.util.bin.format.elf.extend;

import ghidra.app.util.bin.format.elf.*;
import ghidra.program.model.lang.*;
import ghidra.util.exception.*;
import ghidra.util.task.TaskMonitor;

public class eBPF_ElfExtension extends ElfExtension {

    // Accept only 64-bit ELF files whose machine type is EM_BPF.
    @Override
    public boolean canHandle(ElfHeader elf) {
        return elf.e_machine() == ElfConstants.EM_BPF && elf.is64Bit();
    }

    // Additionally require that the selected language is the 64-bit eBPF processor.
    @Override
    public boolean canHandle(ElfLoadHelper elfLoadHelper) {
        Language language = elfLoadHelper.getProgram().getLanguage();
        return canHandle(elfLoadHelper.getElfHeader()) &&
            "eBPF".equals(language.getProcessor().toString()) &&
            language.getLanguageDescription().getSize() == 64;
    }

    @Override
    public String getDataTypeSuffix() {
        return "eBPF";
    }

    @Override
    public void processGotPlt(ElfLoadHelper elfLoadHelper, TaskMonitor monitor) throws CancelledException {
        if (!canHandle(elfLoadHelper)) {
            return;
        }
        super.processGotPlt(elfLoadHelper, monitor);
    }

}

The relocation handler is needed to implement eBPF maps in the disassembler and decompiler. Programs interact with maps through a number of helper functions that use a file descriptor to refer to a map. The relocation table shows that the loader patches the LDDW instruction that produces Rn for these helpers (e.g. `bpf_map_lookup_elem(…)`).

Therefore the handler parses the relocation table of the program, finds the addresses of the relocations (instructions), and collects the name of the map as a string. Then, using the symbol table, it computes the actual addresses of these maps and patches the instructions.
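The patching step itself can be tried out on a plain byte array, without Ghidra. The sketch below mirrors the three memory operations of the handler listed next (read the register byte, write the map address at offset 4, add 0x10 to the register byte); the class name is hypothetical, and little-endian order and a map address that fits into 32 bits are assumptions of this example.

```java
// Hypothetical sketch (not part of the extension): applies the same patch
// as the relocation handler to an LDDW instruction stored in a byte array.
final class MapRelocationSketch {
    // pos is the offset of the first slot of the LDDW inside code.
    static void patch(byte[] code, int pos, long mapAddress) {
        byte regs = code[pos + 1];
        // Write the address as 8 little-endian bytes starting at the imm field.
        // Like the handler, this touches bytes 4..11, so the upper half of the
        // address is assumed to be zero.
        for (int i = 0; i < 8; i++) {
            code[pos + 4 + i] = (byte) (mapAddress >>> (8 * i));
        }
        // Adding 0x10 sets the source register nibble (high nibble) to 1.
        code[pos + 1] = (byte) (regs + 0x10);
    }
}
```

For a 16-byte LDDW that starts with `18 01 00 00 00 00 00 00`, patching with the address 0x1000 changes the second byte to 0x11 and stores `00 10 00 00` in the imm field.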

eBPF_ElfRelocationHandler.java

package ghidra.app.util.bin.format.elf.relocation;

import ghidra.app.util.bin.format.elf.*;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.Program;
import ghidra.program.model.mem.Memory;
import ghidra.program.model.mem.MemoryAccessException;
import ghidra.program.model.symbol.SymbolTable;
import ghidra.util.exception.NotFoundException;

public class eBPF_ElfRelocationHandler extends ElfRelocationHandler {

    @Override
    public boolean canRelocate(ElfHeader elf) {
        return elf.e_machine() == ElfConstants.EM_BPF;
    }

    @Override
    public void relocate(ElfRelocationContext elfRelocationContext, ElfRelocation relocation,
            Address relocationAddress) throws MemoryAccessException, NotFoundException {

        ElfHeader elf = elfRelocationContext.getElfHeader();
        if (elf.e_machine() != ElfConstants.EM_BPF) {
            return;
        }

        Program program = elfRelocationContext.getProgram();
        Memory memory = program.getMemory();

        int type = relocation.getType();
        int symbolIndex = relocation.getSymbolIndex();
        long value;
        boolean appliedSymbol = true;

        // Relocations with maps always have type set to 0x1. Since eBPF has no named
        // constants (types) for relocations, it was decided to use the magic number 1.
        if (type == 1) {
            try {
                // The relocation symbol carries the name of the map.
                ElfSymbol symbol = elfRelocationContext.getSymbol(symbolIndex);
                String map = symbol.getNameAsString();

                SymbolTable table = program.getSymbolTable();
                Address mapAddr = table.getSymbol(map).getAddress();
                String sec_name = elfRelocationContext.relocationTable.getSectionToBeRelocated().getNameAsString();
                // Relocations in debug sections are not patched.
                if (sec_name.contains("debug")) {
                    return;
                }

                // Patch the LDDW: store the map address in imm and set the source register nibble to 1.
                value = mapAddr.getAddressableWordOffset();
                byte dst = memory.getByte(relocationAddress.add(0x1));
                memory.setLong(relocationAddress.add(0x4), value);
                memory.setByte(relocationAddress.add(0x1), (byte) (dst + 0x10));
            }
            catch (NullPointerException e) {
                // E.g. the map symbol was not found: leave the instruction unpatched.
            }
        }

        if (appliedSymbol && symbolIndex == 0) {
            markAsWarning(program, relocationAddress, Long.toString(type),
                "applied relocation with the symbol index of 0", elfRelocationContext.getLog());
        }

    }
}


The result of disassembling and decompiling eBPF

And in the end we get an eBPF disassembler and decompiler! Enjoy!

ToDo and finale

The eBPF extension for Ghidra.

The releases are here.

PS

A huge thank you to Digital Security for the interesting internship, especially to the mentors from the research department (Alexander and Nicholas). I bow low to you!