Analysis tour with Binary Ninja
Martial Arts
If you want to know exactly what operations a program on your computer performs, you have several ways to find out. The simplest method is to read the source code, if it is available. This is easy with scripts written in languages like Python or with open source software, but if programs are compiled into machine code before delivery, your job is a little more complicated.
Various tools are involved in converting code into an executable program: in most cases, at least a compiler and a linker. The compiler translates the statements of a programming language into machine language. Depending on its configuration, it optimizes the execution sequence or individual operations to a greater or lesser extent, which can have a major effect on the resulting machine code. During linking, libraries are linked to the program either statically or dynamically. Static linking adds the code from the libraries to the resulting program. If libraries are linked dynamically, the code is located in external files and is only mapped into the process's memory when the program is started.
Binary Ninja [1] is a tool for static program analysis. Originally designed for use in capture-the-flag competitions, Binary Ninja is now being developed commercially. For initial insights into the functionality, as covered by this article, it's fine to use the free trial version; download and install the version for your operating system to try it out.
Create a Test Program
To get started with a program that is as simple as possible for analysis, it's a good idea to write your own. Save the following code as admin.c:
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello World!");
    return 0;
}
You can compile the source code with:
gcc -O0 admin.c -o admin
The -O0 option switches off the compiler optimizations to keep the machine code closer to the source code, which is especially useful for debugging during development.
Analyzing Applications
To analyze the program you created, launch Binary Ninja and open the admin file. The analysis interface shows an overview of the symbols included in the binary, a code view, and the structure of the program in a feature map, among other options. Binary Ninja offers different levels of the generated program code, which can significantly facilitate the analysis. The default High Level IL output shows generated code that already looks very similar to the original C code. IL stands for "intermediate language"; Binary Ninja offers different language levels up to C code. In this example, the main function will look very similar to the C code.
The call to printf is replaced by the underlying function __printf_chk (Figure 1), which is called at the code level and checks the format string for possible stack overflows before outputting. If you want to get closer to the machine code, click through the different language layers and look at the different derivations.
Note that the displayed code is generated from the machine code. It's more of an approximation of programming in a high-level language like C or C++ than an actual representation of the underlying code. This is also evident in the generic, and mostly not very meaningful, variable names, which are oriented on the CPU registers or memory locations. You cannot compile this code again without further editing.
Now select Disassembly in the top bar, and you will see the program in assembler code. The main function then contains the individual instructions. In addition to operations on the stack pointer rsp, you can see how lea is used to load the address of the Hello World! string into the rsi register (Figure 2); after all, the format string resides in the data area of the binary. Next, the registers edi and eax are initialized with 1 and 0, respectively, which prepares the arguments for __printf_chk, whereas call calls the function. Instead of xor eax, eax, the compiler could also have used mov eax, 0x0. In fact, the xor variant is already a compiler optimization, because its encoding is shorter (two bytes instead of five). Afterward, the return value is stored in eax, the stack pointer is reset, and the function is exited.
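The size difference between the two ways of zeroing eax can be checked by comparing the raw instruction bytes. The following minimal sketch uses the standard x86-64 encodings of both instructions; the helper name size_saving is purely illustrative and not part of any tool discussed here.

```python
# Standard x86-64 encodings of the two ways to set eax to zero.
XOR_EAX_EAX = bytes([0x31, 0xC0])                      # xor eax, eax
MOV_EAX_ZERO = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])   # mov eax, 0x0


def size_saving(short: bytes, long: bytes) -> int:
    """Return how many bytes the shorter encoding saves (illustrative helper)."""
    return len(long) - len(short)


print(f"xor eax, eax : {len(XOR_EAX_EAX)} bytes ({XOR_EAX_EAX.hex(' ')})")
print(f"mov eax, 0x0 : {len(MOV_EAX_ZERO)} bytes ({MOV_EAX_ZERO.hex(' ')})")
print(f"bytes saved  : {size_saving(XOR_EAX_EAX, MOV_EAX_ZERO)}")
```

The mov form has to carry a full 32-bit immediate operand, which is why it is the longer of the two.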
You are currently in executable and linkable format (ELF) mode in the code display. ELF is the Linux binary format that is evaluated directly by Binary Ninja and displayed accordingly. If you open the menu where ELF is currently selected and switch to RAW mode, you can view the ELF headers themselves, which is where you will find, for example, the pointers to dynamic libraries or the string and symbol tables.
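To get a feel for what the RAW view exposes, you can decode the first fields of an ELF header yourself. The following is a minimal sketch using only Python's struct module; the function name parse_elf_header, the lookup tables (deliberately incomplete), and the synthetic sample header are assumptions made for illustration, not something Binary Ninja provides.

```python
import struct

# Names for a few common header values (deliberately not exhaustive).
ELF_CLASSES = {1: "ELF32", 2: "ELF64"}
ELF_BYTE_ORDER = {1: "<", 2: ">"}   # struct prefixes: little / big endian
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode class, type, machine, and entry point from raw header bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = data[4]
    if data[5] not in ELF_BYTE_ORDER:
        raise ValueError("unknown byte order")
    order = ELF_BYTE_ORDER[data[5]]
    # e_type (16 bit), e_machine (16 bit), and e_version (32 bit)
    # follow the 16-byte e_ident block.
    e_type, e_machine, _version = struct.unpack_from(order + "HHI", data, 16)
    # e_entry is 8 bytes wide in ELF64 and 4 bytes wide in ELF32.
    entry_format = "Q" if elf_class == 2 else "I"
    (e_entry,) = struct.unpack_from(order + entry_format, data, 24)
    return {
        "class": ELF_CLASSES.get(elf_class, "unknown"),
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }


# Synthetic header for demonstration; the entry address is made up.
sample = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
          + struct.pack("<HHIQ", 3, 62, 1, 0x1060))
print(parse_elf_header(sample))
```

To try it on the test program, read the first 64 bytes of the admin file in binary mode and pass them to parse_elf_header.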
To display the flow graph instead of the linear representation of the code, select the Graph representation in the menu to the right, which currently still shows the Linear display. In this small example, only the main function is initially displayed. If you select, for example, deregister_tm_clones, which is used to manage transactional memory, from the symbols on the left, the graph becomes a little larger. If you open a real program with Binary Ninja, you can better understand the structure and relationships of the processes with the flow graph.
If you select Hex as the display format instead, the program file will be displayed in a hex editor. Besides the address on the left and the hexadecimal representation in the middle, you can see the printable characters on the right. For example, here you can find the hex values of the string Hello World!. However, changes to the program are not possible in this mode.
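The three-column layout of such a hex view is easy to reproduce. The following sketch formats arbitrary bytes as offset, hexadecimal values, and printable characters; the function name hexdump and the exact column widths are illustrative choices and do not reflect how Binary Ninja renders its view.

```python
def hexdump(data: bytes, width: int = 16) -> list[str]:
    """Format bytes as lines of offset, hex values, and printable characters."""
    lines = []
    hex_column_width = width * 3 - 1  # two digits per byte plus separators
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as a dot, as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(hex_column_width)}  {text_part}")
    return lines


for line in hexdump(b"Hello World!"):
    print(line)
```

Calling hexdump on the contents of the admin file instead of the literal string lets you search the output for the same byte sequence that the hex view displays.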
Besides ELF programs, Binary Ninja supports many other executable file formats. You can analyze the portable executable (PE) binaries common on Windows systems, as well as the Mach-O binaries used by Apple's operating system. You are not limited to x86 or x64 platforms and can disassemble programs compiled for architectures such as ARM, MIPS, or PowerPC. Because Binary Ninja is not a debugger or decompiler, but disassembles binary data in line with the assembler for the respective platform, interpreted languages or those with an intermediate representation of the code, such as with the Java Virtual Machine (VM) or .NET, cannot be meaningfully analyzed.
Extensions and Legal
If you have a valid license, you can use the Binary Ninja Python API. With its help, you can automate operations that you perform regularly or control the program (e.g., change settings or displays, move around the generated code, or launch plugins). If you are missing a function, Binary Ninja lets you add plugins. Of course, you can also develop these yourself. Interfaces to C/C++, Rust, and Python are available for this purpose. The Plugin Manager also lets you install extensions written by other users.
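As a rough idea of what such automation can look like, the following sketch lists the functions that Binary Ninja finds in a binary. It assumes a recent, licensed installation in which binaryninja.load is available and returns a view usable as a context manager; the helper names list_functions and analyze_file, as well as the stand-in view with made-up addresses, are hypothetical and only serve to make the example runnable without Binary Ninja.

```python
from types import SimpleNamespace


def list_functions(view) -> list[tuple[int, str]]:
    """Return (start address, name) pairs for all functions in a view."""
    return sorted((func.start, func.name) for func in view.functions)


def analyze_file(path: str) -> list[tuple[int, str]]:
    """Open a binary headlessly (assumes a licensed Binary Ninja install)."""
    import binaryninja  # imported lazily so list_functions works without it

    with binaryninja.load(path) as view:
        return list_functions(view)


# Stand-in for a real view so the helper can be tried without Binary Ninja.
# The addresses are invented for illustration.
fake_view = SimpleNamespace(functions=[
    SimpleNamespace(start=0x1080, name="main"),
    SimpleNamespace(start=0x1060, name="_start"),
])

for address, name in list_functions(fake_view):
    print(f"{address:#x}  {name}")
```

With Binary Ninja installed, analyze_file("admin") would return the corresponding list for the test program; check the API documentation of your version, because the entry points have changed between releases.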
To analyze malware, experts regularly use tools such as Binary Ninja or Ghidra [2], developed by the US National Security Agency (NSA), to convert binary code into other representations. Copyright law sets narrow limits for this activity. For example, you are only allowed to convert the code if you hold the rights to it yourself or if it is necessary to ensure the interoperability of an existing program. Analyzing malware to protect your own infrastructure is presumably not critical here, but whether parts of the code may be used in the context of public reporting is something that needs to be examined on a case-by-case basis. Of course, it seems very unlikely that the malware developers, who are themselves criminals, will enforce their rights in a court of law in this case.