From debugging to exploiting
Secure Code
A number of modern protections are used to make software a bit more secure. Some of these are kernel based, whereas others are compiler based. In this article, I present a proof of concept capable of bypassing protections and exploiting code.
Many published papers have focused on the exploitation of ELF (executable and linkable format) binaries – a Linux standard file format – which bypasses modern protection techniques. (Table 1 lists a few techniques discussed in this article.) However, in some scenarios in which security has not historically been in the forefront, these protections are never applied, or, if so, the software holds many flaws that can still lead to a successful exploitation.
Table 1
Security Techniques
Acronym | Method |
---|---|
ASLR | Address space layout randomization |
NX/DEP | No-execute bit/data execution prevention |
RELRO | Relocation read-only |
SSP | Stack smashing protector |
PIE | Position-independent executable |
Modern Protections
A lot of effort has been put into mitigating most of the known exploitation techniques that take advantage of memory corruption issues. More often than not, techniques employed to protect modern systems against memory corruption are not even enabled by default, or – worse still – some code cannot run with all or some of these protections enabled. Current Debian GNU/Linux distros come with all of the following:
DEP/NX. The idea is trivial: No segment within an ELF binary without the execution bit should be used to hold code that could eventually be called and therefore executed thanks to storing opcodes within a buffer. This protection is kernel based and comes enabled out of the box.
ASLR. Whenever running a program, all needed dynamic libraries will be loaded at pseudo-random memory addresses to prevent an attacker from jumping to a certain function's address. This protection works along with PIE, which ensures that this randomness applies to the main executable as well.
Before PIE, the ASLR pseudo-random algorithm only considered the dynamic libraries, but not the main program itself. Whereas ASLR is kernel based and enabled by default, PIE is compiler based and must be specified when compiling the program with the -pie
flag. ASLR can be disabled system-wide at any time by running the following command:
echo 0 > /proc/sys/kernel/randomize_va_space
Stack Canaries. This compiler-based protection is in charge of detecting when a buffer overflow has taken place [1]. It stores a cookie (canary) with a particular pseudo-random value inside the stack between the saved stack frame and the local variables for the new function. This way, if a buffer overflows in an attempt to overwrite the return address (remember that it is placed right after the saved frame pointer), the canary will be overwritten as well.
Whenever returning from that function, the canary's value will be tested, and the program is terminated in case of no mismatch. It is necessary to compile the code with the -fstack-protector
or -fstack-protector-all
flags to enable it.
RELRO. This protection is also compiler based and must be enabled by compiling the code with the -Wl,-z,relro,-z,now
flags; otherwise, it will not be enabled. RELRO prevents the .got.plt
table from being overwritten and is discussed further in the next sections. A partial RELRO
protection exists that can still be exploited, as detailed in the next sections. This partial protection can be enabled during compilation using the -Wl,-z,relro
flags.
Ancient Code and Arrays
Scientists tend to code in the Fortran 77, Fortran 90, or C languages and compile and run on grid clusters. More often than not, this code is old and complex, written more than 10 years ago, and still evolving today.This old, inherited code, which was written when security was far from the norm, continues to be developed through the years. The original sources are seldom modified, although they are expanded to do new calculations or to work with new input models.
This add-and-patch method of extending code can result in segmentation faults and other memory violation errors, leading to code susceptible to exploitation and allowing illegal access to a system. This scenario clearly represents a hostile environment in which exploits can appear. The nature of the numerical code that scientists normally write and execute can lead to memory corruption issues specific to arrays that, under some obscure circumstances, can be exploited successfully. More often than not, these bugs are undetected because the program ends execution without crashing or showing odd behavior. The end results or calculations are normal, so the code appears to be free of bugs.
A perfect example of this scenario is seen in a code snippet in which a vacf[]
array of 144 items of type double
is located in the BSS (uninitialized data) segment:
int i,j,k; for(i=0;i<tau_max;i++) vacf[j-i]=0;
What follows has been executed on different Linux boxes, all of them running an out-of-the-box Debian GNU/Linux Wheezy 7.0.8 with GNU gcc 4.7.2 on 64-bit platforms. In the for
loop, the variable j
is not initialized. If you compile this code using the -m32
compiler flag to generate an ELF-32 binary, the program could end execution with no apparent errors.
However, if you generate an ELF-64 binary (i.e., compile the code without the -m32
flag), you get a segmentation fault. Why? First, you should consider what value the variable j
holds when running the ELF-64 binary right before the program crashes, at the very first iteration:
Breakpoint 1, Calc_vacf () at ./MD.o.c:671 671 vacf[j-i]=0; (gdb) p j $1 = 1671723848
The value of j
translates into a huge address that it is undoubtedly out of the process address space. The following line computes this address using the GNU Project debugger (gdb):
(gdb) printf "0x%x\n",(j-i)+&vacf 0xd49039e0
The vacf[]
array holds, according to the sources, 144 items. The last item is located at &vacf[143]
, which is 0x83b460. Its first item happens to be the same as &vacf
, which is 0x83afe0. However, a j
value of 1,671,723,848 is already far beyond the last item in the vacf
array, so the vacf[j-i] = 0
statement is trying to write at address 0xd49039e0. Because this address is far beyond the process address space, a segfault arises, and the program is terminated by the kernel.
When running the ELF-32 binary, however, the value that j
holds is much smaller:
Breakpoint 1, Calc_vacf () at ./MD.o.c:671 671 vacf[j-i]=0; (gdb) p j $1 = 144
Clearly, the program can end its execution without crashing. Why, though, does local variable j
hold the value 144
? Well, the address for variable j
on the new stack frame for Calc_vacf
happens to be the same as the previous address for local variable j
in the Calc_gr
stack frame (where Calc_gr
is the function called before Calc_vacf
), and the value of j
right before returning from Calc_gr
is 144! Because j
is not initialized within Calc_vacf
, it still holds the previous value of 144 (Listing 1).
Listing 1
Sharing the Same Address for a Variable
(gdb) b MD.o.c:649 <-- Calc_gr (gdb) b MD.o.c:671 <-- Calc_vacf (gdb) r Breakpoint 1, Calc_gr (particle=0x827dea0) at MD.o.c:649 (gdb) c 649 fclose(file_gr); (gdb) p &j $1 = (int *) 0xffffc168 (gdb) p j $3 = 144 (gdb) c Breakpoint 2, Calc_vacf () at MD.o.c:671 671 vacf[j-i]=0; (gdb) p &j $6 = (int *) 0xffffc168 (gdb) p j $7 = 144
On the other hand, when dealing with the ELF-64 binary, the new address for local variable j
does not happen to be the same, and whatever happens to be stored in that address will be held by j
. You have already seen that this value is garbage and translates to a very large integer, but surely this behavior can depend on a lot of factors external to the program, such as the compiler version, processor, and so on [2].
In this particular case, if you are testing code compiled as an ELF-32 binary, orderly execution could lead you to misjudge the soundness of the code. Yet, the same code compiled as an ELF-64 binary triggers a segmentation fault. Even worse, sometimes a program finishing in an orderly manner and the correctness of its calculations do not correlate – bad logic design and implementation apart.
Molecular Dynamics Code Analysis
So far, I have introduced some basic concepts concerning built-in protections and presented a common scenario that suits exploitation fairly well. This proof of concept (PoC) will be capable of bypassing the DEP/NX and SSP protections and will still work when partial RELRO is active.
To begin, consider molecular dynamics code (i.e., a computer program that simulates the movement of atoms and molecules) executed via the qsub
command to a certain grid cluster. As soon as this program starts running, data is read from disk and written to disk.
This program is written purely in C and has some important flaws. Whenever the program works as expected, its calculations are saved in an output file called Final.dat
, and a string message is printed to stdout that contains Bye
. The binary is named MD.e32
, and it is an ELF-32 binary.
To see the protections enabled on this particular binary, I use the readelf
command or checksec.sh
script [3] (Figure 1).
Apparently, only NX bit protection is enabled. Remember that this is a kernel, not a compiler, protection mechanism; therefore, other exploitation techniques could still be used against this program. Although you don't need to know about the math or physics involved in this simulation, you do need to know the major problems concerning memory corruption issues that can arise in a C-like language. Bearing this in mind, after analyzing the source code for the program, a couple of interesting things stand out:
1. The program works with particles, implemented in the code as a singly linked list [4] of items of type PARTICLE
(Listing 2).
Listing 2
Particle Singly Linked List
01 typedef struct particle PARTICLE; 02 struct particle 03 { 04 unsigned int idparticle; 05 double pos[D]; 06 double vel[D]; 07 double force0[D]; 08 double force[D]; 09 double mass; 10 char bf[BFSIZE]; 11 void *nxtParticle; 12 }; 13 PARTICLE particle[(int) N]; 14 PARTICLE *particles; 15 ... 16 for(i=0;i<N;i++){ 17 particle[i].nxtParticle=(i==N-1)?NULL:&particle[i+1]; 18 particle[i].idparticle=i; 19 }
2. This program either loads some particles from or stores some particles to disk.
3. The function propagateParticle
copies a particle into the next particle in the singly linked list (Listing 3).
Listing 3
propagateParticle
01 void propagateParticle(PARTICLE src){ 02 printf("Propagating: %s to %p\n",src.bf,src.nxtParticle); 03 if(src.nxtParticle!=NULL) 04 memcpy(src.nxtParticle,&src,sizeof(PARTICLE)-4); 05 }
Now, it is crystal clear that the nxtParticle
field is a pointer to the next item in the singly linked list (see Listing 2). The total size for a given particle in memory is 144 bytes.
The propagateParticle
function copies only 140 out of these 144 bytes to the next item, so as not to modify the nxtParticle
pointer that would certainly crash the program. However, this implementation is far from secure. Suppose, for example, that you could alter the nxtParticle
field before
calling propagateParticle
. In this case, you would be able to write 140 bytes of arbitrary data at the address pointed to by nxtParticle
.
The particles that are going to be used in the program can be either initialized inside the code by executing some pseudo-random algorithm or loaded from disk. The major issue arises when loading a particle from disk and taking for granted that the nxtParticle
field will be NULL. Whenever this field is NULL, propagateParticle
does not copy the desired particle into the next particle. However, whenever this field is not NULL, it does (Listing 3).
The first thing that comes to mind is to craft a special particle containing an address inside the process address space of choice. A total of 140 bytes will be overwritten starting from that 4-byte address. It is important to notice that these bytes are your choice as well. Before going further, I need to look at the .got.plt
binary section and talk briefly about lazy linking
.
Buy this article as PDF
(incl. VAT)