Guide To Reverse Engineering
Learning Objectives
- Understand a computer's memory layout
- Buffer Overflow
- Code Injection
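Memory Layout
All programs are stored in memory. From the lowest address (0x00000000) to the highest (0xFFFFFFFF), a program's memory is laid out as:
- Text: the program's machine code
- Init'd Data: initialized static and global variables, e.g. static int y = 1; (allocated at compile time)
- Uninit'd Data: uninitialized static and global variables, e.g. static int x; (allocated at compile time)
- Heap: dynamically allocated memory, e.g. malloc(sizeof(int))
- Stack: local variables, e.g. the x in int f() { int x = 1; }
- Cmd & Env: command-line arguments and environment variables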
A program generated by the compiler is divided into segments:
- Code segment (text segment)
- Data segment
  - Initialized data segment
  - Uninitialized data segment (also called BSS)
- Heap
- Stack
Program Structure
Memory Allocation
Stack and heap grow in opposite directions: the heap grows upward, toward 0xFFFFFFFF, while the stack grows downward, toward 0x00000000. The stack pointer marks the current top of the stack.
- push 1, push 2, push 3: each push stores a value at the top of the stack and moves the stack pointer to a lower address, so 3 ends up at the lowest address.
- return: the stack pointer moves back up past 3, 2 and 1, releasing that space.
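Stack Allocation
void func( char *arg1, int arg2, int arg3)
{
char loc1[2];
int loc2;
...
}
Stack while func runs (higher addresses at the top; the stack grows toward 0x00000000):
Caller's function (its stack frame)
arg3
arg2
arg1
???
???
loc1
loc2
In this simple model, local variables are pushed in the same order that they appear in the code (real compilers may reorder or pad them).
Arguments are pushed in the reverse order of the code, so arg1 ends up closest to the called function's frame.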
Accessing Variable
void func( char *arg1, int arg2, int arg3)
{
...
loc2++;
...
}
Q: Where is loc2 located?
The absolute address of loc2 cannot be known in advance: it depends on where the stack frame of func lands at run time, which can change from call to call.
What is fixed is its position relative to the stack frame of func: loc2 is always 8 bytes below the address held in the frame pointer (the lower ??? slot).
A: -8(%ebp)
Frame pointer
- The frame pointer is used to locate local variables.
- It is stored in the ebp register and marks the base of the current stack frame.
- Local variables are accessed by their offset from ebp, e.g. loc2++ becomes an increment of the word at -8(%ebp).
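Return from function
void main()
{
...
func("Hey",10,-3);
...
}
Stack while func runs (higher addresses at the top): arg3, arg2, arg1, ???, ???, loc1, loc2. Two frame pointers are involved: main's %ebp, which points into main's frame, and func's %ebp, which points into func's frame.
Q: When func finishes, how can we restore main's %ebp?
- The return address will always be at %ebp+4.
- In this example, loc2 is at %ebp-8 (the exact offsets of local variables depend on the compiler).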
Calling Function
void main()
{
...
func("Hey",10,-3);
...
}
Right after main calls func, the stack holds (higher addresses at the top): arg3, arg2, arg1, ???. %ebp still points into main's frame, and %esp, the current stack pointer, points at the top of the stack.
Q: How can we restore ebp later?
- Push %ebp onto the stack, saving main's frame pointer in func's frame (the "ebp" slot).
- Set %ebp to %esp, so %ebp now marks the base of func's stack frame.
Because main's frame pointer is saved on the stack, func can pop it back into %ebp just before returning.
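Return from function
void main()
{
...
func("Hey",10,-3);
...
}
Stack frames while func runs (higher addresses at the top): arg3, arg2, arg1, eip, ebp, loc1, loc2, with func's %ebp pointing at the saved ebp slot.
Q: How can we resume execution in main?
%eip is the instruction pointer: it holds the address of the next instruction to execute.
When main calls func, the address of the instruction right after the call (the return address) is pushed onto the stack as the saved eip. When func returns, this value is popped back into %eip and main continues from there.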
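Stack and function call
Calling function:
- Push the arguments in reverse order
- Push the return address (the address of the next instruction, saved as eip) onto the stack
- Jump to the function (the call instruction performs this step and the previous one)
Called function:
- Push the old frame pointer (ebp) onto the stack
- Set the frame pointer to the current stack pointer: ebp = esp
- Make room for the local variables on the stack (e.g. by subtracting their size from esp)
Returning function:
- Restore the previous frame: set esp = ebp, then pop the saved frame pointer back into ebp
- Jump back to the return address: pop the saved eip (the word that was at ebp + 4) into eip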
#include <stdio.h>
#include <stdlib.h>
#pragma GCC optimize ("O0")
// uninitialized global variables (uninitialized data segment / BSS)
int g1;
int g2;
// initialized global variables (initialized data segment)
int g3 = 5;
int g4 = 7;
// functions to test the stack
void func2(void) {
    int var1;
    int var2;
    printf("On Stack through func2:\t\t %p %p\n", (void *)&var1, (void *)&var2);
}
void func(void) {
    int var1;
    int var2;
    printf("On Stack through func:\t\t %p %p\n", (void *)&var1, (void *)&var2);
    func2();
}
int main(int argc, char *argv[], char *evnp[]) {
    // Command line arguments and environment variables
    printf("Cmd Line and Env Var:\t\t %p %p %p\n", (void *)&argc, (void *)argv, (void *)evnp);
    // Local variables go to the stack, which should grow downward
    int var1;
    int var2;
    printf("On Stack through main:\t\t %p %p\n", (void *)&var1, (void *)&var2);
    func();
    // Dynamic memory goes to the heap, which should grow upward
    void *arr1 = malloc(5);
    void *arr2 = malloc(5);
    printf("Heap Data:\t\t\t\t\t %p\n", arr2);
    printf("Heap Data:\t\t\t\t\t %p\n", arr1);
    free(arr1);
    free(arr2);
    // Uninitialized and initialized global variables
    printf("Global Uninitialized:\t\t %p %p\n", (void *)&g1, (void *)&g2);
    printf("Global Initialized:\t\t %p %p\n", (void *)&g3, (void *)&g4);
    // Code goes to the text section (casting a function pointer to void * is
    // not strictly portable C, but GCC and Clang accept it on common platforms)
    printf("Text Data:\t\t\t\t\t %p %p\n", (void *)main, (void *)func);
    return 0;
}
Viewing Your Memory Layout
Cmd Line and Env Var: argc:0x5fbff7a8 argv:0x5fbff7d0 evnp:0x5fbff7e0
On Stack through main: var1:0x5fbff794 var2:0x5fbff790
On Stack through func: var1:0x5fbff73c var2:0x5fbff738
On Stack through func2: var1:0x5fbff71c var2:0x5fbff718
Heap Data: arr2:0x00103b30
Heap Data: arr1:0x00103b20
Global Uninitialized: g1:0x00001030 g2:0x00001034
Global Initialized: g3:0x00001028 g4:0x0000102c
Text Data: main:0x00000c90 func:0x00000c60
Program ended with exit code: 0
And Here is Mine
General Map
Buffer Overflow
Buffer = contiguous memory associated with a variable (e.g. an array)
Overflow = put more into the buffer than it can hold
Where does the overflowing data go?
Benign outcome
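#include <string.h>
void func(char *arg1)
{
char buffer[4];
strcpy(buffer,arg1);
}
int main()
{
char *mystr = "AuthMe!";
func(mystr);
}
Stack before strcpy (higher addresses at the top):
&arg1
eip
ebp
buffer: 00 00 00 00
Stack after strcpy copies the 8 bytes of "AuthMe!" (7 characters plus the terminating \0) into the 4-byte buffer:
&arg1
eip
ebp: 4d 65 21 00 (M e ! \0)
buffer: A u t h
ebp has changed!
Upon return, ebp = 0x0021654d: the bytes 4d 65 21 00 read as a little-endian word. When main then relies on this corrupted frame pointer, for example to fetch its own return address from %ebp+4 = 0x00216551, it reads an address that is most likely unmapped.
Segmentation fault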
Security-Relevant Outcome
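#include <string.h>
void func(char *arg1)
{
int authenticated = 0;
char buffer[4];
strcpy(buffer,arg1);
if (authenticated){
...
}
}
int main()
{
char *mystr = "AuthMe!";
func(mystr);
}
Stack after strcpy (higher addresses at the top; this layout assumes the compiler places authenticated just above buffer):
&arg1
eip
ebp
authenticated: 4d 65 21 00 (M e ! \0)
buffer: A u t h
This time the overflow lands on authenticated instead of ebp. It becomes 0x0021654d, which is nonzero, so the if (authenticated) branch runs even though no authentication took place. Nothing crashes, so the corruption can go unnoticed.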
Overwriting Return Addr
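#include <string.h>
void function(char *str) {
char buffer[16];
strcpy(buffer,str);
}
void main() {
char large_string[256];
int i;
for( i = 0; i < 255; i++)
large_string[i] = 'A';
large_string[255] = '\0'; /* terminate the string so strcpy knows where to stop */
function(large_string);
}
Stack after strcpy (higher addresses at the top):
&str: 41414141..
eip: 41414141..
ebp: 41414141..
buffer: 41414141..4141
strcpy copies 255 bytes of 'A' (0x41) into the 16-byte buffer, overwriting the saved ebp, the return address and everything above them. On return, eip = 0x41414141, which is not a valid code address.
Segmentation Violation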
Overwriting Return Addr
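#include <stdio.h>
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
/* In this build, buffer1 takes 8 bytes (5 rounded up to whole words) and the
   saved ebp 4 more, so the return address sits 12 bytes above buffer1.
   The offset depends on the compiler and its flags. */
ret = (int *)(buffer1 + 12);
/* Skip 10 bytes, from 0x80004a8 to 0x80004b2, past x = 1; */
(*ret) += 10;
}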
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
Dump of assembler code for function main:
0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
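Mission: the saved return address (RET) is 0x80004a8, the instruction right after the call. To skip x = 1; (the movl at 0x80004ab), function must return to 0x80004b2 instead: 0x80004a8 + 10 bytes = 0x80004b2.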
Shellcode
We now know how to modify the return address and thus the flow of execution. But what program do we want to execute? In most cases we simply want the program to spawn a shell, from which any other command can be run. What if there is no such code in the program we are trying to exploit? How can we place arbitrary instructions into its address space? The answer: place the custom code in the buffer we are overflowing.
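Stack after the overflow (higher addresses at the top):
&arg1: 0xD5
eip: 0xD5
ebp, buffer: SSSSSSSSSSSSSSSSSSSSSSSSS
S stands for the code to spawn a shell. 0xD5 stands for the address where that code starts (the beginning of buffer in this illustration), so when the function returns, eip = 0xD5 and execution jumps into the injected code.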
C code for Open Shell
#include <stdio.h>
#include <unistd.h>
void main() {
char *name[2];
name[0] = "/bin/sh";
name[1] = NULL;
execve(name[0], name, NULL);
}
Dump of assembler code for function main:
0x8000130 <main>: pushl %ebp
0x8000131 <main+1>: movl %esp,%ebp
0x8000133 <main+3>: subl $0x8,%esp
0x8000136 <main+6>: movl $0x80027b8,0xfffffff8(%ebp)
0x800013d <main+13>: movl $0x0,0xfffffffc(%ebp)
0x8000144 <main+20>: pushl $0x0
0x8000146 <main+22>: leal 0xfffffff8(%ebp),%eax
0x8000149 <main+25>: pushl %eax
0x800014a <main+26>: movl 0xfffffff8(%ebp),%eax
0x800014d <main+29>: pushl %eax
0x800014e <main+30>: call 0x80002bc <__execve>
0x8000153 <main+35>: addl $0xc,%esp
0x8000156 <main+38>: movl %ebp,%esp
0x8000158 <main+40>: popl %ebp
0x8000159 <main+41>: ret
End of assembler dump.
Inspect 32bit Assembly
0x8000130 <main>: pushl %ebp
0x8000131 <main+1>: movl %esp,%ebp
0x8000133 <main+3>: subl $0x8,%esp
Push the current value of ebp onto the stack.
Move the current stack pointer value into the frame pointer (ebp).
Subtract 8 from esp to make space for the local variables, which then occupy ebp-8 up to ebp. In this case it's:
char *name[2];
Pointers are a word long (4 bytes on 32-bit x86), so this leaves space for two words (8 bytes).
Inspect 32bit Assembly
0x8000136 <main+6>: movl $0x80027b8,0xfffffff8(%ebp)
We copy the value 0x80027b8 (the address of the string "/bin/sh") into the first pointer of name[]. This is equivalent to:
name[0] = "/bin/sh";
0xfffffff8 = -8 (the 32-bit hex value read as a two's-complement signed int)
movl $0x80027b8,0xfffffff8(%ebp) (AT&T syntax) = mov dword [ebp-8], 0x80027b8 (Intel syntax)
For further inspection, please continue at : http://insecure.org/stf/smashstack.html
Dump the hex
objdump disassembles compiled code, so it is run on the executable built from shellcodeasm.c rather than on the .c source:
Lala:~mac$ objdump -d shellcodeasm
Disassembly of section .text:
0000000000400410 <_start>:
400410: 31 ed xor %ebp,%ebp
400412: 49 89 d1 mov %rdx,%r9
400415: 5e pop %rsi
400416: 48 89 e2 mov %rsp,%rdx
400419: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40041d: 50 push %rax
40041e: 54 push %rsp
40041f: 49 c7 c0 90 05 40 00 mov $0x400590,%r8
400426: 48 c7 c1 20 05 40 00 mov $0x400520,%rcx
40042d: 48 c7 c7 06 05 40 00 mov $0x400506,%rdi
400434: e8 a7 ff ff ff callq 4003e0
Test the shell code
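int main(void)
{
char shellcode[] = " Your shell code here "; /* placeholder for the bytes taken from the hex dump */
(*(void (*)()) shellcode)(); /* treat the buffer as a function and call it; faults if the stack is non-executable (NX) */
return 0;
}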
Compiling a C program
[Source Code] ---> Compiler ---> [Object code] --*
|
[Source Code] ---> Compiler ---> [Object code] --*--> Linker -->[Executable]--->Loader
| |
[Source Code] ---> Compiler ---> [Object code] --* |
| |
[Library file]--* V
[Running Executable in Memory]
Compiling a Program
Figure 1 - General Unix program compilation process
Figure 2 - C program compilation process
Compiler, Assembler, Linker & Loader
- Preprocessing: handles directives (#include, #define, ...) that tell the compiler what to do before the actual compilation.
- Compilation: translates a program written in one language into another target language (here, C into assembly). If there are errors, the compiler detects and reports them.
- Assembling: assembly code is translated into machine code, which is stored in an object file.
- Linking: if the code needs pieces from other object files or libraries, the linker combines them into an executable file. Linking can be static or dynamic.
- Loader: loads the executable into memory, creates the program's stack and data areas, and initializes the registers.
1. Pre-processing & Compiling: C Preprocessor -> Lexical Analyser -> Syntax Analyser -> Semantic Analyser -> Pre Optimization -> Code Generation -> Post Optimization
2. Assembling: Assembler
3. Linking: Linker and Loader
Pre-processing & Compiling
- C Preprocessing: converts C sources into pure C code by handling:
  - define statements
  - include statements
  - macros
  - conditional statements
- Lexical Analyser: takes the preprocessed source code, which is a stream of characters, and breaks it into a series of tokens, discarding whitespace and comments. For example, the C declaration on the first line below produces the tokens listed on the second line.
int value = 100;
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
Pre-processing & Compiling
- Syntax Analyser: takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, the arrangement of tokens is checked against the grammar of the language, i.e. the parser checks whether the expression formed by the tokens is syntactically correct.
Pre-processing & Compiling
- Semantic Analyser: semantic analysis checks whether the parse tree follows the rules of the language. For example, it checks that values are only assigned between compatible data types, and it flags errors such as adding a string to an integer. The semantic analyser also keeps track of identifiers, their types and expressions, and whether identifiers are declared before use. It produces an annotated syntax tree as output.
{
int i;
int *p;
p = i;
-----
-----
-----
}
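The above code triggers an "assignment of incompatible type" diagnostic: p is an int * while i is an int. The exact wording, and whether it is reported as an error or a warning, depends on the compiler.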
Guide to Reverse Engineering
By lala