Guide To Reverse Engineering

Learning Objectives

  • Understand the Computer's Memory Layout
  • Buffer Overflow
  • Code Injection

Memory Layout

All Programs are stored in memory

Address space, from 0x00000000 to 0xFFFFFFFF:

  • Text
  • Init'd Data: static int y = 1; (allocated at compile time)
  • Uninit'd Data: static int x; (allocated at compile time)
  • Heap: malloc (sizeof(int))
  • Stack: int f (){ int x = 1; }
  • Cmd & Env

A program generated by the compiler is divided into segments:

  • Code Segment or Text Segment
  • Data Segment
    • Initialized data segment
    • Uninitialized data segment
  • Heap
  • Stack

Program Structure

Memory Allocation

The stack and the heap grow in opposite directions: the heap toward higher addresses, the stack toward lower addresses
0x00000000
0xFFFFFFFF

Heap

Stack

Stack Pointer

push 1

push 2

push 3

1

2

3

return

1

2

3

Stack Allocation

0x00000000
0xFFFFFFFF

Caller's function

void func( char *arg1, int arg2, int arg3)
{
  char loc1[2];
  int loc2;
  ...
}

arg3

arg2

arg1

loc1

loc2

???

???

Local variables are pushed in the same order as they appear in the code

Arguments are pushed in the reverse of the order in which they appear in the code

Accessing Variable

0x00000000
0xFFFFFFFF

Caller's function

void func( char *arg1, int arg2, int arg3)
{
  ...
  loc2++;
  ...
}

arg3

arg2

arg1

loc1

loc2

???

???

Q: Where is the location of loc2 ?

The absolute address of loc2 cannot be known in advance

Its relative address is known: loc2 is always 8 bytes below the first ??? slot

A: -8(%ebp)

Stack frame of func

  • The frame pointer is used to locate local variables
  • The frame pointer is stored in the %ebp register
  • Local variables are accessed by their offset from %ebp

Return from function

0x00000000
0xFFFFFFFF

Caller's function

void main()
{
  ...
  func("Hey",10,-3)
  ...
}

arg3

arg2

arg1

loc1

loc2

???

???

%ebp
%ebp

Q: How can we restore ebp?

 

The return address will always be at %ebp+4
Local variables sit at fixed negative offsets from %ebp (here loc2 is at %ebp-8)

Calling Function

0x00000000
0xFFFFFFFF

Caller's function

void main()
{
  ...
  func("Hey",10,-3)
  ...
}

arg3

arg2

arg1

???

%ebp

Q: How can we restore ebp?

 

%esp

%esp is the current stack pointer

Push %ebp onto the stack (the saved copy is shown as ebp)

ebp

Set %ebp to %esp

Return from function

0x00000000
0xFFFFFFFF

Caller's function

void main()
{
  ...
  func("Hey",10,-3)
  ...
}

arg3

arg2

arg1

loc1

loc2

ebp

eip

Stack frames
%ebp
%ebp

Q: How can we resume?

 

%eip is the instruction pointer: it holds the address of the next instruction

We push the address of the next instruction onto the stack (shown as eip)

 

Stack and function call

Calling function:

  • Push the arguments in reverse order
  • Push the return address (the address of the instruction after the call, taken from %eip) onto the stack
  • Jump to the function

Called function:

  • Push the old frame pointer (%ebp) onto the stack
  • Set the frame pointer %ebp to the current stack pointer %esp
  • Push local variables onto the stack

 

Returning function:

  • Restore the previous frame: set %esp = %ebp, then pop the saved frame pointer back into %ebp
  • Jump back to the return address, which was stored at 4(%ebp) in the called function's frame
#include <stdio.h>
#include <stdlib.h>
#pragma GCC optimize ("O0")

// Pointers are printed with %p; 0x%08x would truncate them on 64-bit systems

// Uninitialized global variables
int g1;
int g2;

// Initialized global variables
int g3 = 5;
int g4 = 7;

// Nested calls, to observe successive stack frames
void func2() {
    int var1;
    int var2;
    printf("On Stack through func2:\t\t %p %p\n", (void *)&var1, (void *)&var2);
}

void func() {
    int var1;
    int var2;
    printf("On Stack through func:\t\t %p %p\n", (void *)&var1, (void *)&var2);
    func2();
}

int main(int argc, char *argv[], char *envp[]) {

    // Command line arguments and environment variables
    printf("Cmd Line and Env Var:\t\t %p %p %p\n", (void *)&argc, (void *)argv, (void *)envp);

    // Local variables go on the stack, and the stack should grow downward
    int var1;
    int var2;
    printf("On Stack through main:\t\t %p %p\n", (void *)&var1, (void *)&var2);
    func();

    // Dynamic memory should come from the heap, with increasing addresses
    void *arr1 = malloc(5);
    void *arr2 = malloc(5);
    printf("Heap Data:\t\t\t\t\t %p \n", arr2);
    printf("Heap Data:\t\t\t\t\t %p \n", arr1);
    free(arr1);
    free(arr2);

    // Uninitialized and initialized global variables
    printf("Global Uninitialized:\t\t %p %p\n", (void *)&g1, (void *)&g2);
    printf("Global Initialized:\t\t %p %p\n", (void *)&g3, (void *)&g4);

    // Code must go to the text section (printing a function pointer with %p is a common extension)
    printf("Text Data:\t\t\t\t\t %p %p\n", (void *)main, (void *)func);
    return 0;
}

Viewing Your Memory Layout

Cmd Line and Env Var:		 argc:0x5fbff7a8 argv:0x5fbff7d0 evnp:0x5fbff7e0
On Stack through main:		 var1:0x5fbff794 var2:0x5fbff790
On Stack through func:		 var1:0x5fbff73c var2:0x5fbff738
On Stack through func2:		 var1:0x5fbff71c var2:0x5fbff718
Heap Data:			 arr2:0x00103b30 
Heap Data:			 arr1:0x00103b20 
Global Uninitialized:		 g1:0x00001030 g2:0x00001034
Global Initialized:		 g3:0x00001028 g4:0x0000102c
Text Data:			 main:0x00000c90 func:0x00000c60 
Program ended with exit code: 0

And Here is Mine

General Map

Buffer Overflow

Buffer = contiguous memory associated with a variable

 

Overflow = putting more data into the buffer than it can hold

 

Where does the overflowing data go?

Benign outcome

#include <string.h>

void func(char *arg1)
{
   char buffer[4];
   strcpy(buffer,arg1);   /* no bounds check: 8 bytes go into a 4-byte buffer */
}

int main()
{
   char *mystr = "AuthMe!";
   func(mystr);
}

&arg1

eip

ebp

buffer

00 00 00 00

&arg1

eip

4d 65 21 00

buffer

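A u t h
M e ! \0

ebp has changed!

Upon return, ebp = 0x0021654d (the bytes 4d 65 21 00 read as a little-endian word).

Code that later uses the corrupted ebp, for example to fetch a return address at ebp + 4 = 0x00216551, touches invalid memory.

Segmentation fault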

Security-Relevant Outcome

void func(char *arg1)
{
   int authentication = 0;
   char buffer[4];
   strcpy(buffer,arg1);
   if (authentication){
      ...
   }
}

int main()
{
   char *mystr = "AuthMe!";
   func(mystr);
}

&arg1

eip

4d 65 21 00

buffer

A u t h
M e ! \0

authentication (overwritten with 4d 65 21 00, so it is now nonzero and the check passes)

ebp

Overwriting Return Addr

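#include <string.h>

void function(char *str) {
   char buffer[16];

   strcpy(buffer,str);   /* copies 255 'A's plus a NUL into a 16-byte buffer */
}

int main() {
  char large_string[256];
  int i;

  for( i = 0; i < 255; i++)
    large_string[i] = 'A';
  large_string[255] = '\0';   /* terminate the string so strcpy stops here */

  function(large_string);
  return 0;
}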

&str

eip

buffer

41414141..

ebp

41414141..
41414141..4141.
Segmentation Violation

Overwriting Return Addr

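#include <stdio.h>

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
   int *ret;

   /* The return address is assumed to sit 12 bytes past buffer1;
      this offset depends on the compiler and its options. */
   ret = (int *)(buffer1 + 12);
   (*ret) += 10;   /* move from 0x80004a8 to 0x80004b2, past "x = 1" */
}

void main() {
  int x;

  x = 0;
  function(1,2,3);
  x = 1;
  printf("%d\n",x);
}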
Dump of assembler code for function main:
0x8000490 <main>:       pushl  %ebp
0x8000491 <main+1>:     movl   %esp,%ebp
0x8000493 <main+3>:     subl   $0x4,%esp
0x8000496 <main+6>:     movl   $0x0,0xfffffffc(%ebp)
0x800049d <main+13>:    pushl  $0x3
0x800049f <main+15>:    pushl  $0x2
0x80004a1 <main+17>:    pushl  $0x1
0x80004a3 <main+19>:    call   0x8000470 <function>
0x80004a8 <main+24>:    addl   $0xc,%esp
0x80004ab <main+27>:    movl   $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>:    movl   0xfffffffc(%ebp),%eax
0x80004b5 <main+37>:    pushl  %eax
0x80004b6 <main+38>:    pushl  $0x80004f8
0x80004bb <main+43>:    call   0x8000378 <printf>
0x80004c0 <main+48>:    addl   $0x8,%esp
0x80004c3 <main+51>:    movl   %ebp,%esp
0x80004c5 <main+53>:    popl   %ebp
0x80004c6 <main+54>:    ret
0x80004c7 <main+55>:    nop
Mission: move the return address from 0x80004a8 to 0x80004b2 (0x80004a8 + 10 bytes), so that x = 1 is skipped

RET

Shellcode

We know how to modify the return address and so redirect the flow of execution. What code do we want to execute?

In most cases we simply want the program to spawn a shell, which gives us a command line.

But what if there is no such code in the program we are trying to exploit? How can we place arbitrary instructions into its address space?

The answer: place the code we want to run in the buffer we are overflowing, and overwrite the return address so that it points back into the buffer.

Shellcode

&arg1

eip

ebp

buffer

SSSSSSSSSSSSSSSSSSSSSSSSS

0xD5

0xD5

S stands for the code that spawns a shell

C Code to Open a Shell

#include <stdio.h>
#include <unistd.h>   /* declares execve */

void main() {
   char *name[2];

   name[0] = "/bin/sh";
   name[1] = NULL;
   execve(name[0], name, NULL);   /* returns only if it fails */
}
Dump of assembler code for function main:
0x8000130 <main>:       pushl  %ebp
0x8000131 <main+1>:     movl   %esp,%ebp
0x8000133 <main+3>:     subl   $0x8,%esp
0x8000136 <main+6>:     movl   $0x80027b8,0xfffffff8(%ebp)
0x800013d <main+13>:    movl   $0x0,0xfffffffc(%ebp)
0x8000144 <main+20>:    pushl  $0x0
0x8000146 <main+22>:    leal   0xfffffff8(%ebp),%eax
0x8000149 <main+25>:    pushl  %eax
0x800014a <main+26>:    movl   0xfffffff8(%ebp),%eax
0x800014d <main+29>:    pushl  %eax
0x800014e <main+30>:    call   0x80002bc <__execve>
0x8000153 <main+35>:    addl   $0xc,%esp
0x8000156 <main+38>:    movl   %ebp,%esp
0x8000158 <main+40>:    popl   %ebp
0x8000159 <main+41>:    ret
End of assembler dump.

Inspect 32bit Assembly

0x8000130 <main>:       pushl  %ebp
0x8000131 <main+1>:     movl   %esp,%ebp
0x8000133 <main+3>:     subl   $0x8,%esp

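  • pushl %ebp: push the current value of ebp onto the stack.
  • movl %esp,%ebp: copy the current stack pointer into the frame pointer.
  • subl $0x8,%esp: make space for the local variables by subtracting 8 from esp. In this case the local variable is:

	char *name[2];

Pointers are one word (4 bytes) long on this 32-bit machine, so this leaves space for two words (8 bytes).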

Inspect 32bit Assembly

0x8000136 <main+6>:     movl   $0x80027b8,0xfffffff8(%ebp)

We copy the value 0x80027b8 (the address of the string "/bin/sh") into the first pointer of name[]. This is equivalent to:

	name[0] = "/bin/sh";

0xFFFFFFF8 = -8 (hex read as a signed 32-bit integer), so in Intel syntax:

movl $0x80027b8,0xfffffff8(%ebp) = mov dword ptr [ebp-8], 0x80027b8

 

 

For the rest of the walkthrough, continue at: http://insecure.org/stf/smashstack.html

Dump the hex

Disassembly of section .text:
0000000000400410 <_start>:
  400410:    31 ed                    xor    %ebp,%ebp
  400412:    49 89 d1                 mov    %rdx,%r9
  400415:    5e                       pop    %rsi
  400416:    48 89 e2                 mov    %rsp,%rdx
  400419:    48 83 e4 f0              and    $0xfffffffffffffff0,%rsp
  40041d:    50                       push   %rax
  40041e:    54                       push   %rsp
  40041f:    49 c7 c0 90 05 40 00     mov    $0x400590,%r8
  400426:    48 c7 c1 20 05 40 00     mov    $0x400520,%rcx
  40042d:    48 c7 c7 06 05 40 00     mov    $0x400506,%rdi
  400434:    e8 a7 ff ff ff           callq  4003e0

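Lala:~mac$ objdump -d shellcodeasm   # objdump disassembles binaries, not C source; assumes the binary built from shellcodeasm.c is named shellcodeasm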

Test the shell code

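int main(void)
{
    /* Replace the placeholder with the shellcode bytes as a string literal. */
    char shellcode[] = " Your shell code here ";

    /* Treat the array as a function and call it. This only works when the
       memory holding the array is executable. */
    (*(void (*)()) shellcode)();

    return 0;
}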

Compiling a C program

[Source Code] ---> Compiler ---> [Object code] --*
                                                 |
[Source Code] ---> Compiler ---> [Object code] --*--> Linker -->[Executable]--->Loader 
                                                 |                                 |
[Source Code] ---> Compiler ---> [Object code] --*                                 |
                                                 |                                 |
                                 [Library file]--*                                 V
                                                                  [Running Executable in Memory]

Compiling a Program

Figure 1 - General Unix program compilation process
Figure 2 - C program compilation process

Compiler, Assembler, Linker & Loader

  • Preprocessing: the preprocessor handles directives such as #include and #define before actual compilation.
  • Compilation: a program written in one language is translated into another target language. If there are errors, the compiler detects and reports them.
  • Assembling: assembly code is translated into machine code. The result is stored in an object file.
  • Linking: if the object code depends on other object files or libraries, the linker combines them into an executable file. Linking can be static or dynamic.
  • Loading: the loader loads the executable code into memory. The program and data stacks are created, and registers are initialized.

1. Pre-process & Compiling: C Preprocessor ---> Lexical Analyser ---> Syntax Analyser ---> Semantic Analyser ---> Pre Optimization ---> Code Generation ---> Post Optimize

2. Assembling: Assembler

3. Linking: Linker and Loader

Pre-processing & Compiling

  • C Preprocessing: converts the C sources into pure C code by processing:
    • define statements
    • include statements
    • macros
    • conditional statements
  • Lexical Analyser: takes the preprocessed source code and breaks it into a series of tokens, discarding whitespace and comments. For example, the variable declaration on the first line below produces the tokens listed on the second line.
int value = 100;
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).

 

Pre-processing & Compiling

  • Syntax Analyser: takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, the arrangement of tokens is checked against the grammar of the source language, i.e. the parser checks whether the expression formed by the tokens is syntactically correct.

 

Pre-processing & Compiling

  • Semantic Analyser: semantic analysis checks whether the parse tree follows the rules of the language, for example that values are assigned between compatible data types and that a string is not added to an integer. The semantic analyser also keeps track of identifiers, their types and expressions, and checks whether identifiers are declared before use. It produces an annotated syntax tree as output.
{
    int i;
    int *p;

    p = i;
    -----
    -----
    -----
}
The above code generates the error "Assignment of incompatible type".

Guide to Reverse Engineering

By lala
