Aneesh Dogra
Palash Bansal
Reversing is a vital skill when you want to understand what a program is actually doing but don't have access to its source code.
Execute
Assembly is a very low-level language, so it's somewhat different from the languages we all use every day to write software.
If EAX = 0x12345678, what are the values of AX, AH and AL?
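One way to work out the answer is to model the sub-registers as masks and shifts on a 32-bit value. The sketch below assumes the usual x86 layout (AX is the low 16 bits of EAX, AH and AL are the high and low bytes of AX); the helper names are made up for illustration.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH is bits 8..15 of EAX (the high byte of AX). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL is the low 8 bits of EAX (the low byte of AX). */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
```

For EAX = 0x12345678 these give AX = 0x5678, AH = 0x56 and AL = 0x78.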
The FLAGS register is the status register in Intel x86 microprocessors that contains the current state of the processor.
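To make "current state" a little more concrete, here is a rough C model of three of the FLAGS bits (zero, sign and carry) as a comparison instruction such as cmp would leave them. It is only a sketch: the struct and function names are hypothetical, and the real register holds more bits than these three.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags {
    bool zf; /* zero flag: result was 0 */
    bool sf; /* sign flag: top bit of the result is set */
    bool cf; /* carry flag: unsigned borrow occurred */
};

/* Model the ZF, SF and CF bits that `cmp a, b` (which computes a - b
   and discards the result) would leave in FLAGS. */
struct flags flags_after_cmp(uint32_t a, uint32_t b)
{
    uint32_t result = a - b; /* wraps modulo 2^32, like the CPU */
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 31) != 0;
    f.cf = (a < b);
    return f;
}
```

Conditional jumps such as je then just look at these bits: je is taken when ZF is set.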
int function()
{
return 1;
}
function:
mov eax, 1
ret
MACHINE CODE
We took an example of a simple function that returns 1 every time.
The function boils down to 2 instructions of assembly and 6 bytes of machine code.
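The 6 bytes can be written out as a C array. This assumes the common 32-bit encoding (opcode 0xB8 followed by a 4-byte little-endian immediate for mov eax, then 0xC3 for ret); an assembler is free to pick another encoding, and the names below are only illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Encoding of `mov eax, 1` followed by `ret` on 32-bit x86. */
const uint8_t function_code[] = {
    0xB8, 0x01, 0x00, 0x00, 0x00, /* mov eax, imm32 (little-endian 1) */
    0xC3                          /* ret */
};

/* Number of bytes in the encoding above. */
size_t function_code_size(void) { return sizeof function_code; }

/* Reassemble the 32-bit immediate that follows the 0xB8 opcode byte. */
uint32_t mov_eax_immediate(const uint8_t *code)
{
    return (uint32_t)code[1]
         | ((uint32_t)code[2] << 8)
         | ((uint32_t)code[3] << 16)
         | ((uint32_t)code[4] << 24);
}
```

Five of the six bytes belong to the mov; the ret is a single byte.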
function:
mov eax, 1
ret
The first line is a label, which marks the start of a block of code in assembly.
The second line is a mov instruction; it moves 1 into the eax register.
The third line returns from the function, similar to return in C.
#include <stdio.h>
int main()
{
printf("hello, world\n");
return 0;
}
Let's try to disassemble the program using GDB. GDB is the GNU Debugger, which can turn machine code into readable assembly code very easily.
Techniques for attempting to convert the raw machine code of an executable file into equivalent code in assembly language and the high-level languages C and C++
(gdb) disassemble main
Dump of assembler code for function main:
0x000000000040052d <+0>: push rbp
0x000000000040052e <+1>: mov rbp,rsp
0x0000000000400531 <+4>: mov edi,0x4005d4
0x0000000000400536 <+9>: call 0x400410 <puts@plt>
0x000000000040053b <+14>: mov eax,0x0
0x0000000000400540 <+19>: pop rbp
0x0000000000400541 <+20>: ret
End of assembler dump.
Firstly, we push the old base pointer on the stack. Then we replace the base pointer with the current stack pointer.
We move an address (0x4005d4) into edi, the register that carries the first argument, and call puts.
Then we set eax to 0 (the return value), restore rbp and return.
Examining memory at 0x4005d4
The stack is a crucial data structure in assembly: it's used to store local variables, pass arguments to a function, store return addresses and more.
The most frequently used stack access instructions are PUSH and POP.
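What PUSH and POP do to the stack pointer can be sketched with a toy stack in C. This is a simplified model, not how the hardware is implemented: esp is an array index rather than a byte address, there are no bounds checks, and all names are hypothetical.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* A toy 32-bit stack: `esp` is an index into `mem`, and, as on x86,
   the stack grows toward lower addresses. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    uint32_t esp; /* index of the current top-of-stack word */
};

/* An empty stack has esp just past the last slot. */
void toy_init(struct toy_stack *s) { s->esp = STACK_WORDS; }

/* PUSH: move esp down one slot, then store the value there. */
void toy_push(struct toy_stack *s, uint32_t value)
{
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* POP: load the value at esp, then move esp back up one slot. */
uint32_t toy_pop(struct toy_stack *s)
{
    uint32_t value = s->mem[s->esp];
    s->esp += 1;
    return value;
}
```

Values come back in the reverse of the order they were pushed, which is why a prologue's push ebp is matched by a pop ebp at the very end.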
The function definition usually begins with:
push ebp
mov ebp, esp
sub esp, X
The function definition usually ends with:
mov esp, ebp
pop ebp
ret 0
Arguments to functions are passed via the stack.
Caller:
push arg3
push arg2
push arg1
call function
add esp, 12 ; 4*3=12
Callee:
Address | Argument |
---|---|
ESP | return address |
ESP+4 | arg1 |
ESP+8 | arg2 |
ESP+0xC | arg3 |
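The offsets in the table follow a simple rule, sketched below in C. It assumes every argument is a 4-byte value and that the caller cleans up the stack, as in the caller snippet above; the function names are invented for this example.

```c
#include <stdint.h>

/* Byte offset from ESP, at callee entry, of the n-th argument
   (1-based) when every argument is a 4-byte value. Offset 0 holds
   the return address pushed by `call`. */
uint32_t arg_offset_from_esp(uint32_t n) { return 4u * n; }

/* Bytes the caller removes with `add esp, N` after the call returns. */
uint32_t caller_cleanup_bytes(uint32_t arg_count) { return 4u * arg_count; }
```

With three arguments this reproduces the table (arg3 at ESP+0xC) and the add esp, 12 in the caller.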
#include <stdio.h>
int main()
{
printf("a=%d; b=%d; c=%d", 1, 2, 3);
return 0;
}
main proc near
var_10 = dword ptr -10h
var_C = dword ptr -0Ch
var_8 = dword ptr -8
var_4 = dword ptr -4
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 10h
mov eax, offset aADBDCD ; "a=%d; b=%d; c=%d"
mov [esp+10h+var_4], 3
mov [esp+10h+var_8], 2
mov [esp+10h+var_C], 1
mov [esp+10h+var_10], eax
call _printf
mov eax, 0
leave
retn
main endp
Let's compile the C program with gcc and see what we get in IDA.
What does esp & 0xFFFFFFF0 do? It clears the low four bits of esp, rounding the stack pointer down to a 16-byte boundary.
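The effect of the mask is easy to try in C. The sketch below mirrors and esp, 0FFFFFFF0h on an ordinary 32-bit integer; the function name and the sample addresses are made up for illustration.

```c
#include <stdint.h>

/* Clear the low four bits, rounding the value down to a multiple of 16.
   This mirrors `and esp, 0FFFFFFF0h`. */
uint32_t align_down_16(uint32_t esp) { return esp & 0xFFFFFFF0u; }
```

For example, 0xBFFFF6AC becomes 0xBFFFF6A0, while a value that is already a multiple of 16 is left unchanged.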
"x86 processors are designed to load code and data more quickly from even doubleword addresses."
#include <stdio.h>
int main()
{
int x;
printf ("Enter X:\n");
scanf ("%d", &x);
printf ("You entered %d...\n", x);
return 0;
}
main proc near
var_20 = dword ptr -20h
var_1C = dword ptr -1Ch
var_4 = dword ptr -4
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 20h
mov [esp+20h+var_20], offset aEnterX ; "Enter X:"
call _puts
mov eax, offset aD ; "%d"
lea edx, [esp+20h+var_4]
mov [esp+20h+var_1C], edx
mov [esp+20h+var_20], eax
call ___isoc99_scanf
mov edx, [esp+20h+var_4]
mov eax, offset aYouEnteredD___ ; "You entered %d...\n"
mov [esp+20h+var_1C], edx
mov [esp+20h+var_20], eax
call _printf
mov eax, 0
leave
retn
main endp
Let's compile the C program with gcc and see what we get in IDA.
LEA == &
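The same distinction can be seen from the C side. In this sketch (function names are hypothetical), taking &x only produces an address, which is the job lea does in the listing, while reading x afterwards is a real memory load, which is what the later mov does.

```c
/* Writes through a pointer, as scanf("%d", &x) does with its argument. */
void store_through(int *dest, int value) { *dest = value; }

int lea_demo(void)
{
    int x = 0;
    /* `&x` is only the address of x; the compiler can form it with
       something like `lea edx, [esp+...]` without reading x. */
    store_through(&x, 42);
    /* Reading x afterwards is a load, like `mov edx, [esp+...]`. */
    return x;
}
```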
We'll try to reverse engineer a very easy crackme. Crackmes are programs written by people as challenges to help others learn reverse engineering.
They're usually authentication systems we need to break to get the success message by reverse engineering the code.
To start reverse engineering the file, we must know what type it is. The easiest way to find out is the file command on Ubuntu.
>> file crackme
>> crackmecpp: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.26, BuildID[sha1]=901287c7af167a087acdd19e0bc0087c2a993481, not stripped
Let's open the file in GDB and get the disassembly.
>> gdb crackme
(gdb) > set disassembly-flavor intel
(gdb) > disassemble main
main is usually the starting function of a program; if disassembling main doesn't work, we can look through the symbol table for other functions.
0x0804876c <+0>: push ebp
0x0804876d <+1>: mov ebp,esp
0x0804876f <+3>: and esp,0xfffffff0; alignment
0x08048772 <+6>: sub esp,0x20; reserve 0x20 bytes for storage (local variables)
; function call 1
0x08048775 <+9>: mov DWORD PTR [esp+0x4],0x80488c0
0x0804877d <+17>: mov DWORD PTR [esp],0x8049be0
0x08048784 <+24>: call 0x8048640 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
; function call 2
0x08048789 <+29>: lea eax,[esp+0x1c]
0x0804878d <+33>: mov DWORD PTR [esp+0x4],eax
0x08048791 <+37>: mov DWORD PTR [esp],0x8049b40
0x08048798 <+44>: call 0x8048650 <_ZNSirsERi@plt>
0x0804879d <+49>: mov eax,DWORD PTR [esp+0x1c]
0x080487a1 <+53>: cmp eax,0x4d2
0x080487a6 <+58>: je 0x80487af <main()+67>
0x080487a8 <+60>: mov eax,0x63
0x080487ad <+65>: jmp 0x80487c8 <main()+92>
; function call 3
0x080487af <+67>: mov DWORD PTR [esp+0x4],0x80488cb
0x080487b7 <+75>: mov DWORD PTR [esp],0x8049be0
0x080487be <+82>: call 0x8048640 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
0x080487c3 <+87>: mov eax,0x1
0x08048775 <+9>: mov DWORD PTR [esp+0x4],0x80488c0
0x0804877d <+17>: mov DWORD PTR [esp],0x8049be0
0x08048784 <+24>: call 0x8048640 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
We can start by looking at the arguments passed to the function to understand what it's doing.
(gdb) > x/1s 0x80488c0
0x80488c0: "Passcode: "
Looks like we're printing a prompt with cout here. Moving on!
0x08048789 <+29>: lea eax,[esp+0x1c]
0x0804878d <+33>: mov DWORD PTR [esp+0x4],eax
0x08048791 <+37>: mov DWORD PTR [esp],0x8049b40
0x08048798 <+44>: call 0x8048650 <_ZNSirsERi@plt>
0x0804879d <+49>: mov eax,DWORD PTR [esp+0x1c]
0x080487a1 <+53>: cmp eax,0x4d2
Here we're passing two arguments to the function: the address 0x8049b40 and a pointer to a local variable at [esp+0x1c]. After the function returns, we compare the value stored in that local variable with 0x4d2, which is 1234 in decimal.
Looks like the function fills in the value behind the pointer, which suggests it's the cin input function.
We just found gold! The program compares the value we give it against 1234 and makes a decision based on the result. Let's try entering 1234 in the passcode field.
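Putting the pieces together, the check can be written back as a few lines of C. This is a hypothetical reconstruction, not the crackme's actual source (which is C++): the function name is invented, and the return values are simply what the listing appears to leave in eax on each path.

```c
/* Hypothetical reconstruction of the check seen in main.
   Returns the value main appears to leave in eax: 1 when the input
   matches 0x4d2, 0x63 (99) otherwise. */
int check_passcode(int input)
{
    if (input == 0x4d2) /* cmp eax, 0x4d2 ; je ... */
        return 1;       /* mov eax, 0x1 (after printing a message) */
    return 0x63;        /* mov eax, 0x63 */
}
```

Only the input 1234 reaches the branch that prints the second message.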
Reversing is not about understanding what the whole program does; it's about getting an idea, doing experiments and understanding the relevant parts.