A Deeper Dive
readelf, objdump, file, and strings are all great tools, but they aren't analysis tools!
They help us analyze, but we want cool analysis done for us!
Disassemblers are only as good as you make them!
How might we detect functions in a binary? Assume the binary is stripped.
Recursively, we can try to detect all the functions by following each call instruction and disassembling at its target address.
Can this be defeated?
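The recursive idea can be sketched on a toy program. This is only an illustration under stated assumptions: the instruction format (one instruction per address, mnemonics "call", "jmp", "jcc", "ret") and the names PROGRAM and discover_functions are hypothetical, and a real tool would decode actual machine code instead.

```python
# Toy recursive-descent function discovery (hypothetical instruction format).
# Each address maps to (mnemonic, target); every instruction is 1 unit long.
PROGRAM = {
    0: ("call", 10),
    1: ("call", 20),
    2: ("ret", None),
    10: ("push", None),
    11: ("ret", None),
    20: ("jcc", 22),
    21: ("call", 10),
    22: ("ret", None),
    30: ("push", None),   # only reachable through an indirect call
    31: ("ret", None),
}

def discover_functions(program, entry):
    """Return the set of function start addresses reachable from entry."""
    functions = set()
    pending_functions = [entry]
    while pending_functions:
        start = pending_functions.pop()
        if start in functions or start not in program:
            continue
        functions.add(start)
        visited = set()
        pending_code = [start]
        while pending_code:
            addr = pending_code.pop()
            if addr in visited or addr not in program:
                continue
            visited.add(addr)
            mnemonic, target = program[addr]
            if mnemonic == "call":
                pending_functions.append(target)  # call target starts a function
                pending_code.append(addr + 1)     # execution resumes after the call
            elif mnemonic == "jmp":
                pending_code.append(target)
            elif mnemonic == "jcc":
                pending_code.append(target)       # branch taken
                pending_code.append(addr + 1)     # branch not taken
            elif mnemonic != "ret":
                pending_code.append(addr + 1)     # plain fall-through
    return functions

found = sorted(discover_functions(PROGRAM, 0))
print(found)  # [0, 10, 20]
```

Note that the function at address 30 is never found, because no direct call in the toy program targets it.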
Linearly, we can look for function initialization code (the prologue), such as the snippet below.
Can this be defeated?
foo:
    ; initialize the stack frame (start of function)
    push ebp
    mov ebp, esp
    ...
    do stuff
    ...
    ; remove the stack frame (end of function)
    pop ebp
    ret
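A linear scan for this prologue might look like the sketch below. It assumes 32-bit x86, where push ebp encodes as 55 and mov ebp, esp as either 89 e5 or 8b ec. The names PROLOGUES and find_prologues and the sample bytes are made up for illustration.

```python
# Linear sweep for the classic 32-bit x86 prologue "push ebp; mov ebp, esp".
# Both encodings of "mov ebp, esp" are checked.
PROLOGUES = (bytes.fromhex("5589e5"), bytes.fromhex("558bec"))

def find_prologues(code, base=0):
    """Return sorted addresses where a known prologue byte pattern starts."""
    hits = []
    for pattern in PROLOGUES:
        start = code.find(pattern)
        while start != -1:
            hits.append(base + start)
            start = code.find(pattern, start + 1)
    return sorted(hits)

# nop | push ebp; mov ebp, esp; pop ebp; ret | int3 | same function, other encoding
sample = bytes.fromhex("90" "5589e5" "5d" "c3" "cc" "558bec" "5d" "c3")
offsets = find_prologues(sample)
print(offsets)  # [1, 7]
```

A pure byte match like this also fires on data that happens to contain the same bytes, and it misses functions compiled without a frame pointer.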
What are the possible control flow structures?
+ For, While, Do While loops
+ If, If-else statements
+ Switch statements
In Assembly, do these look different?
+ In assembly they do not look very different at all
If all loops look the same, why should we care if they are different?
Well, we don't care actually.
How might we detect a loop then?
Perform DFS
1: top
2: top, next
3: top,
4: top, top *loop found*
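The DFS idea can be sketched as follows: a loop shows up as an edge that leads back to a block still on the current DFS path. The block names "body" and "exit", the CFG dictionary, and find_back_edges are assumptions for illustration only.

```python
# Loop detection by depth-first search over a control flow graph.
# An edge to a node that is still on the current DFS path closes a loop.
CFG = {
    "top": ["next"],
    "next": ["body", "exit"],
    "body": ["top"],      # jumps back up: this edge closes the loop
    "exit": [],
}

def find_back_edges(cfg, entry):
    """Return (source, target) edges that point back into the current path."""
    back_edges = []
    on_path = set()   # nodes on the active DFS path
    done = set()      # nodes whose successors were fully explored

    def visit(node):
        on_path.add(node)
        for succ in cfg.get(node, []):
            if succ in on_path:
                back_edges.append((node, succ))   # loop found
            elif succ not in done:
                visit(succ)
        on_path.remove(node)
        done.add(node)

    visit(entry)
    return back_edges

loops = find_back_edges(CFG, "top")
print(loops)  # [('body', 'top')]
```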
Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program
-wikipedia
+ Forward/Backward Analysis
+ Flow/Path/Context Sensitive
+ May/Must join points
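As one concrete instance of these terms, live-variable analysis is a backward analysis with a "may" join (set union over successors). The sketch below is a minimal fixed-point version. The three-statement program, the node names, and the liveness function are hypothetical.

```python
# Live-variable analysis: backward direction, "may" join (union of successors).
# Each node is (uses, defs, successors).
NODES = {
    "n1": (set(), {"x"}, ["n2"]),    # x = 1
    "n2": ({"x"}, {"y"}, ["n3"]),    # y = x + 1
    "n3": ({"y"}, set(), []),        # print(y)
}

def liveness(nodes):
    """Iterate to a fixed point; return (live_in, live_out) per node."""
    live_in = {name: set() for name in nodes}
    live_out = {name: set() for name in nodes}
    changed = True
    while changed:
        changed = False
        for name, (uses, defs, succs) in nodes.items():
            out = set()
            for succ in succs:
                out |= live_in[succ]          # join point: union ("may")
            new_in = uses | (out - defs)      # transfer function
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out

live_in, live_out = liveness(NODES)
print(live_out["n1"], live_in["n2"], live_in["n3"])  # {'x'} {'x'} {'y'}
```

Swapping the union for an intersection would turn the join into a "must" join, which is what analyses such as available expressions use.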
There is a lot to talk about here, too much for our class! For more, check out these 430 slides!
[Figure: example program after slicing]
Program slicing is a great example of an analysis that would be useful to a Reverse Engineer!
This shows us what affects sum prior to the chosen line (write(sum)).
(Backward analysis)
Notice lines with 'w' are still included since w affects the definition of sum in the for loop.
Like other data-flow analyses, program slicing is not exclusively backward; forward slices exist too.
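A backward slice can be sketched for straight-line code as below. This is a simplification: it follows data dependences only and ignores loops and control dependences. The program, the statement format, and backward_slice are hypothetical stand-ins, not the example from the slide.

```python
# Backward slice over straight-line code, data dependences only.
# Each statement is (text, defs, uses).
STATEMENTS = [
    ("w = 2",           {"w"},    set()),
    ("sum = 0",         {"sum"},  set()),
    ("prod = 1",        {"prod"}, set()),
    ("sum = sum + w",   {"sum"},  {"sum", "w"}),
    ("prod = prod * w", {"prod"}, {"prod", "w"}),
    ("write(sum)",      set(),    {"sum"}),
    ("write(prod)",     set(),    {"prod"}),
]

def backward_slice(statements, criterion_index):
    """Return indices of statements that can affect the criterion statement."""
    relevant = set(statements[criterion_index][2])   # variables we care about
    in_slice = [criterion_index]
    for index in range(criterion_index - 1, -1, -1):
        _text, defs, uses = statements[index]
        if relevant & defs:
            in_slice.append(index)
            relevant = (relevant - defs) | uses      # now track its inputs
    return sorted(in_slice)

slice_indices = backward_slice(STATEMENTS, 5)
print([STATEMENTS[i][0] for i in slice_indices])
# ['w = 2', 'sum = 0', 'sum = sum + w', 'write(sum)']
```

The statement defining w stays in the slice because w feeds the definition of sum, while everything that only touches prod drops out.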
angr - Python library for binary analysis, especially powerful for symbolic execution (later topic)
CIL - written in OCaml, for C; can do all the analyses mentioned before on C source code.
LLVM - framework for compiling and optimizing code through the LLVM IR, easily extended
The term "libc" is commonly used as a shorthand for the "standard C library", a library of standard functions that can be used by all C programs (and sometimes by programs in other languages).
-wikipedia
What is happening when we use printf in our binaries?
printf gets linked to the first definition of printf found in the included libraries, then printf does its thing.
How does text make it to the screen?
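printf, malloc, read, write, etc. are all wrappers around system calls (some thin, like read and write; some with a lot of library logic on top, like printf and malloc).
System calls are the process' way of asking the kernel for permission to do something with a resource.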
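The wrapper idea can be seen from a high-level language too. In the sketch below, Python's os.write hands bytes to a file descriptor, which on Linux ends up in the write system call. A pipe stands in for the terminal so the result can be checked; write_via_fd is a hypothetical helper name.

```python
import os

def write_via_fd(fd, text):
    """Push text through os.write, retrying until every byte is written."""
    data = text.encode()
    written = 0
    while written < len(data):
        # os.write may write fewer bytes than asked for, so loop on the rest
        written += os.write(fd, data[written:])
    return written

# A pipe stands in for the terminal so we can read back what was written.
read_end, write_end = os.pipe()
count = write_via_fd(write_end, "Hello, World\n")
os.close(write_end)
echoed = os.read(read_end, 64)
os.close(read_end)
print(count, echoed)  # 13 b'Hello, World\n'
```

Passing file descriptor 1 instead of the pipe's write end would send the same bytes to stdout.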
Syscalls are not standardized across architectures or kernels.
In Linux on x86, they are triggered by a dedicated instruction: the software interrupt 'int 0x80' on 32-bit, or the 'syscall' instruction on 64-bit.
In Windows, depending on the version or architecture, you might see 'syscall' or 'int' or even just 'call'.
What are these 'resources'?
Resources are anything the kernel manages on the process' behalf: reaching devices, printing to terminals, reading key presses, etc.
In Windows, resources are referred to through handles, and everything is an object. More wrappers for syscalls.
Simple in assembly:
; ----------------------------------------------------------------------------------------
; Writes "Hello, World" to the console using only system calls. Runs on 64-bit Linux only.
; To assemble and run:
;
; nasm -felf64 hello.asm && ld hello.o && ./a.out
; ----------------------------------------------------------------------------------------
        global  _start

        section .text
_start: mov     rax, 1          ; system call for write
        mov     rdi, 1          ; file handle 1 is stdout
        mov     rsi, message    ; address of string to output
        mov     rdx, 13         ; number of bytes
        syscall                 ; invoke operating system to do the write
        mov     rax, 60         ; system call for exit
        xor     rdi, rdi        ; exit code 0
        syscall                 ; invoke operating system to exit

        section .data
message: db     "Hello, World", 10 ; note the newline at the end
1. The dynamic linker sets up the Global Offset Table in memory (.got).
2. When the function is called, the call goes to its stub in the .plt (procedure linkage table), which jumps through that function's entry in the .got.
3. From the linked function, we jump into the shared object to execute.
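The three steps can be modelled with a toy class. This is a loose analogy, not how a real loader is implemented. It also assumes lazy binding (an entry is resolved the first time the function is called), which the steps above do not state. ToyProcess, plt_call, and the fake puts are hypothetical names.

```python
# Toy model of calling a shared-library function through the PLT and GOT.
# Assumes lazy binding: a GOT slot is filled in on the first call.
class ToyProcess:
    def __init__(self, shared_object):
        self.shared_object = shared_object   # name -> callable, stands in for the .so
        self.got = {}                        # name -> callable, stands in for .got
        self.resolutions = 0                 # how often the "linker" had to step in

    def _resolve(self, name):
        """Stand-in for the dynamic linker: look the symbol up, patch the GOT."""
        self.resolutions += 1
        self.got[name] = self.shared_object[name]
        return self.got[name]

    def plt_call(self, name, *args):
        """Stand-in for a PLT stub: jump through the GOT slot for this symbol."""
        target = self.got.get(name) or self._resolve(name)
        return target(*args)

proc = ToyProcess({"puts": lambda text: len(text)})   # fake puts: returns length
first = proc.plt_call("puts", "hi")    # slot empty: resolved, then called
second = proc.plt_call("puts", "hey")  # slot already filled: called directly
print(first, second, proc.resolutions)  # 2 3 1
```

Every call after the first goes wherever the GOT slot points, with no further lookup.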
What does this all mean for attackers?