Welcome to Intro. to Reverse Engineering!
Chris & Drake
Chris
- Interested in Offensive Cyber
- Sublime is best text editor
- This semester: Crypto, Net Sec, Pen Testing, and Research with Northrop Grumman.
Drake
- Graduating this Semester!
- Faithful to the Church of Emacs (Evil mode)
- Linux Enthusiast
- This semester: OS, Game Theory, Signals, Volatility, and this class!
Who are we?
Faculty adviser
Dr. Jonathan Katz
Grading
Grades will only be homework and quizzes!
3 Quizzes 9% each
- Static Analysis
- Dynamic Analysis
- Applied RE
12 Homeworks 73% total
- See syllabus
Academic Integrity
See syllabus!!!
Don't cheat, don't copy, do your own work! Otherwise it's more work for all of us -___-
Ethics
Do not attempt to use what you learn in this class to commit illegal acts.
You will learn things in this course that you could potentially use to 'steal' intellectual property, and it is not our intent that you become criminals.
Use your best judgement; what you choose to do with this knowledge is on you.
In the United States even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful as long as it has been legitimately obtained.
Reverse engineering of computer software in the US often falls under contract law, as a breach of contract, as well as any other relevant laws. This is because most EULAs (end-user license agreements) specifically prohibit it, and U.S. courts have ruled that if such terms are present, they override the copyright law which expressly permits it (see Bowers v. Baystate Technologies).
Sec. 103(f) of the DMCA (17 U.S.C. § 1201(f)) says that a person who is in legal possession of a program is permitted to reverse-engineer it and circumvent its protection if this is necessary in order to achieve "interoperability", a term broadly covering other devices and programs being able to interact with it, make use of it, and use and transfer data to and from it in useful ways. A limited exemption exists that allows the knowledge thus gained to be shared and used for interoperability purposes.
What is RE?
Reverse Engineering
The process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object
Scope of the class
Linux Binary Analysis
- ELF
- Static/Dynamic
- Malware
- Legacy Code bases
Some Windows
- PE vs ELF
- Calling conventions
Class VM (Kali)
Link is provided on the class GitLab. We recommend you make a shared folder!
If using VirtualBox, install the virtualbox-guest-x11 package.
If using VMware, open-vm-tools is already installed.
If you are not comfortable with Linux, please ask questions!
Installed
- Binaryninja 32-bit (demo)
- IDA 7 Free (64/32)
- Radare2 *
- Edb - graphical debugger (Evan's Debugger)
- OllyDbg * (wine32 required)
- Shellen (python3)
- Readelf
- Objdump
- Binwalk
The install scripts used are provided for updating radare2 and Python.
$ pyenv install [-l] #to install or see available
$ pyenv global [Version] #to set or see global version
What is C?
C
General-purpose, imperative, low-level, manual memory management, compiled
Control Statements
- If-statements
- While-loops
- Do-Whiles
- For-loops
- Switch-statements
if ( 1 && !0 ){
printf("Non-zero is true\n");
} else {
printf("!0 is 1\n");
}
while ("Is this non-zero?") {
printf("Who thinks yes?\n");
}
int i; /* C99 */
for (; i < 10; i++) {
print("What is wrong with this?\n");
}
/* What might this snippet be used for? */
switch(ast_node->type) {
case INT:
    return atoi(ast_node->data);
case PLUS: { /* braces are needed to declare variables inside a case */
    int a = process(ast_node->left);
    int b = process(ast_node->right);
    return a + b;
}
default:
    printf("There was an error\n");
    exit(1); /* What does exit 1 mean? */
}
Pointers & Dynamic Memory
What are pointers used for in C?
- referencing memory on the stack or heap
- referencing arrays
- extra return values
What are pointers used for in Java?
- Objects, Objects, Objects
Stack and Heap
What is the Stack?
- Memory used to separate function frames for local memory usage
- Starts at a high address and grows down
What is the Heap?
- Dynamic memory that is globally accessible*
- Must be allocated & freed manually
*if you know where to look
C experts now?
/* 1. What is wrong here? */
int *foo(){
int i = 10;
return &i;
}
/* 2. Is this valid? */
int *foo() {
static int i = 5;
return &i;
}
/* 3. Is this valid? */
int *foo() {
void *yeet = malloc(sizeof(double)*10);
return (int *)yeet;
}
/* 4. Is this valid? */
int main() {
char *yeett = (char *)foo();
printf(yeett);
/* 5. Missing something? */
return 0;
}
Why do we care about details like these?
What does the stack look like?
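int *foo(int c, int d) {
    char e;
    void *yeet = malloc(sizeof(c)*d);
    /* Stop! */
    return (int *)yeet;
}
int main(int argc, char *argv[]) {
    int a;
    a = 5, 3; /* assignment binds tighter than the comma, so a is 5 */
    int b = 7;
    char *bar = (char *)foo((b,a),b); /* (b,a) evaluates to a */
    return 0;
}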
Bottom of Stack (High address) |
---|
argv |
argc |
init main stuff |
Registers to track: EBP (frame base pointer) and ESP (stack pointer)
Bottom of Stack (High address) |
---|
argv |
argc |
init main stuff |
a = 5 |
b = 7 |
d = 7 |
c = 5 |
return address |
old ebp |
e = ? |
yeet |
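Frames: main at the higher addresses, foo below it. At the /* Stop! */ comment, EBP points at old ebp and ESP at yeet (the top of the stack).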
Take note of argc and argv!! *hint hint*...
What is Assembly?
Assembly
Low-level programming language that is translated into the architecture's machine code. Here we will use the x86_64 architecture.
What is x86_64?
64-bit extension of the x86 architecture that still runs 32-bit code. Used by most modern PCs and servers.
Registers x86
High-speed memory used to store information temporarily
- eax - accumulator
- ebx - base
- ecx - counter
- edx - data
- edi - destination
- esi - source
- esp - stack pointer
- ebp - base (stack frame) pointer
- eip* - instruction pointer
- Flags - set by instructions
* not accessible like the other registers
The names do not restrict how the registers can be used, but they sometimes hint at how they are used.
Registers x64
Same as x86 but now we have more and larger registers!
Here's the big picture, but we don't need all of these!
Floating Point Registers
Flags
And a bunch of other stuff...
Sizes
- rax - 64 bits, 8 bytes, quad-word (qword)
- eax - 32 bits, 4 bytes, double-word (dword)
- ax - 16 bits, 2 bytes, word
- al/ah - 8 bits, 1 byte, byte (al is the low byte of ax, ah the high byte)
- eax is the lower 32 bits of rax
- ax is the lower 16 bits of eax and rax
- And so on
- This is true for ebx, ecx, edx, and the numbered registers as well.
- In 32-bit mode, not all registers have byte-sized references, such as esp and ebp (x86_64 adds spl and bpl)
Intel vs AT&T
We will focus on Intel syntax, but know that AT&T syntax exists.
The main difference is the order of the source and destination operands.
edi - destination, esi - source
Intel
mov edi, esi
AT&T
mov %esi, %edi
In both examples, the contents of the esi register are copied to the edi register
Instruction types
If you are interested in disassembly, this diagram is useful to you!
Otherwise, just know there are a lot of ways to use instructions, and each way gets encoded differently in machine code!
mov & push/pop
mov eax, 0x01 ;put 1 into eax
mov dword [eax], 0x01 ;put 1 into the memory at the address in eax (size must be stated)
mov eax, [esi] ;put the contents of the address in esi into eax
push eax ;put contents of eax on top of stack
push 0x01 ;put 1 on top of stack
; each push decrements the stack pointer (the stack grows down)
pop eax ;put the contents of the top of the stack into eax,
; and increment the stack pointer
Displacement
[] indicates an access to memory*
[base + index*size + offset]
[arr + esi*4 + 0] ;element esi of an array of 4-byte ints
*does not always mean the memory is actually accessed (lea only computes the address)
What could the offset be used for?
Branching
jmp addr ;addr could be a register
; with an address or a label
this_is_a_label:
call addr ; functions are just labels (addresses), with a calling convention
ret ; using the correct calling convention,
; ret returns from the called function
syscall ; on 32-bit Linux more commonly seen as 'int 0x80' (interrupt)
Conditional branching
je addr ; or jz -- if zero flag is set
jg addr ; if greater (signed); ja is the unsigned version (above)
jl addr ; if less (signed); jb is the unsigned version (below)
jge addr ; -- if greater or equal to (signed)
jle addr ; -- if less or equal to (signed)
js addr ; -- if sign flag is set (result was negative)
Flags
carry -- set on a carry or borrow out of the top bit (unsigned overflow)
zero -- set if the result is zero (e.g. comparing two equal values)
sign -- set if the result is negative (top bit set)
overflow -- set if signed overflow occurred
Each flag is set by certain instructions
Resources to help when reversing x86_64
Week 0
By Drake P
Welcome to RE, C, Assembly