What is the Stack?
What is the Heap?
*if you know where to look
eax - accumulator
ebx - base
ecx - counter
edx - data
edi - destination
esi - source
esp - stack pointer
ebp - base stack frame pointer
High speed memory used to store information temporarily
* not accessible like the other registers
The names do not restrict how the registers can be used, but they sometimes hint at how the registers are typically used.
Same as x86 but now we have more and larger registers!
Here's the big picture, but we don't need all of these!
Floating Point Registers
Flags
And a bunch of other stuff...
rax - 64-bits, 8-bytes, quad-word (qword)
eax - 32-bits, 4-bytes, double-word (dword)
ax - 16-bits, 2-bytes, word
al/ah - 8-bits, 1-byte, byte
- eax is the lower 32-bits of rax
- ax is the lower 16-bits of eax and rax
- And so on
- The same holds for ebx, ecx, edx, and the numbered registers (r8 to r15).
- In 32-bit mode, not all registers have byte-sized references; esp and ebp, for example, do not.
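The aliasing described above can be sketched in C by masking a 64-bit value. This is only an illustration: the function names are made up for this example, and real sub-registers are views onto the same hardware register rather than computed copies.

```c
#include <stdint.h>

/* Views of one 64-bit register value, mirroring rax / eax / ax / al / ah.
   The helper names are hypothetical and exist only for this sketch. */
uint32_t low32(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); } /* eax: bits 0..31 */
uint16_t low16(uint64_t rax) { return (uint16_t)(rax & 0xFFFFu); }     /* ax:  bits 0..15 */
uint8_t  low8(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }        /* al:  bits 0..7  */
uint8_t  high8(uint64_t rax) { return (uint8_t)((rax >> 8) & 0xFFu); } /* ah:  bits 8..15 */
```

For a value such as 0x1122334455667788, the four helpers pick out 0x55667788, 0x7788, 0x88, and 0x77 respectively.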
mov eax, 0x01 ;put 1 into eax
mov dword [eax], 0x01 ;put 1 (as a dword) at the address in eax
mov eax, [esi] ;put contents of the address in esi into eax
push eax ;put contents of eax on top of stack
push 0x01 ;put 1 on top of stack
; push decrements the stack pointer (the stack grows down)
pop eax ;put contents of the top of the stack into eax,
; and increment the stack pointer
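A minimal C model of 32-bit push and pop may help here. On x86 the stack grows toward lower addresses, so push moves the stack pointer down by 4 and pop moves it back up. The `Stack` type, its 64-byte size, and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u

/* Hypothetical toy stack: esp is a byte offset into mem, starting at the top. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;
} Stack;

void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;              /* empty stack: esp at the highest address */
}

/* push: decrement esp by 4, then store the value at the new top. */
void push32(Stack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* pop: load the value at the top, then increment esp by 4. */
uint32_t pop32(Stack *s) {
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}
```

Pushing two values and popping them returns them in reverse order, and esp ends where it started.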
[] indicates an access to memory
[base + index*size + offset]
; size can only be 1,2,4,8
[arr + esi*4 + 0] ;array of int
What could the offset be used for?
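One possible use of the offset is to reach a field inside an array element. The sketch below computes base + index*size + offset by hand; the `Record` type and both function names are hypothetical, and it assumes a flat address space where pointers convert to integers in the obvious way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    int32_t id;     /* offset 0 */
    int32_t score;  /* offset 4 */
} Record;

/* Mirrors [base + index*size + offset]; size must be 1, 2, 4, or 8. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            uintptr_t size, uintptr_t offset) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return base + index * size + offset;
}

/* Address of records[i].score: the scale is sizeof(Record) (8 here),
   and the offset selects the field within the element. */
uintptr_t score_address(const Record *records, size_t i) {
    return effective_address((uintptr_t)records, i,
                             sizeof(Record), offsetof(Record, score));
}
```

Here the index picks the element and the offset picks the field, which is roughly what a compiler can fold into a single memory operand.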
lea eax, ecx ;invalid
lea eax, [ecx] ;valid, equivalent to mov eax, ecx
lea eax, [ecx + edx] ;eax = ecx + edx*1 (implicit scale of 1)
lea eax, [ecx + edx*3] ;invalid, valid numbers are 1,2,4,8
lea eax, [eax + edx*4] ;can be thought of as
; eax = &((DWORD *)eax)[edx] why?
lea does not access memory, even though it uses the bracket syntax! It only does the pointer arithmetic, with no dereference!
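The difference between lea and mov with the same operand can be sketched in C as the difference between computing an address and reading through it. Both function names are made up for this example.

```c
#include <stdint.h>

/* lea eax, [eax + edx*4]: arithmetic only, nothing is read from memory. */
uint32_t lea_scaled(uint32_t eax, uint32_t edx) {
    return eax + edx * 4u;
}

/* mov eax, [base + edx*4]: the same address computation, followed by a load. */
uint32_t mov_scaled(const uint32_t *base, uint32_t edx) {
    return base[edx];
}
```

Because lea is plain arithmetic, compilers also use it for math unrelated to pointers; `lea_scaled(x, x)` computes x*5, for instance.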
jmp addr ;addr could be a register
; with an address or a label
this_is_a_label:
call addr ; functions are just labels (addresses), with a calling convention
ret ; using the correct calling convention,
; ret returns from the called function
syscall ; on 32-bit x86 more commonly seen as 'int 0x80' (int for interrupt)
je addr ; or jz -- if zero flag is set
jg addr ; or ja -- if greater (jg is signed, ja is unsigned)
jl addr ; or jb -- if less (jl is signed, jb is unsigned)
jge addr ; -- if greater or equal (signed; the unsigned form is jae)
jle addr ; -- if less or equal (signed; the unsigned form is jbe)
js addr ; -- if sign bit is set (if negative)
carry -- indicates a carry or borrow in an unsigned arithmetic operation
zero -- if a result is zero, or if the compared values are equal
sign -- if the result is negative
overflow -- if signed overflow occurred
Each flag is set by certain instructions
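To see how the flags drive the conditional jumps, here is a small C model of `cmp a, b` on 32-bit values. It is a sketch, not a full emulation: the `Flags` struct and the `takes_*` helpers are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } Flags;

/* Model of `cmp a, b`: compute a - b and keep only the flags. */
Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.cf = a < b;                             /* borrow: unsigned a is below b */
    f.zf = (r == 0);                          /* operands were equal */
    f.sf = (r >> 31) != 0;                    /* top bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow on subtraction */
    return f;
}

bool takes_je(Flags f) { return f.zf; }
bool takes_jg(Flags f) { return !f.zf && f.sf == f.of; } /* signed greater   */
bool takes_jl(Flags f) { return f.sf != f.of; }          /* signed less      */
bool takes_ja(Flags f) { return !f.cf && !f.zf; }        /* unsigned above   */
bool takes_jb(Flags f) { return f.cf; }                  /* unsigned below   */
```

Comparing 1 with 0xFFFFFFFF shows why the signed and unsigned jumps differ: as signed values 1 is greater than -1, so jg is taken, while as unsigned values 1 is below 0xFFFFFFFF, so jb is taken.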
#include <stdlib.h>

int *foo(int c, int d) {
    char e;
    void *yeet = malloc(sizeof(c)*d);
    /* Stop! */
    return (int *)yeet;
}
int main(int argc, char *argv[]) {
    int a = 5;
    int b = 7;
    int *bar = foo(a,b);
    return 0;
}
Stack, from high addresses to low addresses:
High
argv
argc
ret addr
old base
5 (a)
7 (b)
junk (bar)
7 (d)
5 (c)
ret addr
old base
junk (local of foo)
junk (local of foo)
Low
foo:
push ebp
mov ebp, esp
sub esp, 8 ;make room
mov ecx, [ebp + 8] ;get c ([ebp + 4] is the return address)
mov edx, [ebp + 12] ;get d
mov eax, 4 ;sizeof(int)
mul edx ;sizeof(int)*d
push eax ;arg to malloc
call malloc
add esp, 4 ;clean up arg
mov [esp], eax ;store in yeet
add esp, 8 ;clean up locals
pop ebp
ret
main:
push ebp
mov ebp, esp
push 5 ;a
push 7 ;b
sub esp, 4 ;bar
mov eax, [esp + 4] ;get b
mov ebx, [esp + 8] ;get a
push eax ;d (= b), arguments are pushed right to left
push ebx ;c (= a)
call foo
add esp, 8 ;clean up args
mov [esp], eax ;store in bar
add esp, 12 ;clean up locals
mov eax, 0 ;return 0
pop ebp
ret
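The frame layout that foo sees after its prologue can be checked with a toy model in C. This sketch assumes the 32-bit cdecl convention used in the listing (arguments pushed right to left, then the return address, then the saved ebp). The `Machine` type, the function names, and the 0xDEADBEEF return address are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256u

/* Hypothetical toy machine: esp and ebp are byte offsets into mem. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;
    uint32_t ebp;
} Machine;

void push32(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;                        /* the stack grows down */
    memcpy(&m->mem[m->esp], &v, 4);
}

uint32_t load32(const Machine *m, uint32_t addr) {
    uint32_t v;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

/* Simulates main calling foo(a, b) and reports what foo finds
   at [ebp + 4], [ebp + 8], and [ebp + 12] after its prologue. */
void call_foo_demo(uint32_t a, uint32_t b,
                   uint32_t *c_seen, uint32_t *d_seen, uint32_t *ret_seen) {
    Machine m;
    memset(&m, 0, sizeof m);
    m.esp = STACK_BYTES;
    push32(&m, b);            /* push d: rightmost argument first */
    push32(&m, a);            /* push c */
    push32(&m, 0xDEADBEEFu);  /* call foo: pushes a (made-up) return address */
    push32(&m, m.ebp);        /* foo: push ebp */
    m.ebp = m.esp;            /* foo: mov ebp, esp */
    *ret_seen = load32(&m, m.ebp + 4);
    *c_seen   = load32(&m, m.ebp + 8);
    *d_seen   = load32(&m, m.ebp + 12);
}
```

Running the model with a = 5 and b = 7 finds the return address at ebp + 4, c at ebp + 8, and d at ebp + 12.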
Executable files are not just a sequence of assembly instructions. They must also contain metadata that tells the operating system how to run the file. The standard for specifying this information is an executable format.
ELF Header
#define EI_NIDENT 16
typedef struct {
unsigned char e_ident[EI_NIDENT];
Elf32_Half e_type;
Elf32_Half e_machine;
Elf32_Word e_version;
Elf32_Addr e_entry;
Elf32_Off e_phoff;
Elf32_Off e_shoff;
Elf32_Word e_flags;
Elf32_Half e_ehsize;
Elf32_Half e_phentsize;
Elf32_Half e_phnum;
Elf32_Half e_shentsize;
Elf32_Half e_shnum;
Elf32_Half e_shstrndx;
} Elf32_Ehdr;
$ readelf -h a.out ###Output modified slightly
Magic: 7f 45 4c 46 \x7fELF
Class: ELF32
Data: little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048430
Start of program headers: 52
Start of section headers: 8588
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 35
Section header string table index: 34
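A few of the fields printed by readelf can be pulled out of the raw header bytes by hand. The sketch below assumes a little-endian ELF32 file and reads only the first 52 bytes; the `Elf32Summary` type and function names are invented for this example, while the byte offsets follow the `Elf32_Ehdr` layout shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary type; not part of any ELF library. */
typedef struct {
    int      is_elf;     /* 1 if the magic bytes matched */
    uint8_t  elf_class;  /* e_ident[4]: 1 = ELF32, 2 = ELF64 */
    uint8_t  data;       /* e_ident[5]: 1 = little endian, 2 = big endian */
    uint16_t type;       /* e_type: 2 = EXEC */
    uint16_t machine;    /* e_machine: 3 = Intel 80386 */
    uint32_t entry;      /* e_entry */
} Elf32Summary;

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumes a little-endian ELF32 header; returns is_elf = 0 otherwise
   only when the buffer is too short or the magic does not match. */
Elf32Summary summarize_elf32(const uint8_t *buf, size_t len) {
    Elf32Summary s;
    assert(buf != NULL);
    memset(&s, 0, sizeof s);
    if (len < 52 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return s;
    s.is_elf    = 1;
    s.elf_class = buf[4];
    s.data      = buf[5];
    s.type      = read_le16(buf + 16);  /* e_type follows the 16-byte e_ident */
    s.machine   = read_le16(buf + 18);
    s.entry     = read_le32(buf + 24);  /* after e_version at offset 20 */
    return s;
}
```

Feeding it the first 52 bytes of a 32-bit x86 executable should recover values like those in the readelf output above; it makes no attempt to handle ELF64 or big-endian files.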
e_ -- elf
ph -- program header
sh -- section header
off -- offset
ent -- entry
e_shentsize ?
e_shnum ?
e_phentsize ?
e_shstrndx ?*
Section Header Entry Size
Section Header Number (of entries)
Program Header Entry Size
Section Header String Table Index
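These fields are enough to locate any section header in the file: the table starts at e_shoff, and each entry is e_shentsize bytes. A small sketch of that arithmetic follows; the function names are hypothetical.

```c
#include <stdint.h>

/* File offset of section header number `index`. */
uint32_t section_header_offset(uint32_t e_shoff, uint16_t e_shentsize,
                               uint16_t index) {
    return e_shoff + (uint32_t)e_shentsize * index;
}

/* File offset just past the last section header. */
uint32_t section_table_end(uint32_t e_shoff, uint16_t e_shentsize,
                           uint16_t e_shnum) {
    return e_shoff + (uint32_t)e_shentsize * e_shnum;
}
```

With the values from the readelf output above (table at 8588, 40-byte entries, 35 entries, string table index 34), the header for the section name string table sits at 8588 + 34*40 = 9948, and the table ends at 9988.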
TRY:
$ readelf -S /bin/bash
### modified output
[Nr] Name Type
[ 0] NULL
[ 1] .interp PROGBITS
[ 2] .note.ABI-tag NOTE
[ 3] .note.gnu.build-i NOTE
[ 4] .gnu.hash GNU_HASH
[ 5] .dynsym DYNSYM
[ 6] .dynstr STRTAB
[ 7] .gnu.version VERSYM
[ 8] .gnu.version_r VERNEED
[ 9] .rela.dyn RELA
[10] .rela.plt RELA
[11] .init PROGBITS
[12] .plt PROGBITS
[13] .plt.got PROGBITS
[14] .text PROGBITS
[15] .fini PROGBITS
[16] .rodata PROGBITS
[17] .eh_frame_hdr PROGBITS
[18] .eh_frame PROGBITS
[19] .init_array INIT_ARRAY
[20] .fini_array FINI_ARRAY
[21] .data.rel.ro PROGBITS
[22] .dynamic DYNAMIC
[23] .got PROGBITS
[24] .data PROGBITS
[25] .bss NOBITS
[26] .gnu_debuglink PROGBITS
[27] .shstrtab STRTAB
What is a section header?
What are some sections that are useful to us?
.text
.got
.data
A well-defined header that describes one section of the binary; the section contents themselves are otherwise unstructured.
Program headers indicate how the segments required for execution are to be loaded into virtual memory.
A section-to-segment mapping specifies which sections are part of which segments.
Most disassemblers do all of their analysis based on virtual addressing.
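Translating a file offset into the virtual address a disassembler would show is simple arithmetic over one program header. The sketch below uses the real `Elf32_Phdr` field names p_offset, p_vaddr, and p_filesz, but the `Segment` struct, the function name, and the example numbers are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of a program header, reduced to what the mapping needs. */
typedef struct {
    uint32_t p_offset; /* where the segment starts in the file */
    uint32_t p_vaddr;  /* where the segment is mapped in virtual memory */
    uint32_t p_filesz; /* number of segment bytes present in the file */
} Segment;

/* Returns 1 and writes *vaddr if file_offset lies inside the segment,
   otherwise returns 0 and leaves *vaddr untouched. */
int file_offset_to_vaddr(const Segment *seg, uint32_t file_offset,
                         uint32_t *vaddr) {
    assert(seg != NULL && vaddr != NULL);
    if (file_offset < seg->p_offset ||
        file_offset - seg->p_offset >= seg->p_filesz)
        return 0;
    *vaddr = seg->p_vaddr + (file_offset - seg->p_offset);
    return 1;
}
```

For a made-up segment loaded from file offset 0 to address 0x08048000, the byte at file offset 0x430 would appear at virtual address 0x08048430.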
How do multiple source files become a single executable?
ELF file formats:
The ELF header specifies the file type:
+ Executable: specifies how to load the program into a process image (remember exec and forking?)
+ Relocatable: specifies how to include its own code and data in an Executable or Shared Object. These are object files waiting to be linked.
+ Shared Object: a dynamic library that is linked with an executable at load time by the dynamic linker. Think libc, which provides printf (declared in stdio.h).
Linker links objects with shared libraries.
What does the whole pipeline look like then?
1. GCC compiles each source file into an ELF Relocatable
2. The static linker combines the Relocatables into an Executable and attaches the information needed to link against Shared Objects
3. The loader execs the Executable, then the dynamic linker links in the Shared Objects so the code can run.
All these platforms have their own conventions similar to ELF. There are more of these than can be easily communicated in a lecture or memorized, so you should get used to using Google.
$ man <tool name>
...
$ xxd <filename>
$ file <filename>
$ strings <filename>
$ nm -D <filename>
$ readelf <filename>
$ objdump -d -M intel --disassemble=<name> <filename>