Week 2

Please sit on the right half of the room

--->

x86 Assembly Review

Stack and Heap

What is the Stack?

  • Memory used to separate function frames for local memory usage
  • Starts at a high address and grows down

What is the Heap?

  • Dynamic memory that is globally accessible*
  • Must be allocated & freed manually

*if you know where to look
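A minimal C sketch of the difference, assuming nothing beyond the standard library. The function names `make_on_heap` and `make_on_stack` are made up for illustration.

```c
#include <stdlib.h>

/* Heap: the allocation outlives the function that created it,
 * so returning the pointer is fine. The caller must free() it. */
int *make_on_heap(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* Stack: `local` lives in this function's frame and is gone once the
 * function returns, so we return a copy of the value, never &local. */
int make_on_stack(int value) {
    int local = value;
    return local;
}
```

Returning `&local` from `make_on_stack` would hand back a pointer into a frame that no longer exists, which is why heap memory has to be allocated and freed by hand.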

Registers x86

  • eax - accumulator
  • ebx - base
  • ecx - counter
  • edx - data
  • edi - destination
  • esi - source
  • esp - stack pointer
  • ebp - base (stack frame pointer)
  • eip* - instruction pointer
  • Flags - set from instructions

High speed memory used to store information temporarily

* not accessible like the other registers

The names do not matter for the use of the registers, but sometimes are hints to how they are used.

Registers x64

Same as x86 but now we have more and larger registers! 

Here's the big picture, but we don't need all these!

Floating Point Registers

Flags

And a bunch of other stuff...

Sizes

  • rax   - 64-bits, 8-bytes, quad-word (qword) 
  • eax   - 32-bits, 4-bytes, double-word (dword)
  • ax    - 16-bits, 2-bytes, word   
  • al/ah - 8-bits,  1-byte,  byte

- eax is the lower 32-bits of rax

- ax is the lower 16-bits of eax and rax

- And so on

- This is true for ebx, ecx, edx, and the numbered registers as well. 

- Not all 32-bit registers have byte-sized references: esp and ebp do not (x64 adds spl and bpl)
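The aliasing can be sketched in C with casts and shifts. This only models how the names overlap; the helper names are hypothetical and no real register is touched.

```c
#include <stdint.h>

/* Each helper returns the part of a 64-bit value that the
 * corresponding sub-register name would refer to. */
uint32_t low32(uint64_t rax) { return (uint32_t)rax; }        /* eax */
uint16_t low16(uint64_t rax) { return (uint16_t)rax; }        /* ax  */
uint8_t  low8(uint64_t rax)  { return (uint8_t)rax; }         /* al  */
uint8_t  high8(uint64_t rax) { return (uint8_t)(rax >> 8); }  /* ah  */
```

With rax = 0x1122334455667788, these give eax = 0x55667788, ax = 0x7788, al = 0x88 and ah = 0x77.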

mov & push/pop

mov eax, 0x01         ;put 1 into eax
mov dword [eax], 0x01 ;put 1 into the memory at the address in eax
mov eax, [esi]        ;put contents of the address in esi into eax

push eax             ;dec the stack pointer by 4,
                     ; then put contents of eax on top of stack
push 0x01            ;same, but put 1 on top of stack

pop eax              ;put contents of the top of the stack into eax,
                     ; then inc the stack pointer by 4
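A toy model of push and pop in C, to make the direction of the stack pointer explicit. `struct toy_stack` and its functions are invented for this sketch, and there are no bounds checks.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

struct toy_stack {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;   /* byte offset of the top of the stack within mem */
};

void toy_init(struct toy_stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;   /* empty stack: esp starts at the high end */
}

void toy_push(struct toy_stack *s, uint32_t value) {
    s->esp -= 4;                          /* push: esp moves DOWN first */
    memcpy(&s->mem[s->esp], &value, 4);   /* then the value is stored   */
}

uint32_t toy_pop(struct toy_stack *s) {
    uint32_t value;
    memcpy(&value, &s->mem[s->esp], 4);   /* pop: read the top value    */
    s->esp += 4;                          /* then esp moves back UP     */
    return value;
}
```

Pushing 1 and then 7 leaves esp eight bytes below where it started; popping twice returns 7, then 1, and restores esp.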

Displacement

[] indicates an access to memory

[base + index*size + offset]
; size can only be 1,2,4,8

[arr + esi*4 + 0]     ;array of int

What could the offset be used for?
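One possible answer, sketched in C: the offset can select a field inside each element of an array of structs. `struct pair` and `effective_address` are hypothetical names, and the sketch assumes a flat address space where pointers convert cleanly to integers.

```c
#include <stddef.h>
#include <stdint.h>

/* An 8-byte element, so the scale factor 8 is a legal choice. */
struct pair {
    int32_t first;
    int32_t second;
};

/* The same arithmetic the CPU performs for [base + index*size + offset]. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            uintptr_t scale, uintptr_t offset) {
    return base + index * scale + offset;
}
```

For an array `arr` of `struct pair`, the address of `arr[3].second` is base = arr, index = 3, size = 8, offset = 4, which would be written [arr + esi*8 + 4] with esi = 3.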

lea

lea eax, ecx   ;invalid
lea eax, [ecx] ;valid, equivalent to mov eax, ecx

lea eax, [ecx + edx]   ;eax = ecx + edx*1 (implicit 1)
lea eax, [ecx + edx*3] ;invalid, valid numbers are 1,2,4,8

lea eax, [eax + edx*4] ;can be thought of as 
                       ; eax = &((DWORD *)eax)[edx] why?

Displacement

lea does not access memory with the displacement operator! It only does the pointer arithmetic with no dereference! 
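In C terms, lea is the address-of form and mov is the dereferencing form of the same expression. The two function names below are made up for this sketch.

```c
#include <stdint.h>

/* Like `lea eax, [base + index*4]`: address arithmetic only. */
uint32_t *lea_like(uint32_t *base, uint32_t index) {
    return &base[index];
}

/* Like `mov eax, [base + index*4]`: same arithmetic, then a memory read. */
uint32_t mov_like(const uint32_t *base, uint32_t index) {
    return base[index];
}
```

Given `uint32_t table[4] = {10, 20, 30, 40}`, `lea_like(table, 2)` yields the pointer `table + 2` without reading memory, while `mov_like(table, 2)` reads the value 30.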

Branching

jmp addr     ;addr could be a register
             ; with an address or a label

this_is_a_label:

call addr    ; functions are just labels (addresses), with a calling convention
ret          ; using the correct calling convention, 
             ;  ret returns from the called function
syscall      ; more commonly seen as 'int' for interrupt
je addr  ; or jz  -- if zero flag is set
jg addr  ;        -- if greater (signed);  ja is the unsigned version
jl addr  ;        -- if less (signed);     jb is the unsigned version
jge addr ;        -- if greater or equal (signed); jae is unsigned
jle addr ;        -- if less or equal (signed);    jbe is unsigned
js addr  ;        -- if sign bit is set (if negative)

Conditional branching

Flags

carry    -- if an unsigned carry or borrow occurred in an arithmetic operation
zero     -- if a result is zero (e.g. a comparison of two equal values)
sign     -- if the result is negative (top bit set)
overflow -- if signed overflow occurred

Each flag is set from certain instructions
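A sketch in C of how `cmp a, b` sets these flags for 32-bit operands and how the conditional jumps read them. `struct flags`, `cmp32` and the `*_taken` helpers are names invented here; the flag conditions follow the usual x86 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags as set by `cmp a, b`, which computes a - b and discards the result. */
struct flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    struct flags f;
    f.cf = a < b;                              /* unsigned borrow      */
    f.zf = (r == 0);                           /* result is zero       */
    f.sf = (r >> 31) != 0;                     /* top bit of result    */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow      */
    return f;
}

bool je_taken(struct flags f) { return f.zf; }
bool jg_taken(struct flags f) { return !f.zf && f.sf == f.of; }  /* signed   */
bool jl_taken(struct flags f) { return f.sf != f.of; }           /* signed   */
bool ja_taken(struct flags f) { return !f.cf && !f.zf; }         /* unsigned */
bool jb_taken(struct flags f) { return f.cf; }                   /* unsigned */
```

Comparing 0xFFFFFFFF with 1 shows why the signed and unsigned jumps differ: as signed values it is -1 < 1, so jl is taken, but as unsigned values it is the larger number, so ja is taken as well.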

What does the stack look like?

#include <stdlib.h>

int *foo(int c, int d) {
    char e;
    void *yeet = malloc(sizeof(c)*d);
    /* Stop! */
    return (int *)yeet;
}

int main(int argc, char *argv[]) {
    int a = 5;
    int b = 7;
    int *bar = foo(a,b);
    
    return 0;
}

High

  argv
  argc
  ret addr
  old base
  5          (a)
  7          (b)
  junk       (bar)
  7          (d)
  5          (c)
  ret addr
  old base
  junk       (local of foo)
  junk       (local of foo)

Low

foo:
    push ebp
    mov ebp, esp

    sub esp, 8            ;make room
    mov ecx, [ebp + 8]    ;get c ([ebp + 4] is the ret addr)
    mov edx, [ebp + 12]   ;get d
    
    mov eax, 4     ;sizeof(int)    
    mul edx        ;sizeof(int)*d
    
    push eax       ;arg to malloc
    call malloc    
    add esp, 4     ;clean up arg
    mov [esp], eax ;store in yeet
    
    add esp, 8     ;clean up locals
    pop ebp
    ret

main:
    push ebp
    mov ebp, esp
    
    push 5       ;a
    push 7       ;b
    sub esp, 4   ;bar
    
    mov eax, [esp + 4]      ;get b
    mov ebx, [esp + 8]      ;get a

    push eax         ;d = b (args are pushed right to left)
    push ebx         ;c = a
    call foo
    add esp, 8       ;clean up args  
    mov [esp], eax   ;store in bar

    add esp, 12      ;clean up locals
    mov eax, 0       ;return 0
    pop ebp
    ret

Lab 2

Static Analysis

(Triage)

Executable Formats

Executable files are not just a sequence of assembly instructions. They must also contain metadata that tells the operating system how to run the file. The standard for specifying this information is an executable format.

ELF Header

#define EI_NIDENT 16

typedef struct {
        unsigned char e_ident[EI_NIDENT];
        Elf32_Half    e_type;
        Elf32_Half    e_machine;
        Elf32_Word    e_version;
        Elf32_Addr    e_entry;
        Elf32_Off     e_phoff;
        Elf32_Off     e_shoff;
        Elf32_Word    e_flags;
        Elf32_Half    e_ehsize;
        Elf32_Half    e_phentsize;
        Elf32_Half    e_phnum;
        Elf32_Half    e_shentsize;
        Elf32_Half    e_shnum;
        Elf32_Half    e_shstrndx;
} Elf32_Ehdr;
$ readelf -h a.out       ###Output modified slightly  
  Magic:   7f 45 4c 46               \x7fELF
  Class:                             ELF32
  Data:                              little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048430
  Start of program headers:          52 
  Start of section headers:          8588 
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         35
  Section header string table index: 34

e_ -- elf

ph -- program header

sh -- section header

off -- offset

ent -- entry

e_shentsize ?  Section Header Entry Size

e_shnum ?      Section Header Number (of entries)

e_phentsize ?  Program Header Entry Size

e_shstrndx ?*  Section Header String Table Index
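A small C sketch of how these fields are used: checking the magic bytes, and combining an offset, an entry size and an entry count to find where a header table ends. The helper names are hypothetical; the numbers in the note come from the readelf output shown earlier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf starts with the ELF magic bytes 7f 45 4c 46. */
int has_elf_magic(const unsigned char *buf, size_t len) {
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    return len >= 4 && memcmp(buf, magic, 4) == 0;
}

/* File offset one past the end of a header table that starts at
 * `off` and holds `num` entries of `entsize` bytes each. */
uint32_t table_end(uint32_t off, uint16_t entsize, uint16_t num) {
    return off + (uint32_t)entsize * num;
}
```

For the a.out above, the section header table starts at 8588 and holds 35 entries of 40 bytes, so it ends at 8588 + 1400 = 9988. The program header table starts at 52 with 9 entries of 32 bytes, so it ends at 340.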

Section Header

TRY: 

$ readelf -S /bin/bash

### modified output 
  [Nr] Name              Type      
  [ 0]                   NULL      
  [ 1] .interp           PROGBITS  
  [ 2] .note.ABI-tag     NOTE      
  [ 3] .note.gnu.build-i NOTE      
  [ 4] .gnu.hash         GNU_HASH  
  [ 5] .dynsym           DYNSYM    
  [ 6] .dynstr           STRTAB    
  [ 7] .gnu.version      VERSYM    
  [ 8] .gnu.version_r    VERNEED   
  [ 9] .rela.dyn         RELA      
  [10] .rela.plt         RELA      
  [11] .init             PROGBITS  
  [12] .plt              PROGBITS  
  [13] .plt.got          PROGBITS  
  [14] .text             PROGBITS  
  [15] .fini             PROGBITS  
  [16] .rodata           PROGBITS  
  [17] .eh_frame_hdr     PROGBITS  
  [18] .eh_frame         PROGBITS  
  [19] .init_array       INIT_ARRAY
  [20] .fini_array       FINI_ARRAY
  [21] .data.rel.ro      PROGBITS  
  [22] .dynamic          DYNAMIC   
  [23] .got              PROGBITS  
  [24] .data             PROGBITS  
  [25] .bss              NOBITS    
  [26] .gnu_debuglink    PROGBITS  
  [27] .shstrtab         STRTAB 

What is a section header?

   A well-defined header that describes an otherwise unstructured section of the binary.

What are some sections that are useful to us?

    .text

    .got

    .data

Common Sections

  • .text - Executable code or instructions
  • .data - Global or static RW variables
  • .bss - Uninitialized global or static variables
  • .rodata - Constants and read-only variables
  • .symtab - Symbol table
  • .strtab - Names of symbols in the symbol table
  • .got - Holds addresses of external vars and functions
  • .plt - Used to call into shared libraries
  • .init/.fini - Global constructors and destructors
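A short C sketch of where a typical toolchain places things. The placements in the comments are the usual GCC/Linux behavior, not a guarantee, and all the names are invented for this example.

```c
#include <stdint.h>

int32_t initialized_global = 42;   /* typically .data   */
int32_t zeroed_global;             /* typically .bss    */
const char banner[] = "hello";     /* typically .rodata */

/* Machine code for this function typically goes in .text. */
int32_t next_count(void) {
    static int32_t calls;          /* static and zero-initialized: .bss */
    calls += 1;
    return calls;
}
```

Compiling this to an object file and running `readelf -S` on it is one way to check the placements on a given system.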

Program Header

Program headers indicate how the segments required for execution are to be loaded into virtual memory.

There exists a section-to-segment mapping that specifies which sections are part of which segments.

 

Most disassemblers recreate this memory layout and do all analysis based on virtual addressing.
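A sketch of the translation from file offset to virtual address for one loadable segment. `struct segment` is a trimmed-down, hypothetical stand-in for a program header entry; the field names mirror the real p_offset, p_vaddr and p_filesz.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down view of one loadable segment. */
struct segment {
    uint32_t p_offset;  /* where the segment starts in the file    */
    uint32_t p_vaddr;   /* where it is mapped in virtual memory    */
    uint32_t p_filesz;  /* number of bytes it occupies in the file */
};

/* Returns 1 and writes *vaddr if file_off lies inside the segment, else 0. */
int offset_to_vaddr(const struct segment *seg, uint32_t file_off,
                    uint32_t *vaddr) {
    if (file_off < seg->p_offset ||
        file_off - seg->p_offset >= seg->p_filesz) {
        return 0;
    }
    *vaddr = seg->p_vaddr + (file_off - seg->p_offset);
    return 1;
}
```

With made-up values p_offset = 0, p_vaddr = 0x08048000 and p_filesz = 0x1000, file offset 0x430 maps to virtual address 0x08048430.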

How do multiple source files become a single executable?

ELF file formats:

  • Executable file
  • Shared Object file
  • Relocatable file
  • and some others

The ELF header specifies the file format (the e_type field)

 + Executable: specifies how to load the program into a process image (remember exec and forking?)

 + Relocatable: specifies how to include its own code and data into an Executable or Shared Object. These are object files waiting to be linked.

 + Shared Object: a dynamic library that the dynamic linker links with an executable at load time. Think libc, which provides printf (declared in stdio.h)

Linker links objects with shared libraries.

What does the whole pipeline look like then?

1. GCC compiles each source file into an ELF Relocatable

 

2. Static linker links Relocatables and attaches necessary information for Shared Object linking into an Executable

 

3. Loader execs the Executable, then the dynamic linker actually links to the Shared Objects for code execution. 

Other Formats

  • Portable Executable (PE): Windows
  • Mach Object (Mach-O): macOS
  • WASM (Web): Browser


All these platforms have their own conventions similar to ELF. There are more of these than can be easily communicated in a lecture or memorized, so you should get used to using Google.


Demonstration

$ man <tool name>
...
$ xxd <filename>
$ file <filename>
$ strings <filename>
$ nm -D <filename>
$ readelf -a <filename>
$ objdump -d -M intel --disassemble=<name> <filename>
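To take some of the mystery out of these tools, here is a rough C sketch of the core idea behind `strings`: scan the bytes and report runs of printable characters that are long enough. `count_strings` is a made-up helper that only counts the runs; the real tool prints them and has more options.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts runs of at least min_len printable characters in buf. */
size_t count_strings(const unsigned char *buf, size_t len, size_t min_len) {
    size_t count = 0;
    size_t run = 0;   /* length of the current printable run */
    for (size_t i = 0; i < len; i++) {
        if (isprint(buf[i])) {
            run++;
        } else {
            if (run >= min_len) count++;
            run = 0;
        }
    }
    if (run >= min_len) count++;   /* a run may end at the end of the buffer */
    return count;
}
```

The default minimum length for `strings` is 4, which is why short fragments such as "ab" do not show up in its output.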

Lab 1

Feedback