Symbolic Execution

We need to build up our constraints and decide on the path we want execution to take:

  1. Statically learn about the binary: find the desired output or the code we want to execute, and the forms of input it accepts.
  2. Decide which parts of the analysis would take us too long by hand and should be left to the execution engine.

SAT Solvers

Once we have built up the constraints along the desired path, we can ask a SAT solver to figure out what the input should be.

SAT solving is an NP-complete problem, so don't expect miracles!

Symbolic execution explores paths and builds up the constraints, but we need the solver to finish the job for us!
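
To make the split between collecting constraints and solving them concrete, here is a toy sketch in plain Python. It does not use Angr or Z3: the solve helper and the three example constraints are hypothetical, and brute force stands in for a real solver.

```python
from itertools import product

def solve(constraints, n_bytes, alphabet):
    """Brute-force stand-in for a solver: first input that meets every constraint."""
    for candidate in product(alphabet, repeat=n_bytes):
        data = bytes(candidate)
        if all(check(data) for check in constraints):
            return data
    return None  # nothing in this search space satisfies the constraints

# Hypothetical constraints collected along the path we want to take.
path_constraints = [
    lambda d: int(d) > 500,     # branch 1: value must be large enough
    lambda d: d[0] == d[2],     # branch 2: first and last digit match
    lambda d: int(d) % 7 == 0,  # branch 3: value divisible by 7
]

solution = solve(path_constraints, n_bytes=3, alphabet=b"0123456789")
print(solution)
```

Brute force only works here because the search space is 1000 candidates. A real solver reasons about the constraints instead of enumerating inputs, which is why it can cope with far larger inputs, though not always quickly.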

Angr uses Z3, a Microsoft Research project, as its solver. Strictly speaking Z3 is an SMT solver: it handles bit vectors and arithmetic on top of plain boolean SAT.

Getting started with Angr

First we need to import angr and others:

import sys, angr, claripy
import IPython # I always include for debugging

Then we need to open the binary as an Angr project:

# Lots of options to use here, 
# but this is a decent default to use
p = angr.Project('tbt', use_sim_procedures=True)

The first parameter to a Project object is the binary path. I recommend keeping everything in the same directory.

Now we need a state. entry_state will use the entry point address in the ELF header. We know tbt takes its input from the command line, so we can set args to pass the extra argument. I'll explain symbolic_input in a minute.

state = p.factory.entry_state(args=
                    [p.filename, symbolic_input])

If we wanted to use STDIN instead it would be:

state = p.factory.entry_state(args=
              [p.filename], stdin=symbolic_input)

We might want to start at an address instead of the entry point:

state = p.factory.blank_state(addr=ADDR, 
                              stdin=symbolic_input)

Claripy

symbolic_input = claripy.BVS('symbolic_input', 80)

state = p.factory.entry_state(args=
                    [p.filename, symbolic_input])

Now we have a state and any input the program expects at entry. One of those inputs is symbolic_input, a bit vector symbol (BVS) from the Claripy library. Here it is 80 bits wide, which is 10 bytes of input.

Claripy is Angr's wrapper around the solver. We want to solve for the input, so we create a bit vector symbol and give it to our entry state.
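
As a rough intuition for what an 80-bit vector holds, the following plain-Python sketch splits a concrete 80-bit integer into 8-bit pieces. The chop function here is a hypothetical stand-in written for illustration, not Claripy's implementation, and it assumes the most significant byte comes first.

```python
def chop(value, total_bits, chunk_bits=8):
    """Split a total_bits-wide integer into chunk_bits-wide pieces, most significant first."""
    assert total_bits % chunk_bits == 0
    count = total_bits // chunk_bits
    mask = (1 << chunk_bits) - 1
    return [(value >> (chunk_bits * (count - 1 - i))) & mask
            for i in range(count)]

# A concrete 10-byte input viewed as one 80-bit number.
concrete = int.from_bytes(b"1234567890", "big")
pieces = chop(concrete, 80)

print(len(pieces))    # number of 8-bit pieces
print(bytes(pieces))  # the pieces reassembled into the original input
```

A symbolic bit vector has the same shape, except that each piece is an unknown the solver gets to choose.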

for byte in symbolic_input.chop(8): # chop into 8-bit pieces
    state.add_constraints(
            claripy.Or(
                claripy.And(byte >= ord('0'),
                            byte <= ord('9')),
                byte == 0)
            ) # (byte >= '0' AND byte <= '9') OR byte == 0

Now that we have a state using the symbolic variable, we can add constraints to save time. If we know that our input stays within a particular range, such as printable ASCII, we can add that to our constraints.

In the tbt program we know atoi is called, so we can constrain each byte of the BVS to be an ASCII digit (or a NUL terminator).
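
To get a feel for how much the digit constraint narrows things down, this small sketch counts the candidate inputs before and after. It assumes the 10-byte input from above. A real solver does not enumerate candidates, so the ratio is only a rough intuition, not a predicted speed-up.

```python
N_BYTES = 80 // 8  # the 80-bit symbol is 10 bytes wide

def allowed(byte):
    """Same predicate as the constraint: ASCII digit or NUL."""
    return ord('0') <= byte <= ord('9') or byte == 0

values_per_byte = sum(allowed(b) for b in range(256))
unconstrained = 256 ** N_BYTES
constrained = values_per_byte ** N_BYTES

print(values_per_byte)               # allowed values for each byte
print(unconstrained // constrained)  # how many times smaller the space is
```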

Simulating and Exploring

sim = p.factory.simulation_manager(state)

We've built up the state and BVS, now we can create a simulation manager! The simulation manager handles path exploring, adding constraints, and managing states in stashes (active, deadended, found).

With the simulation manager (SM), we tell it where we want it to go. Maybe we have an address in mind, or simply a desired output. Either way, we use the find and avoid arguments of explore to indicate our wishes. 

# correct and incorrect could be an address or a function 
# that accepts a state, and returns a boolean
sim.explore(find=correct, avoid=incorrect)

Examples of checking the output of a program:

# Desired and undesired output
def correct(state):               # input some state
    stdout = state.posix.dumps(1) # get the output
    return b'Enjoy' in stdout     # T/F if contains
def incorrect(state):             # input some state
    stdout = state.posix.dumps(1) # get the output
    return b'invalid' in stdout   # T/F if contains

state.posix.dumps(1) returns everything written to file descriptor 1, which is stdout on POSIX systems such as Unix machines, so this may change depending on the target OS.
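
The way find and avoid predicates steer exploration can be sketched without Angr. In this toy model the PROGRAM graph, the node names, and the explore function are all hypothetical. Each node appends some output and branches to its successors, and the predicates inspect the output collected so far, just like correct and incorrect above.

```python
from collections import deque

# Hypothetical toy "program": node -> (text it prints, successor nodes).
PROGRAM = {
    "entry":  (b"", ["check1"]),
    "check1": (b"", ["check2", "fail"]),
    "check2": (b"", ["win", "fail"]),
    "win":    (b"Enjoy your flag\n", []),
    "fail":   (b"invalid input\n", []),
}

def explore(start, find, avoid):
    """Breadth-first exploration that stops at the first state matching find."""
    active = deque([(start, b"")])
    found, avoided, deadended = [], [], []
    while active and not found:
        node, stdout = active.popleft()
        text, successors = PROGRAM[node]
        stdout += text                     # "execute" the node
        if find(stdout):
            found.append((node, stdout))
        elif avoid(stdout):
            avoided.append((node, stdout)) # never stepped again
        elif not successors:
            deadended.append((node, stdout))
        else:
            active.extend((nxt, stdout) for nxt in successors)
    return found, avoided, deadended

found, avoided, deadended = explore(
    "entry",
    find=lambda out: b"Enjoy" in out,
    avoid=lambda out: b"invalid" in out,
)
print(found)
```

States that match avoid are set aside rather than explored further, which is what keeps the search from wasting time on failure paths.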

Finally we ask if there are any found states!

if len(sim.found) > 0:

    print("Found: {}".format(len(sim.found)))
    
    # Use the solver to return an input that takes us to 
    # the 'correct' condition 
    print(sim.found[0].solver.eval(symbolic_input, cast_to=bytes))
    
    #IPython.embed() #maybe we want to check something else
else:
    print("None found")

IPython is nice because we get a shell with all our variables still intact. 

Hooking functions

p.hook(0x4018B0, angr.SIM_PROCEDURES['glibc']['__libc_start_main']())
p.hook(0x422690, angr.SIM_PROCEDURES['libc']['memcpy']())
p.hook(0x408F10, angr.SIM_PROCEDURES['libc']['puts']())

Example text from whitehat_crypto400/solve.py

In statically linked binaries, library functions can be hooked with SimProcedures to save time.

def hook_length(state):
    state.regs.rax = 8

p.hook(0x40168e, hook_length, length=5)
p.hook(0x4016BE, hook_length, length=5)

Hook with Python functions for concrete behavior. The length argument is the number of bytes of original code to skip once the hook has run.
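
The effect of a hook with a length can be modelled with a tiny interpreter. Everything here is a hypothetical illustration rather than Angr's internals: CODE maps made-up addresses to an instruction size and an operation, and a hook replaces whatever sits at its address and then skips length bytes.

```python
hooks = {}

def hook(addr, func, length):
    """Register func to run at addr instead of the original code, then skip length bytes."""
    hooks[addr] = (func, length)

def slow_strlen(regs):
    regs["calls"] += 1                # count how often the original code runs
    regs["rax"] = len(regs["buf"])

# Hypothetical code layout: address -> (instruction size, operation).
CODE = {
    0x1000: (5, slow_strlen),                                   # 5-byte call
    0x1005: (3, lambda regs: regs.update(rbx=regs["rax"] * 2)),
}

def run(regs, pc=0x1000, end=0x1008):
    while pc < end:
        if pc in hooks:
            func, length = hooks[pc]
            func(regs)                # concrete Python replaces the call
            pc += length              # resume after the skipped bytes
        else:
            size, op = CODE[pc]
            op(regs)
            pc += size
    return regs

def hook_length(regs):
    regs["rax"] = 8                   # pretend the length is always 8

hook(0x1000, hook_length, length=5)
result = run({"rax": 0, "rbx": 0, "calls": 0, "buf": b"AAAA"})
print(result["rax"], result["rbx"], result["calls"])
```

Because the hooked call never runs, the engine does not have to step through it symbolically, which is where the time saving comes from.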

Explore State Stashes

When exploring, every state that is created is placed in a stash. States can be moved from stash to stash. Only the active stash gets explored.

  • active
  • deadended
  • pruned -- LAZY_SOLVES
  • unconstrained
  • unsat
  • found

sm.explore(find=0x4016A3)\
  .unstash(from_stash='found', to_stash='active')

sm.explore(find=0x4016B7, 
           avoid=[0x4017D6, 0x401699, 0x40167D])\
  .unstash(from_stash='found', to_stash='active')

sm.explore(find=0x4017CF, 
           avoid=[0x4017D6, 0x401699, 0x40167D])\
  .unstash(from_stash='found', to_stash='active')
  
sm.explore(find=0x401825, 
           avoid=[0x401811])
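
The stop-and-go pattern above can be sketched with a toy manager that keeps its stashes in a dictionary. ToyManager, the SUCCESSORS graph, and the small integer "addresses" are hypothetical and only mimic the shape of the calls. This is not how Angr implements stashes.

```python
# Hypothetical control flow: address -> successor addresses.
SUCCESSORS = {1: [2, 90], 2: [3, 91], 3: [4]}

class ToyManager:
    def __init__(self, start):
        self.stashes = {"active": [start], "found": [],
                        "avoid": [], "deadended": []}

    def explore(self, find, avoid=()):
        """Step active states until one reaches find; returns self for chaining."""
        active = self.stashes["active"]
        while active and not self.stashes["found"]:
            state = active.pop(0)
            if state == find:
                self.stashes["found"].append(state)
            elif state in avoid:
                self.stashes["avoid"].append(state)
            elif state in SUCCESSORS:
                active.extend(SUCCESSORS[state])
            else:
                self.stashes["deadended"].append(state)
        return self

    def unstash(self, from_stash, to_stash="active"):
        """Move every state from one stash to another."""
        self.stashes[to_stash].extend(self.stashes[from_stash])
        self.stashes[from_stash] = []
        return self

sm = ToyManager(start=1)
# Stage 1: reach the first checkpoint, then make it explorable again.
sm.explore(find=2, avoid=[90]).unstash(from_stash="found", to_stash="active")
# Stage 2: continue from there to the final target.
sm.explore(find=4, avoid=[90, 91])
print(sm.stashes)
```

Moving the found state back into active is what lets the second explore call pick up where the first one stopped.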

In conclusion

I showed you one way to use this library to solve a challenge. States in a simulation manager can be moved around, so we can stop and go as many times as we want to avoid paths that slow or stop Angr.

There are many, many other ways to use the library to your advantage!

Symbolic Execution

By Drake P
