Undefined Behavior

A K. Song Production

What does a compiler do?

int dumb()
{
    int a,b,c;
    a = 5;
    b = 7;
    c = a + b;
    return c;
}
dumb: 
addiu $s0, $0, 5    #a = 5
addiu $s1, $0, 7    #b = 7
addu  $t0, $s0, $s1  #c = a + b
addiu $v0, $t0, 0    #mv to ret
jr $ra               #return
dumb: 
addiu $v0, $0, 12
jr $ra #return 12

More "accurate"

Doesn't suck

What does a compiler do?

int x = 10;
x = x++ * ++x;

Your compiler (artist's conception)

Undefined behavior means the language makes no guarantees about what the program does
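One way to see why x = x++ * ++x has no single answer is to try sequencing every step by hand. The sketch below picks one possible reading (left operand first, then right). That ordering is purely an assumption for illustration, the function name is hypothetical, and a compiler is under no obligation to match it.

```c
// One hand-sequenced reading of `x = x++ * ++x;`.
// Assumption: the left operand is fully evaluated before the right one.
int one_possible_reading(int x) {
    int left = x;        // value produced by x++
    x = x + 1;           // side effect of x++
    x = x + 1;           // side effect of ++x
    int right = x;       // value produced by ++x
    return left * right; // the value that would be stored back into x
}
```

Starting from x = 10, this particular reading gives 10 * 12. Swapping the order of the operands gives a different product, which is exactly the problem.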

Example Input

#include <limits.h>  //INT_MAX
#include <stdio.h>   //printf

int main(){
    //Signed integer overflow is undefined!
    int y = (INT_MAX + 1) > 0;

    printf("%d\n", y);

    return 0;
}

Valid Outputs

chip@diglett$ ./overflow
42
chip@diglett$ 
chip@diglett$ ./overflow
1
chip@diglett$ 
chip@diglett$ ./overflow
0
chip@diglett$ 
chip@diglett$ ./overflow
Issuing "rm -rf /", please wait
while we format your hard drive.
This process cannot be interrupted.
^C^C^C^C^C^C^C^C^C^C^C^C

Undefined Behavior

Compiler can do anything it wants! No guarantees about program behavior.

But wait!

Doesn't the x86 ADD instruction already wrap around on signed integer overflow?
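The hardware may well wrap, but the compiler reasons about the C language rules, not about the instruction it eventually emits. If wraparound is what you actually want, one option is to do the arithmetic on unsigned values, where C defines the result modulo UINT_MAX + 1. A minimal sketch (the function name is hypothetical):

```c
#include <limits.h>

// Unsigned addition is defined to wrap modulo UINT_MAX + 1,
// so there is no undefined case for the compiler to exploit.
unsigned wrapping_add(unsigned a, unsigned b) {
    return a + b;
}
```

With this, wrapping_add(UINT_MAX, 1u) is guaranteed to be 0 on every conforming compiler, which is not something signed int can promise.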

Unsurprising Examples

  • Null pointer dereference
  • Using a value before initialization
  • Signed integer overflow
  • Modifying a string literal
  • Attempting to evaluate void

Weird Undefined Behavior

  • An unmatched ' or " on a source line
  • Declaring a reserved identifier
  • Incorrect implicit casts
  • Subtracting pointers that don't point to the same array
  • Declaring a struct with no members

Let's do some human compilation!

Examples from http://blog.regehr.org/archives/213

Three Types of Functions

  • Always-defined
  • Never-defined
  • Defined for some inputs,
    but not for others

     
int unsafe_div (int a, int b) {
  return a / b;
}

Case b != 0:

      -Emit code to divide a by b

 

Case b == 0:

     -Compiler has no obligations

     -Does not need to generate trap code

Compiler simply emits code to calculate a / b, with no check for b == 0
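If you need the undefined cases handled, you have to handle them yourself before dividing. Here is a minimal sketch of a checked wrapper. The name safe_div and the out-parameter convention are assumptions for illustration, not part of the original example. Besides b == 0, it also rejects INT_MIN / -1, whose quotient does not fit in an int.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical checked division: returns false instead of dividing
// when the result would be undefined.
bool safe_div(int a, int b, int *out) {
    if (b == 0) return false;                  // division by zero
    if (a == INT_MIN && b == -1) return false; // quotient not representable
    *out = a / b;
    return true;
}
```

The caller checks the return value and only reads *out on success.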

float* P; //Assume P and I each point to 10000 elements (e.g. from malloc)
int* I;

void zero_array() {
  int i;
  for (i = 0; i < 10000; ++i) {
    I[i] = i;
    P[i] = 0.0f;
  }
}

Case I and P do not alias:

      -Can optimize the stores to P into a memset()

 

Case I and P alias:

     -Compiler has no obligations

     -Optimization: ignore this case!

Compiler optimizes assuming I and P do not alias
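If the goal really is to look at the same bytes through two different types, copying them with memcpy is one well-defined route, since it never accesses an object through a pointer of the wrong type. A small sketch under stated assumptions: the helper name is hypothetical, and it assumes float is 32 bits wide (checked at compile time).

```c
#include <stdint.h>
#include <string.h>

// Assumption: float and uint32_t have the same size on this platform.
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

// Hypothetical helper: returns the bytes of a float as a 32-bit integer
// without ever reading the float through an integer pointer.
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On platforms that use IEEE 754 single precision, float_bits(1.0f) comes out as 0x3F800000. Compilers typically turn the memcpy into a plain register move.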

int dumb (int a) {
  return a < (a+1);
}

Case a != INT_MAX:

      -a is obviously less than a+1

      -Simply return 1

 

Case a == INT_MAX:

     -Compiler has no obligations

     -Optimization: ignore this case! (branching is slow)

Compiler optimizes code to `return 1`
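To get a comparison that means the same thing for every input, avoid computing a+1 at all. The sketch below shows two hedged alternatives: a direct rewrite of the comparison, and a general checked add that tests the bounds before adding. Both function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

// Same intent as `a < (a+1)`, but defined for every int, including INT_MAX.
int less_than_successor(int a) {
    return a != INT_MAX;
}

// Checked addition: tests the bounds first, so a + b is only evaluated
// when it cannot overflow.
bool checked_add(int a, int b, int *sum) {
    if (b > 0 && a > INT_MAX - b) return false; // would overflow upward
    if (b < 0 && a < INT_MIN - b) return false; // would overflow downward
    *sum = a + b;
    return true;
}
```

Here the INT_MAX case is an ordinary, specified result instead of something the optimizer is free to ignore.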

typedef struct Data{
    int* thingy;
    //More declarations here
} Data;

void smart_thing(Data* input){
    int* thing = input->thingy;

    if (input == NULL) return;
    //Do more stuff with input
}

Case input == NULL:

      -input->thingy dereferences NULL, which is undefined

      -Compiler has no obligations

Case input != NULL:

     -Null check is redundant

     -Optimization: kill null check (branching is slow)

Compiler optimizes out null check!!
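The fix is to do the null check before the first dereference, so no path touches input->thingy while input might be NULL. A minimal sketch: the function name and the fallback parameter are assumptions added for illustration, and Data is redeclared only to keep the snippet self-contained.

```c
#include <stddef.h>

typedef struct Data {
    int *thingy;
} Data;

// Hypothetical reader: checks every pointer before dereferencing it,
// so the compiler has no undefined path that justifies removing a check.
int read_thingy(const Data *input, int fallback) {
    if (input == NULL) return fallback;
    int *thing = input->thingy;
    if (thing == NULL) return fallback;
    return *thing;
}
```

With the check first, the dereference only happens on the path where input is known to be non-null.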

Why can't the compiler tell us about undefined behavior?

Challenge 1: False Positives

We don't want the compiler to warn us about everything!

Examples:

  • "I am assuming X and Y do not alias"
  • "I am assuming X * 3 / 3 == X"
  • "I am assuming this pointer is not null"

Challenge 2: The Halting Problem

People don't want warnings from inside dead code, but the compiler can't always prove the code is dead!

int i, z, *x; //x and z are never initialized
for(i = 0; i < a.size(); i++){
    if (a[i] < 0){ //DEAD CODE (if a never holds a negative value)
        a[i]--;
        (*(x) <<= z); //Undefined if this ever runs
    }
    //Do stuff
}

Challenge 3: Data Retention

Taken from http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html

warning: after 3 levels of inlining (potentially across files with Link Time Optimization), some common subexpression elimination, after hoisting this thing out of a loop and proving that these 13 pointers don't alias, we found a case where you're doing something undefined. This could either be because there is a bug in your code, or because you have macros and inlining and the invalid code is dynamically unreachable but we can't prove that it is dead.

So what do we do?

Solution 1

Don't use C!

Solution 2

Use tools to help you analyze these things:

  • Valgrind
  • KLEE
  • -Wall -Wextra (compiler warnings)
  • -ftrapv (trap on signed integer overflow)

Solution 3

Hope for the best.

FIN

Questions? Comments?

C compilers use undefined behavior to optimize the code they produce.

A program that invokes undefined behavior can do anything.

Undefined behavior can pop up when least expected and change the program's behavior.

The best defense against undefined behavior is to judiciously use all bug-checking tools available.

Do not rely on undefined behavior! Null pointer dereferences might not segfault, and integer overflow might not trap!

Undefined Behavior in C++

By Kevin Song

This can really ruin your day if you're not careful about it