by David Thomas — September 2016
Compilers usually give us only a per-file choice between fast (-O2 / -O3) and small (-Os) code generation.
But at different times we may need to express our code in ways which emphasise different properties.
In these instances we may have to do some of the compiler's work ourselves.
Let's say we want to avoid division, but we still want to be as fast as possible, and our divisor is constant.
We can hunt around and find something like Jim Blinn's fast divide by 255 formula:
#define DIV_255(x) (((x) + 1 + (((x) + 1) >> 8)) >> 8)
But what if that's not exactly what we need?
Can we find others similar to it?
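A formula like Blinn's is cheap to check exhaustively before relying on it. The sketch below assumes the input is a 16-bit value (0 to 65535), as you would get from multiplying two 8-bit colour channels; the helper name `div255_mismatches` is hypothetical, and the macro arguments are parenthesised for safety. The formula is not guaranteed outside that range, so the check stays inside it.

```c
#include <stdint.h>

/* Blinn-style divide by 255, with macro arguments parenthesised. */
#define DIV_255(x) (((x) + 1 + (((x) + 1) >> 8)) >> 8)

/* Hypothetical helper: counts the inputs in [0, 65535] for which the
 * shift-and-add formula disagrees with a real integer division. */
static unsigned div255_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= 65535u; x++) {
        if (DIV_255(x) != x / 255u)
            bad++;
    }
    return bad;
}
```

Calling `div255_mismatches()` should return zero if the formula is usable for this input range; the same loop can be pointed at any candidate replacement.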
Small sequences like these can be discovered with a superoptimiser.
Superoptimisers are not as clever as the name might suggest: typically they perform an exhaustive search through a virtual instruction set.
They generate tiny programs by constructing every possible permutation of instructions, then run these programs against trial values until they find one which works.
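The search described above can be sketched in a few dozen lines. This is a toy, not how any real superoptimiser is implemented: the five-operation instruction set, the two constants, the trial values and the target function (clear the lowest set bit) are all assumptions chosen to keep the example small, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR, NUM_OPS };

#define MAX_LEN    3   /* longest program run() can hold */
#define NUM_CONSTS 2

/* Operand index k means: register k if k is an existing register
 * (register 0 is the input x), otherwise constant k - (register count). */
typedef struct { int op, a, b; } insn;

static const uint32_t consts[NUM_CONSTS] = { 1u, 0xFFFFFFFFu };
static const uint32_t trials[] = { 0u, 1u, 2u, 3u, 0x80000000u,
                                   0x7FFFFFFFu, 0x01234567u, 0xFFFF0000u };
#define NUM_TRIALS (sizeof trials / sizeof trials[0])

/* Toy target function: clear the lowest set bit of x. */
static uint32_t target(uint32_t x) { return x & (x - 1u); }

/* Interpret a straight-line program of len instructions on input x. */
static uint32_t run(const insn *prog, int len, uint32_t x)
{
    uint32_t regs[MAX_LEN + 1];
    regs[0] = x;
    for (int i = 0; i < len; i++) {
        int nregs = i + 1;
        uint32_t a = prog[i].a < nregs ? regs[prog[i].a] : consts[prog[i].a - nregs];
        uint32_t b = prog[i].b < nregs ? regs[prog[i].b] : consts[prog[i].b - nregs];
        uint32_t r = 0;
        switch (prog[i].op) {
        case OP_ADD: r = a + b; break;
        case OP_SUB: r = a - b; break;
        case OP_AND: r = a & b; break;
        case OP_OR:  r = a | b; break;
        case OP_XOR: r = a ^ b; break;
        }
        regs[i + 1] = r;
    }
    return regs[len];
}

/* Enumerate every program of exactly len instructions (len <= MAX_LEN).
 * Returns 1 and leaves the program in prog[] as soon as one matches the
 * target on all trial values; returns 0 if none does. */
static int search(insn *prog, int pos, int len)
{
    if (pos == len) {
        for (size_t t = 0; t < NUM_TRIALS; t++)
            if (run(prog, len, trials[t]) != target(trials[t]))
                return 0;
        return 1;
    }
    int noperands = pos + 1 + NUM_CONSTS;
    for (int op = 0; op < NUM_OPS; op++)
        for (int a = 0; a < noperands; a++)
            for (int b = 0; b < noperands; b++) {
                prog[pos] = (insn){ op, a, b };
                if (search(prog, pos + 1, len))
                    return 1;
            }
    return 0;
}
```

Calling `search(prog, 0, n)` for n = 1, 2, ... mirrors running a superoptimiser with increasing operation counts. A program that passes the trial values is only a candidate: it still has to be checked against the full input range.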
"A Hacker's Assistant"
by Henry Warren (the tool is known as Aha!)
This is the one we'll look at
Targets a "generic RISC" instruction set
STOKE: a stochastic optimiser
Random search
x86-64 only
Does full verification
http://stoke.stanford.edu/
Souper: a superoptimiser for LLVM IR
Uses SMT solver
Can cache results using Redis
Non-official Google project
https://github.com/google/souper
/* artificial.frag.c */
#include "aha.h"
int userfun(int x)
{
    if (x == 0) return 1;
    else if (x == 1) return 2;
    else return 0;
}
Note:
No branches
No state
No side-effects
$ make EXAMPLE=artificial aha
gcc -c -O3 -Wall -Wextra -Wno-unused-variable -Wno-unused-parameter -MMD -I. -DINC=\"artificial.frag.c\" -DOFILE=\"artificial.out\" -o aha.o aha.c
gcc -c -O3 -Wall -Wextra -Wno-unused-variable -Wno-unused-parameter -MMD -I. -DINC=\"artificial.frag.c\" -DOFILE=\"artificial.out\" -o simulator.o simulator.c
gcc -O3 -Wall -Wextra -Wno-unused-variable -Wno-unused-parameter -MMD -I. -DINC=\"artificial.frag.c\" -DOFILE=\"artificial.out\" -o aha aha.o simulator.o
$ ./aha 3
Searching for programs with 3 operations.
Found 0 solutions.
Counters = 372751, 382255, 952561, total = 1707567
Process time = 0.029 secs
$ ./aha 4
Searching for programs with 4 operations.
Found a 4-operation program:
add r1,rx,-2
bic r2,r1,rx
shr r3,r2,31
shl r4,r3,rx
Expr: ((((x + -2) & ~x) >>u 31) << x)
(... omitted ...)
Found 4 solutions.
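The reported program can be transcribed into C and compared against the original function. This is a sketch under a few assumptions: `found` and the fixed-width restatement of `userfun` are hypothetical names for this example, `bic` is read as "AND with complement" as the printed expression suggests, and the final shift amount is masked to 0..31 because shifting a 32-bit value by 32 or more is undefined in C. The mask does not change the result, since the value being shifted is zero unless x is 0 or 1.

```c
#include <stdint.h>

/* The function from artificial.frag.c, restated with fixed-width types. */
static int32_t userfun(int32_t x)
{
    if (x == 0) return 1;
    else if (x == 1) return 2;
    else return 0;
}

/* Transcription of the 4-operation program, one line per instruction. */
static uint32_t found(uint32_t x)
{
    uint32_t r1 = x + 0xFFFFFFFEu;  /* add r1,rx,-2 */
    uint32_t r2 = r1 & ~x;          /* bic r2,r1,rx */
    uint32_t r3 = r2 >> 31;         /* shr r3,r2,31 (unsigned shift) */
    return r3 << (x & 31u);         /* shl r4,r3,rx (amount masked, see above) */
}
```

Comparing `found((uint32_t)x)` with `(uint32_t)userfun(x)` over a range of inputs gives a quick sanity check of the branch-free version.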
#define TRIAL {1, 0, -1, \
MAXNEG, MAXPOS, MAXNEG + 1, MAXPOS - 1, \
0x01234567, 0x89ABCDEF, -2, 2, -3, 3, -64, 64, -5, -31415, \
0x0000FFFF, 0xFFFF0000, \
0x000000FF, 0x0000FF00, 0x00FF0000, 0xFF000000, \
0x0000000F, 0x000000F0, 0x00000F00, 0x0000F000, \
0x000F0000, 0x00F00000, 0x0F000000, 0xF0000000}
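from z3 import *

x = BitVec('x', 32)
y = BitVec('y', 32)
output = BitVec('output', 32)
s = Solver()
# Reference expression.
s.add(x ^ y == output)
# Candidate expression: ask the solver for any input where it differs.
s.add(((y & x) * 0xFFFFFFFE) + (y + x) != output)
# "unsat" means no counterexample exists, so the two expressions agree.
print(s.check())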
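The identity being checked is x ^ y = (x + y) - 2(x & y) in 32-bit arithmetic, since 0xFFFFFFFE is -2 modulo 2^32. A quick sampled test in C can catch a wrong candidate early, although unlike the solver it proves nothing. The sketch below assumes a simple xorshift generator for test inputs; the names `xor_via_arith`, `next_random` and `count_mismatches` are hypothetical.

```c
#include <stdint.h>

/* Candidate expression from the solver script, in wrapping 32-bit arithmetic. */
static uint32_t xor_via_arith(uint32_t x, uint32_t y)
{
    return (y & x) * 0xFFFFFFFEu + (y + x);
}

/* Small xorshift32 generator, used only to produce sample inputs. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Counts sampled (x, y) pairs where the candidate differs from x ^ y. */
static unsigned count_mismatches(unsigned samples)
{
    uint32_t state = 2463534242u;   /* arbitrary non-zero seed */
    unsigned bad = 0;
    for (unsigned i = 0; i < samples; i++) {
        uint32_t x = next_random(&state);
        uint32_t y = next_random(&state);
        if ((x ^ y) != xor_via_arith(x, y))
            bad++;
    }
    return bad;
}
```

A non-zero count rules the candidate out immediately; a zero count only says that it is worth handing to the solver.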