Survey of Superoptimization

Jiangyi Liu, Xingyu Xie

Approaches

# Superoptimization

Enumerative search

Stochastic search

Synthesis-based

  •   Component-based synthesis problem
  •   Counter-Example Guided Inductive Synthesis
# Enumerative search

Enumerative search

  • node: a vector of program states
  • edge: a instruction with a defined cost
  • path: an execution of a program
  • source: inputs of testcases
  • destination: outputs of testcases

Basic idea: enumerate all paths of cost lower than the one to optimize.

Many techniques for speedup:

  • merge equivalence class
  • refine (but not rebuild) the graph with a new testcase
  • bidirectional search

(ASPLOS'13) Phitchaya Mangpo Phothilimthana et al. Scaling up  Superoptimization

# Stochastic search

Stochastic search

program space

(blue area is the equivalent programs to the target)

cost function

Basic idea: (Metropolis-Hastings algorithm) mutate the current program, and accepted with a probability according to the cost.

(ASPLOS'13) Eric Schkufza et al. Stochastic Superoptimization

# Equivalence checker

Equivalence checker

Data-driven non-deterministic approach:

  1. Use template to generate alignment predicates.
  2. Construct a product CFG, by aggregating the paired states satisfying the alignment predicate to one node.
  3. Generate invariants for each node from data.
  4. Use SMT-solver to verify proof obligations.

(OOPSLA'13) Rahul Sharm et al. Data-Driven Equivalence Checking

(PLDI'19) Berkeley Churchill et al. Semantic Program Alignment for Equivalence Checking

# Testcases

Testcases

  • User provided testcases are required.
  • If there is no control-flow, we could manually assign cost for each kind of instruction. But testcases are necessary for evaluate performance when branches or loops exist.
  • To better handle loops, "counter-example" found by bounded verifier is added into the set of testcases (above figure).

(ASPLOS'17) Berkeley Churchill et al. Sound Loop Superoptimization for Google Native Client

# Reinforcement learning

Reinforcement learning

Mutations:

  • operand moves
  • rotate moves
  • opcode width moves
  • delete moves
  • add nop moves
  • replace nop moves
  • memory+swap

Use RL to learn the distribution to perform each mutation, which is uniform before.

(ICLR'17) Rudy Bunel et al. Learning to superoptimize programs

(DL4C workshop at ICLR'22) Alex Shypula et al. Learning to superoptimize real-world programs

# SYNTHESIS

Synthesis-based approaches

Souper: A Synthesizing Superoptimizer

- Component-based Synthesis; CEGIS

- Target: loop-free subset of LLVM IR

Raimondas Sasnauskas et al. A Synthesizing Superoptimizer

From Software Analysis (2021), instructed by Yingfei Xiong, Peking University

What is component-based synthesis?

 

The user provides a library of components:

\( \{\langle\vec I_i, O_i, \phi_i(\vec I_i, O_i)\rangle \mid i = 1 \dots N \}\)

- Here \(\phi_i\) is the constraint on a component

- to allow multiple use of a component, put more than one copies in    the library

 

Besides, a specification of the desired program should be given:

\( \langle \vec I, O, \phi_{spec}(\vec I, O)\rangle \)

 

GOAL: Find a straight-line program that only uses components given in the library

- A mental model: components are connected by input/output relations

- f_impl ​should satisfy the following formula

 

 

 

i.e. for every combination of input value & temporary variable values, if the spec for components holds, then f_impl meets \(\phi_{spec}\)

Encode connections in SMT formulas.

 

- Divide I/O vars into sets:

  - \(\mathbf{P} = \bigcup_{1 \le i \le N} \vec I_i\),   \(\mathbf{R} = \{O_1, \dots, O_N\}\)

- Location = Line number OR input variable

  - Location 0 ~ Location (M-1): input for every component

  - Location M, ...: assignment to temp var / final output

  - \(M = \sum_{1 \le i \le N} \mathsf{arity}(\vec I_i)\)

- Consistency: locations are distinct

\[\psi_{cons} := \bigwedge_{x,y\in\mathbf{R},x\ne y} l_x \ne l_y\]

- Acyclic: all connections don't form a loop

\[\psi_{acyc} := \bigwedge_{1 \le i \le N} \left(\bigwedge_{x \in \vec I_i} l_x < l_{O_i}\right)\]

Encode connections in SMT formulas (cont.)

 

- wfp = cons + acyc + bounding locations

\[\psi_{\mathrm{wfp}}(L):=\bigwedge_{x \in \mathbf{P}}\left(0 \leq l_{x} \leq M-1\right) \wedge \bigwedge_{x \in \mathbf{R}}\left(|\vec{I}| \leq l_{x} \leq M-1\right) \wedge\\ \psi_{cons}(\mathbf{L}) \wedge \psi_{acyc}(\mathbf{L})\]

   where \(L\) stands for the set of locations

- \(\phi_{lib}\): library specs

\[\phi_{lib} := \bigwedge_{1 \le i \le N} \phi_i(\vec I_i, O_i)\]

- \(\psi_{conn}\): connection

\[\psi_{conn} := \bigwedge_{x,y \in \mathbf{P} \cup \mathbf{R} \cup \vec I \cup \{O\}} (l_x = l_y \to x = y)\]

Encode connections in SMT formulas (cont.)

 

 

 

Synthesis Constraint

 

 

Counterexample Guided Inductive Synthesis

 

- solves \(\exists L \forall \vec I: \phi(L, \vec I)\)

 

- \(\mathcal{S}\): finite set containing valuation of \vec I

- Loop:

   - find \(L\) that satisfies \(\phi(L, \vec I_i)\), where \(I_i \in \mathcal{S}\)

   - check if \(L\) satisfies the \(\forall\)-clause;

     - if not, add the counterexample to \(\mathcal{S}\)

     - else, L corresponds to the synthesized program

Souper is a superoptimizer based on CEGIS and component-based synthesis.

 

 

 

 

 

 

 

 

`infer` marks the entry of superoptimizer.

CEGIS process is wrapped in another loop.

Thus the cost of yielded program is bounded, and the 1st result is always the optimized version.

# Expansions

Expansions

  • floating-point: hard to optimize and reason because of rounding error
    • STOKE uses ULP (a uniform approximate measure of rounding error) to describe correctness (difference) in searching, and furthermore use Metropolis-Hastings algorithm to find maximum error to validte.
  • conditional correctness: equivalent under restricted inputs
    • make the input domain smaller to enable better optimization
  • cooperative superoptimizer: share the currently found best rewrite

(PLDI'14) Eric Schkufza et al. Stochastic Optimization of Floating-Point Programs with Tunable Precision

(OOPSLA'15) Rahul Sharma et al. Conditionally Correct Superoptimization

(ASPLOS'13) Phitchaya Mangpo Phothilimthana et al. Scaling up  Superoptimization

  • Comparison of approaches
  • Possible project plans
  • Reseach possibilities

Summary

Comparison of approaches

# Comparison
Tool Approach Language Size Loop?
STOKE Stochastic search x86-64 ~100 inst,
<=200 inst
Y
Souper Synthesis​ Souper IR (from LLVM IR) 1KB N
GreenThumb Enumerative search ARMv7-A,
GreenArrays
~10 inst, <=30 inst N

Possible project plans

# Plans

Reproduce STOKE on ARM64

  • A sandbox or instrumented assembler: the one of STOKE (OOPSLA'13) could be regarded as a reference, but more research is needed
  • A rewrite generator: counter-example guided simulated annealing (ASPLOS'13) (ASPLOS'17)
  • An equivalence checker: migrate the equivalence checker in STOKE onto ARM64 (OOPSLA'13) (PLDI'19)

Research possibilities

# Research possibilities
  • improvements inspired by real programs from Huawei: since equivalence checker is incomplete
  • specific to ARM64 (RISC ISA): improve current approaches
  • conditions of input: maybe developers of Huawei could give some useful & complicated restrictions of domain knowledge
  • data-driven invariant (precondition) generation: our group is familiar with this technique
Made with Slides.com