Points-to analysis in almost linear time

Bjarne Steensgaard

Gokulan R

CS15B033

22 April 2020

POPL '96

Steensgaard's Analysis

Inter-Procedural, flow-insensitive analysis
Produces result in almost linear time
Key Contributions
- A type system in which the points-to relations of variables can be expressed
- A constraint system in which points-to constraints can be expressed
- An algorithm that computes points-to information in almost linear time

Retrospective, circa 1995

Sun announced Java for the first time
Microsoft announced Windows 95
Netscape announced the development of JavaScript
Intel announced the Pentium and Pentium Pro processors
Peak performance: ~60MHz

Points-to Analysis

May-analysis: a conservative list covering every location a given pointer can point to at any time during the execution of the program
Enables optimizations such as:
- Dead Code Elimination
- Constant Propagation
- . . .

a = &x;
b = &y;
c = b;
a = b;

a : {x, y}
b : {y}
c : {y}

Motivation

Most compilers use intra-procedural analysis
- Polynomial time and space
- Works well even for large programs (~100k lines)
Existing inter-procedural analyses
- Cubic time complexity
- Have been successful only on small programs (<10k lines)
Why inter-procedural analysis?
- Whole program optimizations

Related Work

Weihl: Interprocedural data flow analysis in the presence of pointers, procedure variables, and label variables
- flow-insensitive, inter-procedural
- cubic time complexity, doesn't handle recursion well
Context-insensitive analyses: polynomial time complexity
Context-sensitive analyses: exponential time complexity
Andersen's analysis

Statements of Interest

x = y	copy statement
x = &y	address-of statement
x = *y	load statement
x = op(y1 y2 ... yn)	scalar operator
x = allocate(y)	dynamic memory allocation
*x = y	store statement
x = fun(f1 . . . fn)	function definition
x1 x2 ... xm = p(y1 . . . yn)	function call statement

Flow-insensitive analysis: control structures are irrelevant

Common Cases

x = y

x = &y

x = *y

*x = y

Andersen's Analysis

Each statement in the program can be explained using a set of constraints
All constraints have to be satisfied at the end of the analysis.

x = y

points-to(x) \( \supseteq \) points-to(y)

where x can be of the form: x, *x

y can be of the form: y, *y, &y

Andersen's Analysis - Example

p1 = &a	a \(\in\) points-to(p1)
p2 = &b	b \(\in\) points-to(p2)
p1 = p2	points-to(p1) \(\supseteq\) points-to(p2)
r = &p1	p1 \(\in\) points-to(r)
p3 = *r	points-to(p3) \(\supseteq\) points-to(*r)
p2 = &d	d \(\in\) points-to(p2)

Example from Prof. Charles' CS701 slides

Resulting points-to sets:

p1 : {a, b, d}
p2 : {b, d}
p3 : {a, b, d}
r : {p1}

Worst-case time complexity: O(\( n^3\))
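The inclusion constraints above can be solved by repeating them until nothing changes. The following is a minimal sketch of that idea, not Andersen's actual implementation: the `(kind, lhs, rhs)` statement encoding and the function name `andersen` are assumptions made for illustration, and a production solver would use a worklist over a constraint graph instead of rescanning every statement.

```python
from collections import defaultdict


def andersen(statements):
    """Naive fixed-point solver for inclusion (subset) constraints.

    statements: list of (kind, lhs, rhs) tuples, where kind is one of
    "addr" (lhs = &rhs), "copy" (lhs = rhs), "load" (lhs = *rhs) or
    "store" (*lhs = rhs). This encoding is an assumption of the sketch.
    """
    pts = defaultdict(set)  # variable -> set of variables it may point to
    changed = True
    while changed:  # repeat until no constraint adds anything new
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":
                targets, new = [lhs], {rhs}
            elif kind == "copy":
                targets, new = [lhs], set(pts[rhs])
            elif kind == "load":
                targets, new = [lhs], set()
                for v in list(pts[rhs]):
                    new |= pts[v]
            elif kind == "store":
                targets, new = list(pts[lhs]), set(pts[rhs])
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for t in targets:
                if not new <= pts[t]:
                    pts[t] |= new
                    changed = True
    return pts


# The example program from this slide.
program = [
    ("addr", "p1", "a"),
    ("addr", "p2", "b"),
    ("copy", "p1", "p2"),
    ("addr", "r", "p1"),
    ("load", "p3", "r"),
    ("addr", "p2", "d"),
]
result = andersen(program)
```

Note that `p2` keeps the smaller set {b, d}: inclusion constraints only flow information from the right-hand side to the left-hand side.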

Goal : Linear-time points-to analysis

A graph is used to store program / points-to information.
What should the complexity of each component be to achieve linear time? (Recall Kildall's algorithm)
- Size of each node in the graph - O(1)
- Number of edges from each node - O(1)
- Operation on each node - ~O(1)
- Number of iterations over the entire graph - O(1)

Steensgaard's Analysis

Each statement in the program can be explained using a set of constraints
All constraints have to be satisfied at the end of the analysis.

x = y

points-to(x) \( \supseteq \) points-to(y)

points-to(y) \( \supseteq \) points-to(x)

where x can be of the form: x, *x

y can be of the form: y, *y, &y

For the statement x=y, merge the points-to sets of x and y.

Steensgaard's Analysis - Example

p1 = &a

p2 = &b

p1 = p2

r = &p1

p3 = *r

p2 = &d

Merged nodes as the statements are processed:
- after p1 = p2: a,b
- after p2 = &d: a,b,d
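The merging behaviour on the example program can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm exactly as published: it ignores functions, always merges (no conditional join), and the class name `Steensgaard`, the `(kind, lhs, rhs)` encoding and the `("tmp", i)` placeholder nodes are all hypothetical. Each equivalence class has at most one outgoing points-to edge, and every statement is visited exactly once.

```python
import itertools


class Steensgaard:
    def __init__(self):
        self.parent = {}  # union-find forest over nodes
        self.succ = {}    # class representative -> node it points to
        self._fresh = itertools.count()

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def target(self, n):
        """Class that n points to; creates an empty placeholder if needed."""
        r = self.find(n)
        if r not in self.succ:
            self.succ[r] = self.find(("tmp", next(self._fresh)))
        return self.find(self.succ[r])

    def join(self, a, b):
        """Merge two classes, then recursively merge what they point to."""
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta, tb = self.succ.pop(a, None), self.succ.pop(b, None)
        self.parent[b] = a
        if ta is not None and tb is not None:
            self.succ[a] = ta
            self.join(ta, tb)
        elif ta is not None or tb is not None:
            self.succ[a] = ta if ta is not None else tb

    def analyze(self, statements):
        for kind, lhs, rhs in statements:  # single pass over the program
            if kind == "addr":     # lhs = &rhs
                self.join(self.target(lhs), rhs)
            elif kind == "copy":   # lhs = rhs
                self.join(self.target(lhs), self.target(rhs))
            elif kind == "load":   # lhs = *rhs
                self.join(self.target(lhs), self.target(self.target(rhs)))
            elif kind == "store":  # *lhs = rhs
                self.join(self.target(self.target(lhs)), self.target(rhs))
            else:
                raise ValueError(f"unknown statement kind: {kind}")

    def points_to(self, var):
        r = self.find(var)
        if r not in self.succ:
            return set()
        t = self.find(self.succ[r])
        return {n for n in list(self.parent)
                if isinstance(n, str) and self.find(n) == t}


s = Steensgaard()
s.analyze([
    ("addr", "p1", "a"), ("addr", "p2", "b"), ("copy", "p1", "p2"),
    ("addr", "r", "p1"), ("load", "p3", "r"), ("addr", "p2", "d"),
])
```

In this sketch `p1`, `p2` and `p3` all end up pointing to the single merged node {a, b, d}, whereas the inclusion-based analysis kept `p2` at {b, d}.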

Type system

Not to be confused with data types in programs like int, float, boolean, etc.
Type: A compile-time data structure used to store the points-to information of a node in the graph
Type: (\( \alpha \times \lambda \))
- \( \alpha \): describes the data locations to which the variable can point
- \( \lambda \): describes the functions to which the variable can point

Types - Example

a = &x;
b = &y;
c = b;

[Figure: the variables, memory locations, and types for this program; \( \bot \) is the type of a location that holds no pointer]

Well Typedness of program

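Naive approach: for x = y, always join the types of x and y

Less constrained approach: for x = y, join the types only if y may hold a pointer

a = 4
x = a
y = x

Here a, x and y hold no pointers (type \( \bot \)), so the less constrained approach joins nothing.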

To satisfy \(t1 \trianglelefteq t2\):
- if \(ref(t1) = \bot\), no join is needed
- otherwise, join \(t1\) and \(t2\)
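The conditional join can be sketched as follows. This is a simplified model under stated assumptions, not the paper's full algorithm: functions are ignored, `None` stands in for \( \bot \), and the names `Node`, `cjoin`, `Analysis` and the `pending` list are hypothetical. When the right-hand side holds no pointer yet, the join is only recorded as pending and is carried out later if that side ever receives a pointer.

```python
class Node:
    """One equivalence class of locations (helper for this sketch)."""

    def __init__(self):
        self.parent = self
        self.target = None  # None plays the role of bottom: no pointer here
        self.pending = []   # nodes to join once this node holds a pointer


def find(n):
    while n.parent is not n:
        n.parent = n.parent.parent  # path halving
        n = n.parent
    return n


def join(a, b):
    a, b = find(a), find(b)
    if a is b:
        return
    ta, tb = a.target, b.target
    pa, pb = a.pending, b.pending
    b.parent = a
    a.pending, b.pending = [], []
    if ta is None and tb is None:
        a.pending = pa + pb      # still bottom: keep waiting
    elif ta is None:
        a.target = tb
        for n in pa:             # a just became a pointer: flush its queue
            join(a, n)
    elif tb is None:
        for n in pb:
            join(a, n)
    else:
        join(ta, tb)             # both point somewhere: unify the targets


def cjoin(dst, src):
    """Conditional join: merge only if src may hold a pointer."""
    dst, src = find(dst), find(src)
    if src.target is None:
        src.pending.append(dst)
    else:
        join(dst, src)


class Analysis:
    def __init__(self):
        self.vars = {}

    def var(self, name):
        if name not in self.vars:
            v = Node()
            v.target = Node()    # fresh content node, initially bottom
            self.vars[name] = v
        return self.vars[name]

    def content(self, name):
        return find(find(self.var(name)).target)

    def copy(self, x, y):        # x = y
        cjoin(self.content(x), self.content(y))

    def addr(self, x, y):        # x = &y
        join(self.content(x), self.var(y))

    def points_to(self, x):
        c = self.content(x)
        return {n for n, v in self.vars.items() if find(v) is c}


an = Analysis()
an.copy("x", "a")  # x = a   (a = 4 generates no constraint)
an.copy("y", "x")  # y = x
separate = an.content("a") is not an.content("x")  # nothing was joined
an.addr("a", "z")  # a = &z  now triggers the pending joins
```

The pending lists are what keep this a single pass: a copy seen before the pointer assignment is not lost, it is simply deferred.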

Steensgaard's Analysis

Time complexity: almost linear (O(n) times an inverse-Ackermann factor from union-find)
- Size of each node in the graph - O(1)
- Number of edges from each node - O(1)
- Operation on each node - ~O(1) amortized, using a union-find data structure
- Number of iterations over the entire graph - O(1)
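The "~O(1)" per-node operation comes from the union-find structure. Below is a minimal sketch of the standard version with union by rank and path compression, whose amortized cost per operation is bounded by the inverse Ackermann function; the class and method names are illustrative and not taken from the paper.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, x):
        """Return the representative of x's set, compressing the path."""
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # point every visited node at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        """Merge the sets of x and y; return the new representative."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx            # attach the shallower tree
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx


uf = UnionFind()
uf.union("a", "b")
uf.union("b", "d")
```

After these two unions, `a`, `b` and `d` share one representative, which mirrors the merged node in the example.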

Andersen vs Steensgaard

	Andersen	Steensgaard
Time	O(n^3)	~O(n)
Outgoing edges	O(n^2)	O(1)
Precision	High	Low

Results

Even with a very conservative analysis, a lot of types point to only one variable.

Summary

An inter-procedural flow-insensitive points-to analysis
Computes results in almost linear time
Uses a type system to describe points-to relations
Uses a constraint system to describe operations on types
Less precise than existing inter-procedural analyses, but provides comparable results in less time

Thank You

Every problem in Computer Science can be solved by using another level of indirection.

- David Wheeler