Nearly Optimal Register Allocation using PBQP

New Register Allocator for Nvidia Compiler
- Modelling Irregular Architecture
- FP 16, 32, 64 and vector registers
- Improving Runtime Performance

Terminologies
- Virtual Register
- Mapping Vreg to
- Physical Register
- Spill
- Cost Vector
- Cost Matrix
- FP 16, 32, 64
- Vector
- Interference graph

1) Initialize Virtual Registers
- Data Structure
- Array
- Initialize Cost Vector
- How do constraints affect the cost vector?
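A minimal sketch of how a constraint could shape a cost vector, assuming (this is not stated on the slide) that index 0 means "spill" and index r+1 means physical register r. The function name `init_cost_vector` and its parameters are hypothetical.

```python
import math

INF = math.inf  # marks an assignment that is not allowed


def init_cost_vector(num_regs, spill_cost, allowed):
    """Build a cost vector: index 0 is 'spill', index r+1 is register r.

    This layout is an assumption made for illustration only.
    """
    vec = [spill_cost]
    for reg in range(num_regs):
        # A constraint forbids a register by giving it infinite cost.
        vec.append(0.0 if reg in allowed else INF)
    return vec


# Hypothetical example: 4 registers, and a 64-bit value that may only
# start on an even-numbered register.
vec = init_cost_vector(num_regs=4, spill_cost=10.0, allowed={0, 2})
print(vec)  # [10.0, 0.0, inf, 0.0, inf]
```

Encoding constraints as infinite entries keeps the solver generic: it never needs to know why a register is forbidden.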

2) Populate Cost Matrix
- For each edge in the graph
- Uses cost vectors of source and destination
- What is the Global Cost Matrix (GCM)?
- Why use it?
- Data Structure
- 2D array storing the cost matrix for each edge
- GCM for fast retrieval
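One plausible reading of the Global Cost Matrix is a shared pool in which identical edge matrices are stored once and reused. The sketch below illustrates that idea only. The class `GlobalCostMatrixPool`, the function `interference_matrix`, and the convention that index 0 means "spill" are all assumptions, not details taken from the slides.

```python
import math

INF = math.inf


def interference_matrix(src_vec, dst_vec):
    """Cost matrix for an interference edge, sized by the two cost vectors.

    Both ends may spill (index 0, an assumed layout), but they must not
    pick the same physical register.
    """
    return tuple(
        tuple(INF if (i == j and i != 0) else 0.0 for j in range(len(dst_vec)))
        for i in range(len(src_vec))
    )


class GlobalCostMatrixPool:
    """Hypothetical pool that stores each distinct matrix only once."""

    def __init__(self):
        self._pool = {}

    def intern(self, matrix):
        # Return the stored copy if an equal matrix has been seen before.
        return self._pool.setdefault(matrix, matrix)


pool = GlobalCostMatrixPool()
vec = [5.0, 0.0, 0.0]
edge_ab = pool.intern(interference_matrix(vec, vec))
edge_bc = pool.intern(interference_matrix(vec, vec))
print(edge_ab is edge_bc)  # True: both edges share one stored matrix
```

Because most interference edges between registers of the same class have the same matrix, sharing them would save memory and make lookups cheap.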

3) Reduce Graph
- What is reduce graph?
- reduce1, reduce2, reduceN
- Data Structure
- Singly linked list for each degree
- Why go for separate lists?
- Populate virtual registers in the appropriate degree list
- Add vector constraints
- While any degree list is not empty
- Perform reduce1()
- Perform reduce2()
- Perform reduceN()
- Propagate solution
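The reduction loop can be sketched as follows, with cost updates left out so that only the degree-list bookkeeping is visible. This is an illustration under assumptions: Python deques stand in for the singly linked lists, the lists are rebuilt on every step for brevity, and the names `build_degree_lists` and `reduction_order` are hypothetical.

```python
from collections import deque

BUCKETS = ("degree0", "degree1", "degree2", "degreeN")


def degree_bucket(degree):
    """Map a node degree to the name of the list it belongs in."""
    return BUCKETS[degree] if degree <= 2 else "degreeN"


def build_degree_lists(adjacency):
    """adjacency: dict mapping node -> set of neighbours."""
    lists = {name: deque() for name in BUCKETS}
    for node, neighbours in adjacency.items():
        lists[degree_bucket(len(neighbours))].append(node)
    return lists


def reduction_order(adjacency):
    """Return (node, list name) pairs in the order nodes are removed,
    always preferring the lowest-degree list that is not empty."""
    adjacency = {n: set(nb) for n, nb in adjacency.items()}  # private copy
    order = []
    while adjacency:
        lists = build_degree_lists(adjacency)
        for name in BUCKETS:
            if lists[name]:
                node = lists[name][0]
                break
        order.append((node, name))
        neighbours = adjacency.pop(node)
        for nb in neighbours:
            adjacency[nb].discard(node)
        if name == "degree2":
            # Removing a degree-2 node links its two neighbours.
            u, v = neighbours
            adjacency[u].add(v)
            adjacency[v].add(u)
    return order


graph = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"b", "c"}}
print(reduction_order(graph))
# [('a', 'degree1'), ('b', 'degree2'), ('c', 'degree1'), ('d', 'degree0')]
```

Keeping one list per degree means the next node to reduce can be picked in constant time, which is presumably the motivation for separate lists. A real implementation would move nodes between lists as degrees change instead of rebuilding them.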

Reduce 1
- For Adjacent Y of X
- Y.CostVector += X.CostVector
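The slide abbreviates the update as a vector addition. In the standard PBQP formulation from the literature, the degree-1 reduction also folds in the edge cost matrix: each entry of Y's cost vector gains the cheapest compatible choice for X. A minimal sketch of that standard rule follows. It may differ from what the project implemented, and the index-0-is-spill layout in the example is an assumption.

```python
import math

INF = math.inf


def reduce1(cost_x, cost_y, matrix_xy):
    """Fold degree-1 node X into its only neighbour Y.

    matrix_xy[i][j] is the cost of X taking option i while Y takes option j.
    Returns the updated cost vector for Y.
    """
    return [
        cost_y[j] + min(cost_x[i] + matrix_xy[i][j] for i in range(len(cost_x)))
        for j in range(len(cost_y))
    ]


cost_x = [5.0, 0.0, 3.0]  # index 0 = spill (assumed layout)
cost_y = [5.0, 0.0, 0.0]
interfere = [
    [0.0, 0.0, 0.0],
    [0.0, INF, 0.0],
    [0.0, 0.0, INF],
]
print(reduce1(cost_x, cost_y, interfere))  # [5.0, 3.0, 0.0]
```

In the example, if Y takes register 1 then X is pushed to register 2 at cost 3, so that cost is charged to Y's entry for register 1.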

Reduce 2
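For reference, a minimal sketch of the standard PBQP degree-2 reduction as described in the literature: node X with neighbours Y and Z is removed, and a matrix is added to the Y-Z edge holding, for every pair of choices, the cheapest compatible choice for X. This is the textbook rule and may differ from the project's implementation. The function name and the spill-at-index-0 layout are assumptions.

```python
import math

INF = math.inf


def reduce2(cost_x, matrix_xy, matrix_xz):
    """Remove degree-2 node X that has neighbours Y and Z.

    matrix_xy[i][j]: X takes option i, Y takes option j.
    matrix_xz[i][k]: X takes option i, Z takes option k.
    Returns the matrix to add to (or create as) the Y-Z edge.
    """
    n_y = len(matrix_xy[0])
    n_z = len(matrix_xz[0])
    return [
        [
            min(
                cost_x[i] + matrix_xy[i][j] + matrix_xz[i][k]
                for i in range(len(cost_x))
            )
            for k in range(n_z)
        ]
        for j in range(n_y)
    ]


cost_x = [4.0, 0.0, 0.0]  # index 0 = spill (assumed layout)
interfere = [
    [0.0, 0.0, 0.0],
    [0.0, INF, 0.0],
    [0.0, 0.0, INF],
]
delta = reduce2(cost_x, interfere, interfere)
print(delta)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 4.0], [0.0, 4.0, 0.0]]
```

The entries equal to 4.0 are the cases where Y and Z occupy both registers between them, so X can only spill.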


Reduce N

- Called when degree > 2
- For i = 0 to |cx|
- For node y in adjacent node(x)
- cx(i) += min( Cxy(i,:) + cy )
- For node y in adjacent node(x)
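A minimal sketch of the usual heuristic reduction for nodes of degree greater than 2, following the shape of the loops above: score every option of x by adding each neighbour's best-case cost, pick the cheapest option, then push the matching matrix row into each neighbour's cost vector. The second step is the standard PBQP heuristic and is assumed here, not read from the slide. The function name and the spill-at-index-0 layout are hypothetical.

```python
import math

INF = math.inf


def reduce_n(cost_x, neighbours):
    """Heuristically fix node X and push its choice into its neighbours.

    neighbours: list of (cost_y, matrix_xy) pairs, where matrix_xy[i][j]
    is the cost of X taking option i while Y takes option j.
    Returns (chosen index for X, list of updated neighbour cost vectors).
    """
    totals = list(cost_x)
    for i in range(len(cost_x)):
        for cost_y, matrix_xy in neighbours:
            # Best case for this neighbour if X were to take option i.
            totals[i] += min(m + c for m, c in zip(matrix_xy[i], cost_y))
    choice = min(range(len(totals)), key=totals.__getitem__)
    updated = [
        [c + m for c, m in zip(cost_y, matrix_xy[choice])]
        for cost_y, matrix_xy in neighbours
    ]
    return choice, updated


interfere = [
    [0.0, 0.0, 0.0],
    [0.0, INF, 0.0],
    [0.0, 0.0, INF],
]
cost_x = [6.0, 0.0, 0.0]  # index 0 = spill (assumed layout)
neighbours = [
    ([5.0, 0.0, INF], interfere),  # can only use register at index 1
    ([5.0, 0.0, INF], interfere),
    ([5.0, 0.0, 0.0], interfere),
]
choice, updated = reduce_n(cost_x, neighbours)
print(choice)      # 2
print(updated[2])  # [5.0, 0.0, inf]
```

Unlike the degree-1 and degree-2 reductions, this step is a heuristic: it commits to a choice for x early, which is why the overall result is nearly optimal, not guaranteed optimal.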

Propagate Solution

Naive
- For each Node X in degree0 List
- X.registerAssignment = minIndex(X.CostVector)
- For each Adjacent Y of X
- Y.CostVector[X.registerAssignment] = INFINITY
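The naive scheme above can be sketched as follows. The sketch assumes that "adjacent" refers to neighbours in the original interference graph and that nodes are visited in a given order. The names `propagate_naive` and `min_index` and the dictionary-based data layout are hypothetical.

```python
import math


def min_index(vec):
    """Index of the smallest entry (first one wins on ties)."""
    return min(range(len(vec)), key=vec.__getitem__)


def propagate_naive(order, cost, adjacency):
    """order: nodes in the sequence they are assigned.
    cost: dict node -> cost vector (modified in place).
    adjacency: dict node -> list of interfering neighbours.
    """
    assignment = {}
    for x in order:
        assignment[x] = min_index(cost[x])
        for y in adjacency[x]:
            # Interfering neighbours may no longer use X's register.
            cost[y][assignment[x]] = math.inf
    return assignment


cost = {"a": [1.0, 0.0, 2.0], "b": [3.0, 0.0, 1.0], "c": [0.0, 0.0, 0.0]}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = propagate_naive(["a", "b", "c"], cost, adjacency)
print(result)  # {'a': 1, 'b': 2, 'c': 0}
```

Each node greedily takes its own cheapest option, so an early choice can force a neighbour into an expensive one.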

Minima Based
- For each Node X in degree0 List
- tempCostVector = X.CostVector
- For each Adjacent Y of X
- minY = minIndex(Y.CostVector)
- From Cost Matrix of X and Y
- Add minY-th column to tempCostVector
- X.registerAssignment = minIndex(tempCostVector)
- For each Adjacent Y of X
- Y.CostVector[X.registerAssignment] = INFINITY
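A minimal sketch of the minima-based scheme above. Before choosing, X looks at each neighbour's currently cheapest option and adds the matching column of their cost matrix to a temporary copy of its own cost vector. Skipping neighbours that are already assigned is an assumption of this sketch, since the slide does not say how they are handled. All names and the data layout are hypothetical.

```python
import math

INF = math.inf


def min_index(vec):
    """Index of the smallest entry (first one wins on ties)."""
    return min(range(len(vec)), key=vec.__getitem__)


def propagate_minima(order, cost, adjacency, matrix):
    """matrix[(x, y)][i][j]: cost of x taking option i while y takes j."""
    assignment = {}
    for x in order:
        temp = list(cost[x])
        for y in adjacency[x]:
            if y in assignment:
                continue  # assumption: already-assigned neighbours are skipped
            min_y = min_index(cost[y])
            # Add the column for Y's currently cheapest option.
            for i in range(len(temp)):
                temp[i] += matrix[(x, y)][i][min_y]
        assignment[x] = min_index(temp)
        for y in adjacency[x]:
            cost[y][assignment[x]] = INF
    return assignment


conflict = [[INF, 0.0, 0.0], [0.0, INF, 0.0], [0.0, 0.0, INF]]
cost = {"a": [0.0, 1.0, 2.0], "b": [0.0, 5.0, 5.0]}
adjacency = {"a": ["b"], "b": ["a"]}
matrix = {("a", "b"): conflict, ("b", "a"): conflict}
result = propagate_minima(["a", "b"], cost, adjacency, matrix)
print(result)  # {'a': 1, 'b': 0}
```

In this example the naive scheme would give a its cheapest register (index 0) and push b to an option costing 5. The minima-based scheme lets a yield, for a total cost of 1.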

Results
- Highest Register used
- Compile Time
- Memory Usage

THANK YOU
BHUSHAN SONAWANE
SIDDHARTH KUMAR

Internal guide: Prof. M. V. Kulkarni
External guide: Shekhar Divekar

Understanding Algorithm And NVIDIA Compiler Infrastructure
Modelling Vector Instructions
Modelling FP16 and 64 bit registers
Implementation of Basic Register Allocator for sm50
Perf Tuning
What Next

By Bhushan Sonawane
Nvidia internship project presentation