Nearly Optimal Register Allocation using PBQP
![](https://s3.amazonaws.com/media-p.slid.es/uploads/bhushansonawane/images/1185505/nvidia-logo.jpg)
New Register Allocator for the Nvidia Compiler
- Modelling irregular architecture
  - FP16, FP32, FP64 and vector registers
- Improving runtime performance
Terminologies
- Virtual Register
- Mapping a vreg to
  - Physical register
  - Spill
- Cost Vector
- Cost Matrix
- FP 16, 32, 64
- Vector
- Interference graph
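These terms fit together in the standard PBQP objective. As a sketch in the usual notation (the symbols $s_x$, $c_x$, $C_{xy}$, $V$ and $E$ are notation assumed here, not taken from the slides): every virtual register $x \in V$ selects one option $s_x$ (a physical register or spill), $c_x$ is its cost vector, $C_{xy}$ is the cost matrix on interference edge $(x, y) \in E$, and the allocator looks for the selection with the lowest total cost:

```latex
\min_{s} \; \sum_{x \in V} c_x(s_x) \;+\; \sum_{(x,y) \in E} C_{xy}(s_x, s_y)
```

An infinite entry in a cost vector or cost matrix marks a selection that is not allowed.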
1) Initialize Virtual Registers
- Data structure
  - Array
- Initialize cost vector
  - How do constraints affect the cost vector?
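One way constraints can show up in a cost vector is as infinite entries for options that are not legal. The sketch below is illustrative only: the function `init_cost_vector`, the convention that index 0 is the spill option, and the example `allowed` set are all assumptions, not details from the slides.

```python
import math

INF = math.inf


def init_cost_vector(num_regs, spill_cost, allowed=None):
    """Build one cost vector.

    Assumed layout: index 0 is the spill option, indices 1..num_regs are
    physical registers. `allowed` is a hypothetical set of legal registers.
    """
    vec = [spill_cost] + [0.0] * num_regs
    if allowed is not None:
        for r in range(1, num_regs + 1):
            if r not in allowed:
                vec[r] = INF  # constraint: register r may not hold this vreg
    return vec


# One cost vector per virtual register, kept in an array indexed by vreg id.
cost_vectors = [
    init_cost_vector(4, spill_cost=10.0),                  # unconstrained vreg
    init_cost_vector(4, spill_cost=5.0, allowed={2, 4}),   # constrained vreg
]
```

Here the second vreg can only be placed in register 2 or 4 (or spilled), so every other register entry is infinite.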
2) Populate Cost Matrix
- For each edge in the graph
  - Uses the cost vectors of the source and destination
- What is the Global Cost Matrix (GCM)?
  - Why use it?
- Data structure
  - 2D array storing the cost matrix for each edge
  - GCM for fast retrieval
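A minimal sketch of this step, under stated assumptions: the slides do not define the Global Cost Matrix, so the shared pool `_matrix_pool` below is only one plausible reading (identical matrices are built once and reused). The names `interference_matrix` and `populate_cost_matrices` and the convention that index 0 is spill are hypothetical.

```python
import math

INF = math.inf

# Hypothetical shared pool: one matrix per (rows, cols) shape.
_matrix_pool = {}


def interference_matrix(cx, cy):
    """Cost matrix for an interference edge; shape comes from both cost vectors."""
    key = (len(cx), len(cy))
    if key not in _matrix_pool:
        m = [[0.0] * len(cy) for _ in cx]
        # Assumed layout: index 0 is spill, so only real registers conflict.
        for r in range(1, min(len(cx), len(cy))):
            m[r][r] = INF  # both vregs in the same physical register
        _matrix_pool[key] = m
    return _matrix_pool[key]


def populate_cost_matrices(edges, cost_vectors):
    """Map each edge (x, y) to its cost matrix."""
    return {(x, y): interference_matrix(cost_vectors[x], cost_vectors[y])
            for (x, y) in edges}


cost_vectors = {0: [5.0, 0.0, 0.0], 1: [5.0, 0.0, 0.0], 2: [5.0, 0.0, 0.0]}
matrices = populate_cost_matrices([(0, 1), (1, 2)], cost_vectors)
```

Because pooled matrices are shared between edges, a reduction that modifies an edge matrix would need to copy it first.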
3) Reduce Graph
- What is graph reduction?
  - reduce1, reduce2, reduceN
- Data structure
  - Singly linked list for each degree
  - Why use separate lists?
- Populate virtual registers in the appropriate degree list
- Add vector constraints
- While any degree list is non-empty
  - Perform reduce1()
  - Perform reduce2()
  - Perform reduceN()
- Propagate solution
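The per-degree lists can be sketched as follows. This is an illustration under assumptions: `collections.deque` stands in for the singly linked lists, degrees above 2 share one "N" list, and the helper names are hypothetical. Keeping separate lists means the next node to reduce can be found without scanning the whole graph.

```python
from collections import deque


def bucket_index(degree):
    """Degrees 0, 1 and 2 get their own list; larger degrees share list 3 ('N')."""
    return min(degree, 3)


def build_degree_lists(adjacency):
    """adjacency: dict mapping each vreg to the set of vregs it interferes with."""
    lists = [deque() for _ in range(4)]
    for node, neighbours in adjacency.items():
        lists[bucket_index(len(neighbours))].append(node)
    return lists


def next_reduction(lists):
    """Prefer reduce1, then reduce2, then reduceN; None when nothing is left to reduce."""
    for bucket, name in ((1, "reduce1"), (2, "reduce2"), (3, "reduceN")):
        if lists[bucket]:
            return name, lists[bucket][0]
    return None


adjacency = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c"},
}
lists = build_degree_lists(adjacency)
step = next_reduction(lists)  # the degree-1 node "a" is reduced first
```

A full driver would apply the chosen reduction, move the affected neighbours to their new degree lists, and repeat until only degree-0 nodes remain.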
Reduce 1
- For the node Y adjacent to X
  - Y.CostVector += X.CostVector
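The update above is shorthand. In the PBQP literature the degree-1 rule also takes the edge's cost matrix into account: for every option j of Y, the cheapest compatible option of X is folded into Y's cost. A minimal sketch, with `reduce1` and the sample data as hypothetical names and values (index 0 is assumed to be spill):

```python
import math

INF = math.inf


def reduce1(cx, cy, cxy):
    """Fold degree-1 node X into its only neighbour Y.

    cx, cy: cost vectors; cxy[i][j]: cost of X choosing i while Y chooses j.
    Returns the updated cost vector of Y.
    """
    return [
        cy[j] + min(cx[i] + cxy[i][j] for i in range(len(cx)))
        for j in range(len(cy))
    ]


cx = [4.0, 0.0, INF]            # X may spill (cost 4) or use register 1
cy = [6.0, 0.0, 1.0]
cxy = [[0.0, 0.0, 0.0],
       [0.0, INF, 0.0],
       [0.0, 0.0, INF]]         # same register for both is forbidden
new_cy = reduce1(cx, cy, cxy)   # [6.0, 4.0, 1.0]
```

Register 1 becomes more expensive for Y because choosing it would force X to spill.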
Reduce 2
![](https://s3.amazonaws.com/media-p.slid.es/uploads/bhushansonawane/images/1185531/reduce2.png)
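As a sketch of the degree-2 rule as it is usually stated for PBQP: a node X with neighbours Y and Z is removed, and its cost is folded into the cost matrix between Y and Z. The function name `reduce2` and the sample values are hypothetical; if Y and Z share no edge yet, an all-zero matrix is assumed.

```python
import math

INF = math.inf


def reduce2(cx, cxy, cxz, cyz):
    """Remove degree-2 node X by folding its costs into the Y-Z cost matrix.

    cxy[i][j]: X chooses i, Y chooses j. cxz[i][k]: X chooses i, Z chooses k.
    Returns the updated Y-Z matrix, indexed [j][k].
    """
    return [
        [
            cyz[j][k] + min(cx[i] + cxy[i][j] + cxz[i][k] for i in range(len(cx)))
            for k in range(len(cxz[0]))
        ]
        for j in range(len(cxy[0]))
    ]


cx = [3.0, 0.0]                      # spill costs 3, register 1 is free
conflict = [[0.0, 0.0], [0.0, INF]]  # same register is forbidden
no_edge = [[0.0, 0.0], [0.0, 0.0]]   # Y and Z do not interfere yet
new_cyz = reduce2(cx, conflict, conflict, no_edge)  # [[0.0, 3.0], [3.0, 3.0]]
```

X can keep register 1 only if both Y and Z spill; every other combination forces X to spill at cost 3.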
Reduce N
- Called when degree > 2
- For i = 0 to |cx|
  - For each node y adjacent to x
    - cx(i) += min(Cxy(i,:) + cy)
- For each node y adjacent to x
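A sketch of the heuristic reduction as it is usually described for PBQP: estimate a total cost for each option of X, fix the cheapest option, then push the matching matrix row into each neighbour's cost vector. The names `reduce_n` and `neighbours` and the sample data are assumptions for illustration.

```python
import math

INF = math.inf


def reduce_n(cx, neighbours):
    """Heuristically fix a choice for a node X of degree > 2.

    neighbours: list of (cy, cxy) pairs, where cxy[i][j] is the cost of
    X choosing i while that neighbour chooses j.
    Returns (choice for X, updated neighbour cost vectors).
    """
    totals = []
    for i in range(len(cx)):
        total = cx[i]
        for cy, cxy in neighbours:
            total += min(cxy[i][j] + cy[j] for j in range(len(cy)))
        totals.append(total)
    choice = min(range(len(cx)), key=totals.__getitem__)
    updated = [
        [cy[j] + cxy[choice][j] for j in range(len(cy))]
        for cy, cxy in neighbours
    ]
    return choice, updated


conflict = [[0.0, 0.0, 0.0],
            [0.0, INF, 0.0],
            [0.0, 0.0, INF]]
cx = [5.0, 0.0, 0.0]
cy = [2.0, 0.0, INF]  # each neighbour can only spill or use register 1
choice, updated = reduce_n(cx, [(cy, conflict)] * 3)  # choice == 2
```

Because the choice for X is fixed locally, this step is where the solver can lose optimality.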
Propagate Solution
Naive
- For each node X in the degree-0 list
  - X.registerAssignment = minIndex(X.CostVector)
  - For each adjacent Y of X
    - Y.CostVector[X.registerAssignment] = INFINITY
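The naive propagation can be sketched as below. The function name, the dictionary-based data layout, and the spill convention (index 0, which is not blocked for neighbours) are assumptions added for the example.

```python
import math

INF = math.inf
SPILL = 0  # assumed convention: index 0 is the spill option


def propagate_naive(order, cost_vectors, adjacency):
    """Assign each node its cheapest option, then block that option for its neighbours."""
    assignment = {}
    for x in order:
        vec = cost_vectors[x]
        choice = min(range(len(vec)), key=vec.__getitem__)
        assignment[x] = choice
        if choice != SPILL:  # assumption: two spilled values never conflict
            for y in adjacency[x]:
                cost_vectors[y][choice] = INF
    return assignment


# Three mutually interfering vregs competing for two registers.
cost_vectors = {"a": [9.0, 0.0, 0.0], "b": [9.0, 0.0, 0.0], "c": [9.0, 0.0, 0.0]}
adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
assignment = propagate_naive(["a", "b", "c"], cost_vectors, adjacency)
# a -> register 1, b -> register 2, c -> spill
```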
Minima Based
- For each node X in the degree-0 list
  - tempCostVector = X.CostVector
  - For each adjacent Y of X
    - minY = minIndex(Y.CostVector)
    - From the cost matrix of X and Y
      - Add the minY-th column to tempCostVector
  - X.registerAssignment = minIndex(tempCostVector)
  - For each adjacent Y of X
    - Y.CostVector[X.registerAssignment] = INFINITY
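A sketch of the minima-based variant, which looks at each neighbour's currently cheapest option before committing. Several details are assumptions of this sketch rather than facts from the slides: the function and parameter names, the spill convention (index 0), and the choice to skip neighbours that are already assigned, since their conflict is already recorded as INFINITY in X's cost vector.

```python
import math

INF = math.inf
SPILL = 0  # assumed convention: index 0 is the spill option


def propagate_minima(order, cost_vectors, adjacency, matrix):
    """matrix(x, y) returns the cost matrix whose rows are indexed by X's options."""
    assignment = {}
    for x in order:
        temp = list(cost_vectors[x])
        for y in adjacency[x]:
            if y in assignment:
                continue  # assumption: assigned neighbours are already accounted for
            cy = cost_vectors[y]
            min_y = min(range(len(cy)), key=cy.__getitem__)
            cxy = matrix(x, y)
            for i in range(len(temp)):
                temp[i] += cxy[i][min_y]  # add the minY-th column
        choice = min(range(len(temp)), key=temp.__getitem__)
        assignment[x] = choice
        if choice != SPILL:
            for y in adjacency[x]:
                cost_vectors[y][choice] = INF
    return assignment


conflict = [[0.0, 0.0, 0.0],
            [0.0, INF, 0.0],
            [0.0, 0.0, INF]]
cost_vectors = {"a": [9.0, 0.0, 0.0], "b": [9.0, 0.0, 5.0]}
adjacency = {"a": ["b"], "b": ["a"]}
assignment = propagate_minima(["a", "b"], cost_vectors, adjacency,
                              lambda x, y: conflict)
# a -> register 2, b -> register 1
```

In this example "a" avoids register 1 because "b" prefers it, giving a total cost of 0; the naive rule would give "a" register 1 and push "b" to register 2 at cost 5.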
Results
Highest Register used
Compile Time
Memory Usage
THANK YOU
BHUSHAN SONAWANE
SIDDHARTH KUMAR
Internal guide: Prof. M. V. Kulkarni
External guide: Shekhar Divekar
Understanding Algorithm And NVIDIA Compiler Infrastructure
Modelling Vector Instructions
Modelling FP16 and 64 bit registers
Implementation of Basic Register Allocator for sm50
Perf Tuning
What Next
Nearly Optimal Register Allocation using PBQP
By Bhushan Sonawane
Nvidia internship project presentation