# Nvidia Compiler

• Modelling Irregular Architecture

• FP16, FP32, FP64 and vector registers

• Improving Runtime Performance

# Terminology

• Virtual Register (Vreg)
• Physical Register
• Mapping Vreg to Physical Register
• Spill
• Cost Vector
• Cost Matrix
• FP16, FP32, FP64
• Vector
• Interference graph

# 1) Initialize Virtual Registers

• Data Structure

• Array

• Initialize Cost Vector

• How do constraints affect the cost vector?
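Cost-vector initialization can be sketched as below. This is a minimal illustration, not the NVIDIA implementation: it assumes a hypothetical register file of 8 physical registers, reserves index 0 of every cost vector for the spill option (a common PBQP convention), and assumes, purely for illustration, that a 64-bit value must start at an even register so that it occupies an aligned pair. A constraint simply turns forbidden options into infinite costs.

```python
import math

INF = math.inf
NUM_PHYS_REGS = 8  # hypothetical register file size, R0..R7


def init_cost_vector(width_bits, spill_cost):
    """Build the cost vector of one virtual register.

    Index 0 is the spill option; index k + 1 means "assign physical register Rk".
    """
    costs = [spill_cost] + [0.0] * NUM_PHYS_REGS
    if width_bits == 64:
        # Assumed constraint: a 64-bit value occupies the pair (Rk, Rk+1),
        # so it may only start at an even register that has a partner.
        for k in range(NUM_PHYS_REGS):
            if k % 2 == 1 or k + 1 >= NUM_PHYS_REGS:
                costs[k + 1] = INF
    return costs


# Virtual registers live in an array indexed by vreg number.
widths = [32, 64, 32]
spill_costs = [4.0, 10.0, 1.0]
cost_vectors = [init_cost_vector(w, s) for w, s in zip(widths, spill_costs)]
```

The 32-bit vregs keep zero cost for every register, while the 64-bit vreg gets infinite cost at every odd starting register, so the later phases can never pick them.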

# 2) Populate Cost Matrix

• For each edge in the graph
• Use the cost vectors of the source and destination
• What is the Global Cost Matrix (GCM)?
• Why use it?
• Data Structure
• 2D array storing the cost matrix of each edge
• GCM for fast retrieval
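Populating per-edge cost matrices through a shared Global Cost Matrix cache might look like the sketch below. It assumes the same hypothetical 8-register file and spill-at-index-0 convention, so each matrix has one row per option of the source's cost vector and one column per option of the destination's. It models only interference: two interfering values may not overlap in any physical register, where a 64-bit value is assumed to cover Rk and Rk+1. Because the matrix depends only on the pair of widths, caching it with `functools.lru_cache` lets every edge with the same pair share one matrix instead of rebuilding it.

```python
import math
from functools import lru_cache

INF = math.inf
NUM_PHYS_REGS = 8  # hypothetical register file size, R0..R7


def regs_covered(width_bits, k):
    """Physical registers used by a value starting at Rk (64-bit assumed to be a pair)."""
    return {k, k + 1} if width_bits == 64 else {k}


@lru_cache(maxsize=None)
def global_cost_matrix(src_width, dst_width):
    """Interference cost matrix shared by every edge with this width pair.

    Rows are the source's options, columns the destination's; index 0 is spill.
    """
    n = NUM_PHYS_REGS + 1
    rows = []
    for i in range(n):
        row = []
        for j in range(n):
            clash = i > 0 and j > 0 and bool(
                regs_covered(src_width, i - 1) & regs_covered(dst_width, j - 1))
            row.append(INF if clash else 0.0)
        rows.append(tuple(row))
    return tuple(rows)  # immutable, so sharing one object between edges is safe


def populate_cost_matrices(edges, widths):
    """Attach a cost matrix to each (src, dst) edge of the interference graph."""
    return {(s, d): global_cost_matrix(widths[s], widths[d]) for s, d in edges}


edge_matrices = populate_cost_matrices([(0, 1), (1, 2)], [32, 32, 32])
```

Both edges here connect two 32-bit vregs, so they end up pointing at the very same cached matrix object.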

# 3) Reduce Graph

• What does it mean to reduce the graph?

• reduce1, reduce2, reduceN

• Data Structure

• Singly Linked List for each Degree

• Why use separate lists?

• Place each virtual register in the list matching its degree

• While any degree list other than degree 0 is non-empty

• Perform reduce1()

• Perform reduce2()

• Perform reduceN()

• Propagate Solution
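The degree-list bookkeeping of the reduction loop can be sketched as follows. This is a simplified illustration under stated assumptions. It tracks only graph structure and omits the cost-vector updates that reduce1, reduce2 and reduceN perform. It uses Python sets instead of singly linked lists so that a node can be moved between lists in O(1). It also follows the usual PBQP rule that reducing a degree-2 node adds an edge between its two neighbours. Keeping one list per degree class is what lets the loop find the next degree-1 or degree-2 node without scanning the whole graph.

```python
def bucket(degree):
    """Map a degree to its list: 0, 1, 2 or "N" for anything above 2."""
    return degree if degree <= 2 else "N"


def reduction_order(adjacency):
    """Return (list, node) pairs in the order nodes would be reduced.

    adjacency: dict node -> set of neighbours (undirected graph, int nodes).
    Only the graph bookkeeping is modelled; cost updates are omitted.
    """
    adj = {n: set(ns) for n, ns in adjacency.items()}
    lists = {0: set(), 1: set(), 2: set(), "N": set()}
    for n, ns in adj.items():
        lists[bucket(len(ns))].add(n)

    def relink(node, change):
        # Apply `change` to node's neighbour set and move it to its new list.
        lists[bucket(len(adj[node]))].discard(node)
        change(adj[node])
        lists[bucket(len(adj[node]))].add(node)

    order = []
    while lists[1] or lists[2] or lists["N"]:
        key = next(k for k in (1, 2, "N") if lists[k])  # prefer R1, then R2, then RN
        x = min(lists[key])  # deterministic pick for the sketch
        lists[key].remove(x)
        neighbours = sorted(adj[x])
        for y in neighbours:
            relink(y, lambda s: s.discard(x))
        if key == 2:
            # R2 folds x into an edge between its two neighbours.
            y, z = neighbours
            relink(y, lambda s: s.add(z))
            relink(z, lambda s: s.add(y))
        adj[x].clear()
        lists[0].add(x)  # reduced nodes wait in the degree-0 list
        order.append((key, x))
    return order


graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(reduction_order(graph))  # [(1, 3), (2, 0), (1, 1)]
```

In the example, removing the degree-1 node 3 drops node 0 from the N list into the degree-2 list, which is exactly the kind of move the separate lists make cheap.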

# Reduce 1

• For the only adjacent Y of X (X has degree 1)
• For each i: Y.CostVector[i] += min over j ( Cyx[i][j] + X.CostVector[j] )
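A small sketch of the R1 update in the usual PBQP formulation: for each option i of y, y absorbs the cheapest option j of x, including the edge cost `Cyx[i][j]`. The function and parameter names are assumptions of this sketch, and the matrix is indexed with y's options as rows and x's options as columns.

```python
import math

INF = math.inf


def reduce1(cx, cy, Cyx):
    """R1: fold a degree-1 node x into its only neighbour y.

    Cyx has y's options as rows and x's options as columns.
    Returns y's updated cost vector; x is then removed from the graph.
    """
    return [
        cy[i] + min(Cyx[i][j] + cx[j] for j in range(len(cx)))
        for i in range(len(cy))
    ]


Cyx = [[INF, 0.0], [0.0, INF]]  # interference: x and y may not share a register
new_cy = reduce1([0.0, 5.0], [1.0, 1.0], Cyx)  # [6.0, 1.0]
```

Here y's first option becomes more expensive because it would force x into its costly second option; adding the two cost vectors elementwise would have given [1.0, 6.0] instead, which ignores the edge cost matrix.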

# Reduce N

• Called when degree > 2
• For i = 0 to |cx| - 1
• For node y in adjacent nodes(x)
• cx(i) += min over j ( Cxy(i, j) + cy(j) )
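A sketch of the RN step: for each option i of x it accumulates cx(i) plus, for every neighbour y, the cost of y's cheapest compatible option. It then fixes x to the best option and lets each neighbour's cost vector absorb the chosen row of Cxy. This follows the common greedy RN heuristic of PBQP solvers; the fixing step and the function name `reduce_n` are assumptions of this sketch, since the slide only shows the accumulation.

```python
import math

INF = math.inf


def reduce_n(cx, neighbours):
    """Greedy RN step for a node x of degree > 2.

    neighbours: list of (cy, Cxy) pairs; Cxy has x's options as rows and
    y's options as columns.
    Returns x's chosen option and the updated cost vector of each neighbour.
    """
    h = list(cx)
    for cy, Cxy in neighbours:
        for i in range(len(cx)):
            # cx(i) += min over j ( Cxy(i, j) + cy(j) )
            h[i] += min(Cxy[i][j] + cy[j] for j in range(len(cy)))
    best = min(range(len(h)), key=h.__getitem__)
    # With x fixed to `best`, each neighbour inherits row `best` of Cxy.
    updated = [[cy[j] + Cxy[best][j] for j in range(len(cy))] for cy, Cxy in neighbours]
    return best, updated


interfere = [[INF, 0.0], [0.0, INF]]
choice, new_costs = reduce_n([0.0, 0.0], [([0.0, 10.0], interfere)] * 3)
# choice == 1: register 0 stays free for the three neighbours, which prefer it
```

Because the choice is greedy and made once, RN can lose optimality; that is the price of handling nodes that R1 and R2 cannot reduce exactly.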

# Propagate Solution: Naive

• For each Node X in degree0 List

• X.registerAssignment = minIndex(X.CostVector)
• For each Adjacent Y of X
• Y.CostVector[X.registerAssignment] = INFINITY
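The naive propagation can be sketched in Python as follows. It assumes that edges only encode interference (two adjacent nodes may not share a register), that cost-vector index k means physical register k with no separate spill option, and that `order` lists the nodes of the degree-0 list in the order they are processed; all names here are illustrative.

```python
import math

INF = math.inf


def min_index(v):
    """Index of the smallest entry (first one on ties)."""
    return min(range(len(v)), key=v.__getitem__)


def propagate_naive(order, cost, adj):
    """Assign each node its cheapest register, then forbid it for neighbours.

    order: nodes taken from the degree-0 list, in processing order
    cost:  dict node -> cost vector (modified in place)
    adj:   dict node -> neighbours in the original interference graph
    """
    assignment = {}
    for x in order:
        assignment[x] = min_index(cost[x])
        for y in adj[x]:
            cost[y][assignment[x]] = INF
    return assignment


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # a triangle
cost = {n: [0.0, 0.0, 0.0] for n in adj}
print(propagate_naive([0, 1, 2], cost, adj))  # {0: 0, 1: 1, 2: 2}
```

Because each choice looks only at X's own vector, a cheap register for X can push a neighbour processed later onto an expensive one.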

# Propagate Solution: Minima Based

• For each Node X in degree0 List

• tempCostVector = X.CostVector
• For each Adjacent Y of X
• minY = minIndex(Y.CostVector)
• From the cost matrix of X and Y
• Add column minY to tempCostVector
• X.registerAssignment = minIndex(tempCostVector)
• For each Adjacent Y of X
• Y.CostVector[X.registerAssignment] = INFINITY
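A corresponding sketch of the minima-based propagation, under the same illustrative assumptions as a pure-interference graph with option k meaning register k. `matrix[(x, y)]` is assumed to hold the edge cost matrix with x's options as rows and y's options as columns, for both directions of every edge. One small addition of this sketch, not on the slide: a neighbour that already has a register contributes the column of that register rather than its minimum index.

```python
import math

INF = math.inf


def min_index(v):
    """Index of the smallest entry (first one on ties)."""
    return min(range(len(v)), key=v.__getitem__)


def propagate_minima(order, cost, matrix, adj):
    """Pick X's register while accounting for each neighbour's preferred register."""
    assignment = {}
    for x in order:
        temp = list(cost[x])
        for y in adj[x]:
            # Column of Cxy for y's actual (if assigned) or cheapest register.
            min_y = assignment[y] if y in assignment else min_index(cost[y])
            for i in range(len(temp)):
                temp[i] += matrix[(x, y)][i][min_y]
        assignment[x] = min_index(temp)
        for y in adj[x]:
            cost[y][assignment[x]] = INF
    return assignment


interfere = [[INF, 0.0], [0.0, INF]]
adj = {0: [1], 1: [0]}
matrix = {(0, 1): interfere, (1, 0): interfere}
cost = {0: [0.0, 0.0], 1: [0.0, 5.0]}
print(propagate_minima([0, 1], cost, matrix, adj))  # {0: 1, 1: 0}
```

On this two-node example the naive rule would give node 0 register 0 and force node 1 onto its cost-5 register; looking at node 1's cheapest register first leaves it free and yields a total cost of 0.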

# THANK YOU

BHUSHAN SONAWANE

SIDDHARTH KUMAR

Internal guide: Prof. M. V. Kulkarni

External guide: Shekhar Divekar

• Understanding the Algorithm and the NVIDIA Compiler Infrastructure

• Modelling Vector Instructions

• Modelling FP16 and 64-bit Registers

• Implementation of a Basic Register Allocator for sm50

• Performance Tuning

• What Next?

#### Nearly Optimal Register Allocation using PBQP

By Bhushan Sonawane

Nvidia internship project presentation