Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation
Vinod Ganesan
vinodg@cse.iitm.ac.in
SysDL Reading Group
DNN Compilation
[Diagram: Frameworks lower DNNs to data-flow graphs of operators (Conv2D, ReLU); DNN operator libraries map these onto hardware. Hand-writing custom operator libraries (CUDA, DSCs) is engineer intensive.]
DNN Compilation
[Diagram: the same stack, with an ML-based program optimizer generating the DNN operator libraries that map the data-flow graph (Conv2D, ReLU) onto the hardware.]
DNN Compilation flow
$$ \theta^* = \argmax_\theta f(\tau(\theta)), \theta \in D_\theta$$
Rich Search Space
Effective search algorithm
Fast fitness estimation
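A minimal sketch of this objective as a search loop, assuming a toy knob space and a stub in place of the real compile-and-measure fitness \( f(\tau(\theta)) \) (all knob names and ranges below are illustrative, not from the talk):

```python
import itertools
import random

# Hypothetical knob space D_theta for one Conv2D schedule: each knob theta_i
# picks a tiling / unrolling factor. Names and ranges are illustrative only.
KNOBS = {
    "tile_x": [1, 2, 4, 8, 16],
    "tile_y": [1, 2, 4, 8, 16],
    "unroll": [0, 1],
}

def measure_fitness(theta):
    """Stand-in for f(tau(theta)): compile the schedule tau(theta) and
    measure it on hardware (e.g. GFLOPS). Here it is a deterministic stub."""
    random.seed(hash(tuple(sorted(theta.items()))))
    return random.random()

def brute_force_search():
    """theta* = argmax_theta f(tau(theta)) by exhaustive enumeration.
    Real search spaces have billions of points, hence the need for an
    effective search algorithm and fast fitness estimation."""
    best_theta, best_fitness = None, float("-inf")
    names = list(KNOBS)
    for values in itertools.product(*(KNOBS[n] for n in names)):
        theta = dict(zip(names, values))
        fitness = measure_fitness(theta)
        if fitness > best_fitness:
            best_theta, best_fitness = theta, fitness
    return best_theta, best_fitness

if __name__ == "__main__":
    print(brute_force_search())
```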
State-of-the-art: TVM
Search for better schedules
Sample for improving fitness estimation
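TVM's baseline search (contrasted later with the RL agent) is a simulated-annealing walk guided by a learned cost model. A minimal sketch of such a loop; the neighbor and cost-model functions are caller-supplied stand-ins, not TVM's actual API:

```python
import math
import random

def simulated_annealing(initial_theta, neighbors, predicted_fitness,
                        steps=500, temp=1.0, cooling=0.99):
    """Propose a neighboring configuration, accept it if the predicted
    fitness improves, otherwise accept with a temperature-dependent
    probability. `neighbors` and `predicted_fitness` stand in for the real
    mutation operator and cost model."""
    theta = initial_theta
    score = predicted_fitness(theta)
    best_theta, best_score = theta, score
    for _ in range(steps):
        candidate = random.choice(neighbors(theta))
        cand_score = predicted_fitness(candidate)
        if cand_score > score or random.random() < math.exp((cand_score - score) / temp):
            theta, score = candidate, cand_score
            if score > best_score:
                best_theta, best_score = theta, score
        temp *= cooling
    return best_theta, best_score

# Toy example: maximize -(x - 7)^2 over integer x in [0, 15].
print(simulated_annealing(
    0,
    neighbors=lambda x: [max(0, x - 1), min(15, x + 1)],
    predicted_fitness=lambda x: -(x - 7) ** 2,
))
```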
TVM - Disadvantages
TVM's autotuning is very slow: the simulated-annealing search needs many iterations, and each candidate schedule must be compiled and measured on real hardware.
Intelligent Search algorithm
$$ s_\theta^* = \argmax_{s_\theta \subset S_\theta} P\Big(f_{ideal}(\tau) - \max_{\theta \in s_\theta} f(\tau(\theta)) = 0\Big) $$
$$ A^* = \argmin_{A} \#\mathrm{steps}\big(s_{\theta,t} = A(s_{\theta,t-1})\big) \;\; \mathrm{s.t.} \;\; s_{\theta,t} = s_\theta^* $$
In words: find a subset of candidate configurations \( s_\theta \) whose best member is most likely to match the ideal fitness, and a search algorithm \( A \) that reaches that subset in the fewest update steps.
Exploration
Exploitation
Actor-critic based RL agent employing PPO
Clustering-based sampling
Bread and butter of the RL algorithm
State Space
\( \theta = (\theta_0, \theta_1, \ldots, \theta_n) \)
Action Space
\( a = (\mathrm{inc}, \mathrm{dec}, \mathrm{stay}, \ldots, \mathrm{inc}) \): one action in \( \{\mathrm{inc}, \mathrm{dec}, \mathrm{stay}\} \) per knob \( \theta_i \)
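A minimal sketch of how this state and action space might be encoded, with hypothetical knob names and value ranges (not from the talk):

```python
# State: the current configuration theta = (theta_0, ..., theta_n),
# one integer index per knob into that knob's list of legal values.
KNOB_VALUES = [
    [1, 2, 4, 8, 16],   # e.g. tile_x (hypothetical)
    [1, 2, 4, 8, 16],   # e.g. tile_y (hypothetical)
    [0, 1],             # e.g. unroll (hypothetical)
]

# Action: one of {inc, dec, stay} per knob, applied simultaneously.
INC, DEC, STAY = 0, 1, 2

def apply_action(state, action):
    """Move each knob up, down, or keep it, clipping at the ends of its range."""
    next_state = []
    for idx, act, values in zip(state, action, KNOB_VALUES):
        if act == INC:
            idx = min(idx + 1, len(values) - 1)
        elif act == DEC:
            idx = max(idx - 1, 0)
        next_state.append(idx)
    return next_state

# Example: start in the middle of each range and nudge every knob.
state = [len(v) // 2 for v in KNOB_VALUES]
print(apply_action(state, [INC, DEC, STAY]))
```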
Reward Formulation
RL Agent - policy and value networks
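A minimal PyTorch-style sketch of the actor-critic networks such a PPO agent could use: a shared trunk, one categorical policy head per knob, and a scalar value head. The layer sizes and the shared trunk are assumptions, not details from the talk.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Policy network picks {inc, dec, stay} for each knob; the value
    network estimates the expected return of the current configuration."""

    def __init__(self, num_knobs, num_actions_per_knob=3, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(num_knobs, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # One categorical head per knob (factorized policy).
        self.policy_heads = nn.ModuleList(
            nn.Linear(hidden, num_actions_per_knob) for _ in range(num_knobs)
        )
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state):
        h = self.trunk(state)
        logits = [head(h) for head in self.policy_heads]   # actor
        value = self.value_head(h)                          # critic
        return logits, value

# Example: 6 knobs, a batch of one normalized configuration.
model = ActorCritic(num_knobs=6)
logits, value = model(torch.randn(1, 6))
```

The PPO update itself (clipped surrogate loss on the policy heads, regression loss on the value head) would be layered on top of these networks.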
Reducing costly hardware measurements
Clustering adapts to the changing landscape of the design space and to the non-uniform distribution of candidate configurations
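A minimal sketch of clustering-based sampling: cluster the candidate configurations proposed by the search and send only one representative per cluster to hardware for measurement. The use of k-means and the choice of k are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_for_measurement(candidates, k=8):
    """Reduce costly hardware measurements: instead of measuring every
    candidate configuration, cluster them and measure only the configuration
    closest to each cluster centroid."""
    candidates = np.asarray(candidates, dtype=float)
    k = min(k, len(candidates))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(candidates)
    representatives = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(candidates[members] - km.cluster_centers_[c], axis=1)
        representatives.append(members[np.argmin(dists)])
    return sorted(representatives)   # indices of configs to measure on hardware

# Example: 100 candidate knob vectors, 6 knobs each -> measure only 8 of them.
rng = np.random.default_rng(0)
print(sample_for_measurement(rng.integers(0, 16, size=(100, 6)), k=8))
```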
Evaluation - Improving efficacy of search
Reduction in search steps over TVM
The RL agent quickly learns the correlations among the different knobs
Reuses information from previous iterations, unlike simulated annealing!
Evaluation - Reducing hardware measurements
Putting it all together
Mean reduction of 4.45x in compilation time, and mean improvement of 6% in output performance
Mean reduction of 4.82x in compilation time, and mean improvement of 17% in performance
My key takeaways
There probably is a God. Many things are easier to explain if there is than if there isn't
- Von Neumann
Vinod Ganesan
SysDL Reading Group
vinodg@cse.iitm.ac.in