Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation
Vinod Ganesan
vinodg@cse.iitm.ac.in
SysDL Reading Group
DNN Compilation
[Figure: the compilation stack - operators such as Conv2D and ReLU are expressed in frameworks as data flow graphs and lowered through DNN operator libraries to hardware; writing custom operator libraries (CUDA, DSCs) by hand is engineer intensive!]
DNN Compilation
[Figure: the same stack (Conv2D/ReLU, frameworks, data flow graphs, hardware) with an ML-based program optimizer taking the place of hand-written DNN operator libraries]
DNN Compilation flow
- Data layout transformation
- Dead code elimination
- Operator fusion
- Loop tiling (see the sketch after this list)
- Loop unrolling
- Loop interchange
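To make the loop transformations concrete, here is a minimal Python sketch (not from the paper) contrasting a naive matrix multiply with a tiled and interchanged version; the tile size is exactly the kind of knob the schedule search below tunes.

```python
import numpy as np

def matmul_naive(A, B):
    """Reference i-j-k triple loop with no transformations."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

def matmul_tiled(A, B, tile=32):
    """Same computation after loop tiling and interchange: i and j are split
    into outer tile loops, and k is hoisted above the intra-tile loops to
    improve locality. `tile` is a tunable knob."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for io in range(0, n, tile):
        for jo in range(0, n, tile):
            for k in range(n):
                for i in range(io, min(io + tile, n)):
                    for j in range(jo, min(jo + tile, n)):
                        C[i, j] += A[i, k] * B[k, j]
    return C
```

Both functions compute the same result; only the loop structure (and hence cache behavior) differs.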
$$ \theta^* = \argmax_{\theta \in D_\theta} f(\tau(\theta)) $$
- \( \theta^* \) - optimized schedule
- \( \tau \) - code template
- \( \theta = (\theta_1, \theta_2, \ldots, \theta_n) \) - tunable knobs
- \( f \) - fitness function (a toy example of this formulation is sketched below)
- \( D_\theta \) - design space
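A toy instance of this formulation, with hypothetical knob names and a surrogate fitness function; in a real flow, \( f \) would compile the template \( \tau \) with the knobs \( \theta \) and measure performance on the target hardware.

```python
import random

# Hypothetical design space D_theta: each knob theta_i has a small set of legal values.
design_space = {
    "tile_x": [1, 2, 4, 8, 16],
    "tile_y": [1, 2, 4, 8, 16],
    "unroll": [0, 1],
    "vectorize": [0, 1],
}

def fitness(theta):
    """Stand-in for f(tau(theta)); a real fitness function would be a hardware
    measurement (or a learned cost model predicting it)."""
    return -abs(theta["tile_x"] - theta["tile_y"]) + theta["unroll"] + theta["vectorize"]

def random_search(n_trials=64):
    """Simplest possible search over D_theta: sample knobs uniformly, keep the best."""
    best_theta, best_fit = None, float("-inf")
    for _ in range(n_trials):
        theta = {k: random.choice(v) for k, v in design_space.items()}
        if fitness(theta) > best_fit:
            best_theta, best_fit = theta, fitness(theta)
    return best_theta, best_fit

print(random_search())
```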
Rich Search Space
Effective search algorithm
Fast fitness estimation
State-of-the-art: TVM
Search for better schedules
Sample for improving fitness estimation
TVM - Disadvantages
- Simulated annealing is slow (an ineffective search algorithm; a simplified sketch follows below)
- Simulated annealing is oblivious to the gradual changes in the cost model, so work is redundantly repeated across search rounds
- Greedy sampling on top of annealing is passive: it relies entirely on the cost model, so good solutions that are distributed non-uniformly get neglected
- Greediness leads to overfitting the cost model, which can make the chosen configurations worse
- Greedy sampling often yields redundant or invalid configurations, wasting precious hardware measurements
TVM is very slow!
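For contrast, a much-simplified sketch of the simulated-annealing search that AutoTVM-style tuning relies on (not TVM's actual implementation), reusing `design_space` and `fitness` from the earlier sketch as the cost model; every invocation starts from scratch and trusts the cost model's predictions completely, which is the redundancy and passivity called out above.

```python
import math
import random

def neighbor(theta, design_space):
    """Perturb one knob to an adjacent legal value (a simplified move set)."""
    theta = dict(theta)
    knob = random.choice(list(design_space))
    values = design_space[knob]
    i = values.index(theta[knob]) + random.choice([-1, 1])
    theta[knob] = values[max(0, min(len(values) - 1, i))]
    return theta

def simulated_annealing(design_space, cost_model, steps=500, t0=1.0, alpha=0.99):
    """Anneal over *predicted* fitness; no state survives to the next round."""
    theta = {k: random.choice(v) for k, v in design_space.items()}
    best, best_fit, t = theta, cost_model(theta), t0
    for _ in range(steps):
        cand = neighbor(theta, design_space)
        delta = cost_model(cand) - cost_model(theta)
        if delta > 0 or random.random() < math.exp(delta / t):
            theta = cand
            if cost_model(theta) > best_fit:
                best, best_fit = theta, cost_model(theta)
        t *= alpha
    return best, best_fit

# Example: simulated_annealing(design_space, fitness)
```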
Intelligent Search algorithm
$$ s_\theta^* = \argmax_{s_\theta \subset S_\theta} P\big(f_{ideal}(\tau) - \max_{\theta \in s_\theta} f(\tau(\theta)) = 0\big) $$
$$ A^* = \argmin_{A} \#\text{steps} \quad \text{s.t.} \quad s_{\theta,t} = A(s_{\theta,t-1}) \ \text{converges to} \ s_\theta^* $$
Exploration
Exploitation
Actor-critic based RL agent employing PPO
Clustering-based sampling
Bread and butter of the RL algorithm
State Space
\( \theta = (\theta_0, \theta_1, \ldots, \theta_n) \)
Action Space
\( (\text{inc}, \text{dec}, \text{stay}, \ldots, \text{inc}) \) - one direction per knob
Reward Formulation
RL Agent - policy and value networks
- The policy network takes \( \theta \) and returns a vector of directions, one per knob
- The value network estimates the value of the current configuration (the critic in the actor-critic setup)
- The config updater applies the vector of directions to the previous config to generate the next set of configs
- All n configurations visited in an episode ( \( s_\theta \) ) are evaluated on the cost model and used to generate rewards (a structural sketch of one episode follows below)
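A structural sketch of one search episode, with placeholder policy and value functions standing in for the PPO-trained networks (hypothetical names, not the paper's code); it again builds on `design_space` and `fitness` from the first sketch.

```python
import random

DIRECTIONS = ("dec", "stay", "inc")  # per-knob action space from the slides

def policy_network(theta_index, n_knobs):
    """Placeholder actor: returns one direction per knob. A trained PPO policy
    would condition on the current configuration theta_index."""
    return [random.choice(DIRECTIONS) for _ in range(n_knobs)]

def value_network(theta_index):
    """Placeholder critic: a trained value network would estimate expected return."""
    return 0.0

def config_updater(theta_index, directions, design_space):
    """Apply each knob's direction to the previous config, clamping to legal values."""
    new_index = {}
    for (knob, values), d in zip(design_space.items(), directions):
        i = theta_index[knob]
        if d == "inc":
            i = min(i + 1, len(values) - 1)
        elif d == "dec":
            i = max(i - 1, 0)
        new_index[knob] = i
    return new_index

def run_episode(design_space, cost_model, steps=16):
    """Roll out one episode: every visited config (s_theta) is scored on the
    cost model, and those scores become the rewards used to update the agent."""
    theta_index = {k: random.randrange(len(v)) for k, v in design_space.items()}
    visited, rewards, values = [], [], []
    for _ in range(steps):
        directions = policy_network(theta_index, len(design_space))
        theta_index = config_updater(theta_index, directions, design_space)
        theta = {k: design_space[k][i] for k, i in theta_index.items()}
        visited.append(theta)
        rewards.append(cost_model(theta))
        values.append(value_network(theta_index))
    return visited, rewards, values

# Example: visited, rewards, values = run_episode(design_space, fitness)
```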
Reducing costly hardware measurements
Clustering adapts to the changing landscape of the design space and to the non-uniform distribution of good configurations
- Cluster the configurations ( \( s_\theta \) ) with k-means, searching for a good value of k
- Maintain a history of previously visited configurations
- Use only the centroids for hardware measurement; if a centroid was previously visited, skip it and choose an unseen config instead (sketched below)
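A sketch of the sampling step, assuming scikit-learn's KMeans and numeric knob values; the paper searches for a good k, whereas a fixed k is used here for brevity. Only the returned configs go to real hardware; everything else is scored on the cost model alone.

```python
import numpy as np
from sklearn.cluster import KMeans

def adaptive_sample(candidate_configs, history, k=8):
    """Cluster the episode's configs (s_theta) and measure one representative
    per cluster, skipping anything measured in a previous iteration.

    candidate_configs: list of dicts mapping knob name -> numeric value
    history: set of knob-value tuples that were already measured on hardware
    """
    keys = sorted(candidate_configs[0])
    X = np.array([[cfg[key] for key in keys] for cfg in candidate_configs], dtype=float)
    km = KMeans(n_clusters=min(k, len(candidate_configs)), n_init=10).fit(X)

    to_measure = []
    for center in km.cluster_centers_:
        # Walk candidates from nearest to the centroid outward and take the
        # first one that has not been measured before.
        for idx in np.argsort(np.linalg.norm(X - center, axis=1)):
            key = tuple(X[idx])
            if key not in history:
                history.add(key)
                to_measure.append(candidate_configs[int(idx)])
                break
    return to_measure

# Example: to_measure = adaptive_sample(visited, history=set())
```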
Evaluation - Improving efficacy of search
Reduction in Search steps over TVM
The RL agent quickly learns the correlations among the different knobs
It reuses information from previous iterations, unlike simulated annealing!
Evaluation - Reducing hardware measurements
- Adaptive sampling works well with RL since the RL agent localizes the search to meaningful samples (exploitation) while maintaining diversity (exploration)
Putting it all together
Mean reduction of 4.45x in compilation time, and mean improvement of 6% in output performance
Mean reduction of 4.82x in compilation time, and mean improvement of 17% in performance
My key takeaways
- RL + Adaptive sampling can improve the optimization time significantly while also improving the performance
- Most of the speed-up comes from reducing the hardware measurements, as the results show. A smaller speed-up comes from reusing information across iterations, which is where the RL agent helps!
- Bottom line: it is not entirely clear that the claims about RL being highly effective for search are borne out by the results, since the gains are quite disproportionate
- The DNNs evaluated are not representative of the current state of the art!
There probably is a God. Many things are easier to explain if there is than if there isn't
- Von Neumann