Active Learning in Performance Analysis
Dmitry Duplyakin
Jed Brown
Robert Ricci
09/14/2016
dmitry.duplyakin@colorado.edu
Outline
Dmitry Duplyakin, University of Colorado
- Motivation
- Approach
- Implementation
- Datasets and Visualizations
- Evaluation
- Summary and Future Work
Motivation
Performance Analysis:
- Take a set of measurements
- Build a model
- Understand behavior of a complex system
- Predict outcomes of future experiments
Main Challenges:
- Often too many factors
- Inability to take equal number of measurements at every configuration
- Inefficient exploration of input space
Motivation: Example 1
Each point represents a run of the HPGMG-FE benchmark on a 4-node cluster provisioned on the CloudLab testbed
Motivation: Example 2
Each point represents a run of the HPGMG-FE benchmark on a 4-node cluster provisioned on the CloudLab testbed
Approach: Active Learning
- Use Active Learning (AL) -- techniques from Machine Learning where a "learner" interacts with a "data source"
1. Train a model on a small set of measurements
2. Let the model suggest a point for the next experiment
3. Run the suggested experiment
4. Retrain the model with the new measurement
5. Go back to 2 or exit
- Sometimes called: adaptive experiment design and optimal experiment design
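The loop above can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' prototype: it assumes a Gaussian Process as the model (as the later slides do), assumes "suggest a point" means picking the untried candidate with the largest predictive standard deviation, and uses a hypothetical `run_experiment` function and a synthetic pool of candidate points in place of real measurements.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def run_experiment(x):
    """Hypothetical stand-in for a real measurement, e.g. one benchmark run."""
    return float(np.sin(3.0 * x[0]))


def active_learning(pool, n_initial=3, n_iterations=10, seed=0):
    """Run the loop over a 2-D array of candidate points (one row per point)."""
    rng = np.random.default_rng(seed)
    # Step 1: train a model on a small random set of measurements.
    chosen = [int(i) for i in rng.choice(len(pool), size=n_initial, replace=False)]
    y = [run_experiment(pool[i]) for i in chosen]
    model = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(noise_level=1e-3))
    model.fit(pool[chosen], y)
    for _ in range(n_iterations):  # Step 5: repeat until the budget is spent.
        # Step 2: the model suggests the untried point it is least certain about.
        _, std = model.predict(pool, return_std=True)
        std[chosen] = -np.inf  # never suggest an already measured point
        suggestion = int(np.argmax(std))
        # Step 3: run the suggested experiment.
        chosen.append(suggestion)
        y.append(run_experiment(pool[suggestion]))
        # Step 4: retrain the model with the new measurement.
        model.fit(pool[chosen], y)
    return model, chosen


pool = np.linspace(0.0, 2.0, 50).reshape(-1, 1)  # synthetic candidate points
model, chosen = active_learning(pool)
print("measured", len(chosen), "of", len(pool), "candidate points")
```

A fixed iteration count is used as the exit condition here only for simplicity; a threshold on the remaining uncertainty would be another plausible choice.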
Approach: Gaussian Process Regression
- Use Gaussian Process Regression (GPR) -- a non-parametric, non-linear interpolation technique that provides the best linear unbiased prediction (under suitable assumptions)
- Build a model for $f(x)$
- For every new $x_*$, calculate estimates of the predictive mean $\mu(x_*)$ and standard deviation $\sigma(x_*)$
- Sometimes called: kriging (in geostatistics) and Wiener–Kolmogorov prediction
- GPR works in many dimensions
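As an illustration of what GPR returns for a new input, here is a small sketch using scikit-learn's `GaussianProcessRegressor`. The training data are synthetic, and the kernel and its fixed length scale are assumptions chosen for the example (hyperparameter fitting is switched off with `optimizer=None` so that the snippet is only about prediction).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic one-dimensional training data (assumption, for illustration only).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.sin(X_train).ravel()

# Fixed RBF kernel; alpha adds a tiny jitter so the fit is near-interpolating.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-8,
                               optimizer=None)
gpr.fit(X_train, y_train)

# For every new input, GPR yields a predicted mean and a standard deviation.
X_new = np.array([[1.5], [10.0]])
mean, std = gpr.predict(X_new, return_std=True)
for x, m, s in zip(X_new.ravel(), mean, std):
    print(f"x = {x:5.1f}  mean = {m:+.3f}  std = {s:.3f}")
```

The point far from the training data (x = 10) gets a much larger standard deviation than the point between measurements (x = 1.5). The same calls accept inputs with several columns, which is how the multi-dimensional case would be handled.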
Approach: Putting it Together
- Combine AL and GPR into a 2-layer system:
  Upper: AL
  Lower: GPR
- Optimization problem at each layer:
  Upper: Choose "best" experiment
  Lower: Choose "best" hyperparameters
Approach: Details
Upper: Choose "best" experiment
Consider strategies:
- Variance Reduction (VR)
- Cost Efficiency (CE)
Lower: Choose "best" hyperparameters
Use: Bayesian Model Selection (Marginal Likelihood Maximization) with 3 hyperparameters: noise level, length scale, and amplitude
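Both layers can be sketched with scikit-learn as below. Several things here are assumptions rather than the presented method: the kernel composition (amplitude times RBF plus white noise) is one common way to obtain exactly these three hyperparameters; the selection formulas for the two strategies did not survive in this transcript, so Variance Reduction is read as "pick the candidate with the largest predictive standard deviation" and Cost Efficiency as "largest standard deviation per unit of cost"; the data and the `cost` array are synthetic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(25, 1))                # synthetic inputs
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, size=25)  # synthetic noisy outputs

# Lower layer: amplitude * RBF(length scale) + noise level; fit() selects the
# three hyperparameters by maximizing the log marginal likelihood.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=3, random_state=0)
gpr.fit(X, y)
print("fitted kernel:", gpr.kernel_)
print("log marginal likelihood:", gpr.log_marginal_likelihood_value_)


# Upper layer: choose the "best" next experiment among candidates.
def pick_variance_reduction(std):
    """Assumed VR rule: index of the largest predictive standard deviation."""
    return int(np.argmax(std))


def pick_cost_efficiency(std, cost):
    """Assumed CE rule: index of the largest standard deviation per unit cost."""
    return int(np.argmax(std / cost))


candidates = np.linspace(0.0, 10.0, 21).reshape(-1, 1)
_, std = gpr.predict(candidates, return_std=True)
cost = 1.0 + candidates.ravel()  # hypothetical cost that grows with the input
print("VR picks x =", candidates[pick_variance_reduction(std), 0])
print("CE picks x =", candidates[pick_cost_efficiency(std, cost), 0])
```

Under these assumptions the cost-adjusted rule tends to prefer cheaper candidates unless a costly one is much more uncertain, which is the kind of tradeoff the two strategies are meant to expose.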
Implementation
- Developed a prototype in Python which supports:
- single realizations of AL in "offline" mode*
- batches of realizations for comparison of Variance Reduction and Cost Efficiency strategies
- GPR: used code for Gaussian Processes in scikit-learn (0.18.dev0)
* Note: offline refers to the fact that the prototype queries a database with collected data. Future work: in online mode, run AL alongside the computation
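A rough sketch of what "offline" batches of realizations might look like follows. It is not the prototype itself: the "database" is a synthetic array of log-transformed runtimes, the Cost Efficiency rule (uncertainty divided by predicted runtime) is an assumed form, and the function names `realization` and `batch` are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

# Hypothetical offline "database": problem size -> log10(runtime), synthetic values.
sizes = np.linspace(1.0, 10.0, 30)
log_runtime = np.log10(sizes) + 0.5


def realization(strategy, seed, n_initial=3, n_iterations=8):
    """One offline AL run; returns the total runtime of all queried points."""
    rng = np.random.default_rng(seed)
    chosen = [int(i) for i in rng.choice(len(sizes), size=n_initial, replace=False)]
    kernel = ConstantKernel(1.0) * RBF(1.0) + WhiteKernel(1e-2)
    X_all = sizes.reshape(-1, 1)
    for _ in range(n_iterations):
        # Offline mode: "running" an experiment is a lookup of stored data.
        gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gpr.fit(X_all[chosen], log_runtime[chosen])
        mean, std = gpr.predict(X_all, return_std=True)
        # VR: largest uncertainty. CE (assumed form): uncertainty per predicted runtime.
        score = std if strategy == "VR" else std / 10.0 ** mean
        score[chosen] = -np.inf  # skip points that were already queried
        chosen.append(int(np.argmax(score)))
    return float(np.sum(10.0 ** log_runtime[chosen]))


def batch(strategy, n_realizations=5):
    """Repeat realizations with different random initial sets."""
    return [realization(strategy, seed) for seed in range(n_realizations)]


results = {name: batch(name) for name in ("VR", "CE")}
for name, costs in results.items():
    print(name, "mean total cost:", round(float(np.mean(costs)), 2))
```

Averaging over several realizations smooths out the effect of the random initial measurements, so the two strategies can be compared on total cost rather than on a single lucky or unlucky start.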
Analyzed Datasets
- Measured runtimes and estimated energy consumption for a large set of HPGMG-FE benchmark runs on a cluster provisioned on the CloudLab testbed
- Organized this data into two datasets:
- 3D visualizations are available here
Active Learning: 10 Iterations
Shown points represent a subset of measurements in the Performance dataset; runtimes are log-transformed
Active Learning: 100 Iterations
Shown points represent a subset of measurements in the Performance dataset; runtimes are log-transformed
Evaluation: Convergence Analysis
Evaluation: Cost Analysis
Summary and Future Work
Summary:
- Proposed using Active Learning + Gaussian Process Regression for efficient regression learning in performance analysis
- Demonstrated tradeoffs between two Active Learning algorithms, with and without adjustment for experiment cost
Future Work:
- Investigate computational requirements
- Leverage continuous optimization techniques
- Run Active Learning in the online mode
Thank you!
Questions?
dmitry.duplyakin@colorado.edu
IEEE Cluster 2016 - Active Learning in Performance Analysis
By Dmitry Duplyakin
Slide deck for presenting a paper titled "Active Learning in Performance Analysis" at IEEE Cluster 2016.