Research Proposal
by Dmitry Duplyakin
Managing and Optimizing Experiments
on Cloud Computing Systems
PhD Student, University of Colorado
10/21/2016
Committee:
Jed Brown, Robert Ricci, Ken Anderson, Shivakant Mishra, Rick Han
Outline
Dmitry Duplyakin, University of Colorado
Research Proposal, Fall 2016
- Motivation
- Previous Projects:
  - Rebalancing [2013] in a Multi-Cloud Environment
  - Architecting [2015] a Persistent and Reliable Configuration Management System
  - Introducing Configuration Management [2016] Capabilities into CloudLab Experiments
  - Active Learning [2016] in Performance Analysis
- Future Work:
  - Cooperative Scheduling with Elastic Cloud Experiments
  - Active Learning in Adaptive Mesh Refinement Problems
- Summary
Different Ends of the Computing Spectrum
| | Large-Scale Computing | Small-Scale Computing |
|---|---|---|
| Cost | Highly subsidized | Unsubsidized |
| Resources | Dedicated machines | Limited allocations |
| Hardware | Redundant, special-purpose | Dual-purposed, commodity |
| Analysis | Demonstrate computing at the largest possible scale | Obtain the most knowledge out of available cycles |
Thesis Statement and Research Objectives
Thesis Statement:
Cloud Computing offers mechanisms for utilizing on-demand computing resources. Coupled with sophisticated software tools built upon open-source projects, such resources can be efficiently managed and utilized according to user and application requirements.
Research Objectives:
- design and implement such solutions
- evaluate solutions and entire computing environments
- navigate efficient configurations
- understand applicability
"Flavors" of Computing and Infrastructure
Technology: Overview
Rebalancing [2013]
Paper: | Rebalancing in a Multi-Cloud Environment |
Presented at: | 4th Workshop on Scientific Cloud Computing (ScienceCloud) 2013, co-located with ACM HPDC 2013, New York, NY, USA |
Problem Statement:
Environments that consist of compute resources provisioned at multiple clouds may need to be periodically rebalanced: some resources need to be terminated and replaced with different ones in order to best satisfy current user needs.
Automatic rebalancing is a non-trivial process.
Rebalancing [2013]
Metrics of interest:
- Speed of rebalancing
- Wasted cycles
- Cost of cloud deployment
Rebalancing [2013] - Opportunistic Policy
Rebalancing [2013] - Force Offline
Rebalancing [2013] - Tradeoffs
Rebalancing [2013] - Summary
- Created a multi-cloud environment capable of rebalancing
- Observed rebalancing under HTC workloads
- Proposed and evaluated rebalancing policies
- Examined tradeoffs such as cost and workload overhead
Architecting [2015]
Paper: | Architecting a Persistent and Reliable Configuration Management System |
Presented at: | 6th Workshop on Scientific Cloud Computing (ScienceCloud) 2015, co-located with ACM HPDC 2015, Portland, OR, USA |
Poster: | Highly Available Cloud-Based Cluster Management, at the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2015 |
Problem Statement:
Configuration management systems must remain operational in the majority of failure scenarios, since system administrators rely on them to perform recovery actions.
Proposal: investigate use of clouds
- not for additional compute cycles
- but rather as a source of resources for administrative services
Architecting [2015]
Developed prototype:
Architecting [2015]
Leveraged Technologies:
- Configuration Management System (CMS): Chef 11
- Amazon Web Services: EC2, EBS, CloudFormation
- Openswan 2.6.32, open source IPSec VPN
- Distributed Replicated Block Device (DRBD)
- Firewalls and static routing rules
Configuration Management [2016]
Paper: | Introducing Configuration Management Capabilities into CloudLab Experiments |
Presented at: | The International Workshop on Computer and Network Experimental Research Using Testbeds (CNERT), co-located with IEEE INFOCOM, San Francisco, CA, USA, 2016. Received the Best Paper Award. |
Configuration Management [2016]
Testbeds: provide isolated and recreatable environments
Configuration Management [2016]
Experiments: require building software environments
Configuration Management [2016]
Experiments: require building software environments... on many nodes
Configuration Management [2016]: Common Workflows
Configuration Management [2016]
Problem: Explosion of Configurations!
(snapshot- and simple script-based approaches don’t scale)
Configuration Management [2016]
Configuration Management [2016]: Summary
- On CloudLab, built a profile that turns an experiment into a Chef cluster
- Enabled easy integration of public and private infrastructure code
- Chef Supermarket and emulab/chef-repo on GitHub
- Developed recipes, cookbooks, and roles for building computing clusters
- Developed tutorial material:
Active Learning [2016]
Paper: | Active Learning in Performance Analysis |
Presented at: | IEEE Cluster 2016, Taipei, Taiwan, 2016 |
Motivation
Dmitry Duplyakin, University of Colorado
Active Learning in Performance Analysis
09/14/2016
Performance Analysis:
- Take a set of measurements
- Build a model
- Understand behavior of a complex system
- Predict outcomes of future experiments
Main Challenges:
- Often too many factors
- Inability to take an equal number of measurements at every configuration
- Inefficient exploration of input space
Motivation: Example 1
Each point represents a run of the HPGMG-FE benchmark on a 4-node cluster provisioned on the CloudLab testbed
Motivation: Example 2
Each point represents a run of the HPGMG-FE benchmark on a 4-node cluster provisioned on the CloudLab testbed
Approach: Active Learning
- Use Active Learning (AL) -- techniques from Machine Learning where a "learner" interacts with a "data source"
  1. Train a model on a small set of measurements
  2. Let the model suggest a point for the next experiment
  3. Run the suggested experiment
  4. Retrain the model with the new measurement
  5. Go back to 2 or exit
- Sometimes called: adaptive experiment design and optimal experiment design
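The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_experiment` is a hypothetical stand-in for a real benchmark run, and the "model" is a deliberately simple nearest-neighbor predictor whose uncertainty is the distance to the closest measured point. It is not the GPR model used in the paper.

```python
import math

def run_experiment(x):
    """Hypothetical stand-in for a real benchmark run at input x."""
    return math.sin(3.0 * x) + 0.5 * x

def predict(measured, x):
    """Toy model (an assumption, not GPR): return the value of the nearest
    measured point and, as an uncertainty proxy, the distance to it."""
    nearest = min(measured, key=lambda p: abs(p - x))
    return measured[nearest], abs(nearest - x)

def active_learning(candidates, initial, max_iters):
    # Step 1: "train" on a small initial set of measurements.
    measured = {x: run_experiment(x) for x in initial}
    for _ in range(max_iters):
        remaining = [x for x in candidates if x not in measured]
        if not remaining:
            break  # exit: every candidate has been measured
        # Step 2: the model suggests the point where it is least certain.
        x_next = max(remaining, key=lambda x: predict(measured, x)[1])
        # Step 3: run the suggested experiment.
        y_next = run_experiment(x_next)
        # Step 4: "retrain" by adding the new measurement.
        measured[x_next] = y_next
        # Step 5: go back to step 2 (next loop iteration).
    return measured

candidates = [i / 10 for i in range(11)]
measured = active_learning(candidates, initial=[0.0, 1.0], max_iters=3)
print(sorted(measured))
```

Swapping `predict` for a model with a principled uncertainty estimate is what turns this skeleton into the AL + GPR combination discussed on the following slides.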
Approach: Gaussian Process Regression
- Use Gaussian Process Regression (GPR) -- a non-parametric, non-linear interpolation technique that provides the best linear unbiased prediction (under suitable assumptions)
  - Build a model for the measured response as a function of the inputs
  - For every new input point, calculate estimates of the predictive mean and variance
- Sometimes called: kriging (in geostatistics) and Wiener–Kolmogorov prediction
- GPR works in many dimensions
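To make the prediction step concrete, here is a minimal pure-Python sketch of GPR for scalar inputs. It assumes a squared-exponential (RBF) kernel and a zero prior mean; the function names and default hyperparameter values are illustrative choices, not taken from the prototype, which relies on scikit-learn.

```python
import math

def rbf(a, b, length_scale, amplitude):
    """Squared-exponential kernel for scalar inputs (assumed kernel choice)."""
    return amplitude ** 2 * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L * L^T == A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def solve_lower_transposed(L, b):
    """Back substitution: solve L^T z = b."""
    n = len(L)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (b[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def gp_predict(xs, ys, x_new, length_scale=1.0, amplitude=1.0, noise=1e-3):
    """Return the predictive mean and variance of the latent function at x_new."""
    n = len(xs)
    # Kernel matrix of the training inputs, with noise variance on the diagonal.
    K = [[rbf(xs[i], xs[j], length_scale, amplitude)
          + (noise ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, ys))  # K^{-1} y
    k_star = [rbf(x, x_new, length_scale, amplitude) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new, length_scale, amplitude) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

mean_near, var_near = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.0)
mean_far, var_far = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 100.0)
print(mean_near, var_near, mean_far, var_far)
```

Near a measured point the variance collapses toward the noise level, while far from all data it returns to the prior variance (`amplitude ** 2`). That growth in variance is the signal an Active Learning layer can use to pick the next experiment.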
Approach: Putting it Together
- Combine AL and GPR into a 2-layer system:
  - Upper: AL
  - Lower: GPR
- Optimization problem at each layer:
  - Upper: Choose "best" experiment
  - Lower: Choose "best" hyperparameters
Approach: Details
Upper: Choose "best" experiment
- Consider strategies: Variance Reduction (VR) and Cost Efficiency (CE)
Lower: Choose "best" hyperparameters
- Use: Bayesian Model Selection (Marginal Likelihood Maximization) with 3 hyperparameters: noise level, length scale, and amplitude
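As an illustration of the lower layer, the sketch below evaluates the standard GP log marginal likelihood for a squared-exponential kernel and picks the best (noise, length scale, amplitude) triple from a coarse grid. The grid search, the kernel choice, and the sample data are assumptions made for brevity; libraries such as scikit-learn typically maximize the same quantity with a gradient-based optimizer.

```python
import itertools
import math

def log_marginal_likelihood(xs, ys, noise, length_scale, amplitude):
    """log p(y | X, hyperparameters) for a zero-mean GP with an RBF kernel."""
    n = len(xs)
    K = [[amplitude ** 2 * math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
          + (noise ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Cholesky factorization K = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Solve L z = y, so that y^T K^{-1} y == z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    complexity = -sum(math.log(L[i][i]) for i in range(n))  # -0.5 * log det K
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)

def select_hyperparameters(xs, ys, grid):
    """Return the (noise, length_scale, amplitude) triple with the highest LML."""
    return max(grid, key=lambda h: log_marginal_likelihood(xs, ys, *h))

# Hypothetical sample data; a real study would use measured runtimes.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
grid = list(itertools.product([0.01, 0.1], [0.3, 1.0, 3.0], [0.5, 1.0, 2.0]))
best = select_hyperparameters(xs, ys, grid)
print(best)
```

The two terms pull in opposite directions: the data-fit term rewards hyperparameters that explain the measurements, while the log-determinant term penalizes overly flexible models.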
Implementation
- Developed a prototype in Python which supports:
- single realizations of AL in "offline" mode*
- batches of realizations for comparison of Variance Reduction and Cost Efficiency strategies
- GPR: used code for Gaussian Processes in scikit-learn (0.18.dev0)
* Note: offline refers to the fact that the prototype queries a database with collected data. Future work: in online mode, run AL alongside the computation
Analyzed Datasets
- Measured runtimes and estimated energy consumption for a large set of HPGMG-FE benchmark runs on a cluster provisioned on the CloudLab testbed
- Organized this data into two datasets:
- 3D visualizations are available here
Active Learning: 10 Iterations
Shown points represent a subset of measurements in the Performance dataset; runtimes are log-transformed
Active Learning: 100 Iterations
Shown points represent a subset of measurements in the Performance dataset; runtimes are log-transformed
Evaluation: Convergence Analysis
Evaluation: Cost Analysis
Active Learning [2016]: Summary
- Proposed using Active Learning + Gaussian Process Regression for efficient regression learning in performance analysis
- Demonstrated tradeoffs between two Active Learning algorithms, with and without adjustment for experiment cost
- Developed a prototype capable of processing diverse datasets
Technology: Recap
Technology: Future Work
Future Work: Cooperative Scheduling
Context: two systems with different loads (clusters, clouds, testbeds,...)
Goal: they "exchange" resources when possible
Proposed interface: exchange preemption vectors
Cooperative Scheduling: Prototype
(elastic experiment on Apt)
Cooperative Scheduling: Summary
- Elastic experiments: demonstrate usefulness and necessary components
- Preemption vectors: demonstrate how they can be calculated and tuned
- Evaluation:
- microbenchmarks
- scale up and down under HTC and HPC workloads
Potential Targets:
- Special Issue on Middleware for Multicloud 2017 -- Papers due: early January
- ScienceCloud 2017 Conference -- Papers due: early February
Future Work: AL and AMR
Adaptive Mesh Refinement (AMR):
- popular technique used in science and engineering
- increases resolution where an accurate solution is most needed
  - coarse mesh in most of the domain
  - finer meshes in sensitive regions
- designed to improve shock capturing methods
- reduces computation and storage
- computational requirements are hard to predict
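A toy 1D example may help show the idea of refining only where needed. The sketch below recursively splits a cell when the function changes by more than a tolerance across it. The refinement criterion, the test profile `steep_front`, and all parameter values are assumptions for illustration; production AMR codes such as those named later use far more elaborate error estimators and data structures.

```python
import math

def refine(f, left, right, tol, max_depth):
    """Return a list of (left, right) cells, splitting any cell across which
    f changes by more than tol (a simplistic, assumed refinement criterion)."""
    if max_depth == 0 or abs(f(right) - f(left)) <= tol:
        return [(left, right)]
    mid = 0.5 * (left + right)
    return (refine(f, left, mid, tol, max_depth - 1)
            + refine(f, mid, right, tol, max_depth - 1))

def steep_front(x):
    """Hypothetical test profile with a sharp transition near x = 0.5."""
    return math.tanh(50.0 * (x - 0.5))

cells = refine(steep_front, 0.0, 1.0, tol=0.1, max_depth=10)
widths = [right - left for left, right in cells]
print(len(cells), min(widths), max(widths))
```

The resulting mesh stays coarse away from the front and becomes fine around it. The number of cells, and therefore the cost, depends on where and how sharp the feature is, which is one reason computational requirements are hard to predict. Note that an endpoint-difference criterion like this one can miss features that lie entirely inside a cell.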
Source: http://math.boisestate.edu/~calhoun/www_personal/research/amr_software
Proposal: apply AL to AMR problems
Common Approaches to AMR
Tree-based
Block-structured (patch-based)
Source: http://www.training.prace-ri.eu/uploads/tx_pracetmo/AMRIntroHNDSCi15.pdf
Examples of AMR Software
Sources:
http://math.boisestate.edu/~calhoun/www_personal/research/amr_software/
https://www.youtube.com/watch?v=DKn9iuD7Ihk
https://arxiv.org/pdf/1308.1472v1.pdf
Clawpack and GeoClaw (based on AMRClaw)
p4est and ForestClaw
AL and AMR
Proposal: apply AL to AMR problems
- AMR performance is sensitive to physical parameters
- We can include physical parameters in AL-based modeling
- AL will allow us to model the accuracy of AMR in a performance-aware way
- automate the process of selecting the most efficient configurations
- avoid regions with spikes in computational requirements
AL and AMR: Proposed Study
Potential Targets:
- HPDC 2017 Conference -- Papers due: January 17
- ICS 2017 Conference -- Papers due: January 18
- SC 2017 Conference -- Papers due: early April
- Collaborate with Prof. Donna Calhoun (Boise State University)
- Obtain copies of ForestClaw performance results
- Run "offline" AL on these datasets
- Evaluate AL schemes:
- Quantify tradeoffs
Proposed Timeline
Summary
- Discussed four previous papers
- Discussed how the last two papers create a foundation for future work
- Described motivation and scope of future studies
- Proposed timeline and potential deadlines
Other Contributions
- CloudLab Tutorial at NSFCloud For Everyone Workshop, Nov 2016
- CloudLab Tutorial at GENI Regional Workshop, Mar 2016
- Chef Tutorial, new chapter in the CloudLab online manual
- Participate in the "Rethinking Experimental Methods in Computing" seminar (Dagstuhl Seminar 16111)
- Student Volunteering at SC'16 and IEEE Cluster'13
- Participate in Graduate Peer Mentoring Program, 2016-2017 AY
- Mentor a CS undergraduate in DLA program, 2014-2015 AY
Thank you!
Questions?