Research Proposal

Managing and Optimizing Experiments on Cloud Computing Systems

by Dmitry Duplyakin

PhD Student, University of Colorado

10/21/2016

Committee:

Jed Brown, Robert Ricci, Ken Anderson, Shivakant Mishra, Rick Han

Outline

Dmitry Duplyakin, University of Colorado

Research Proposal, Fall 2016

  • Motivation
  • Previous Projects:
    • Rebalancing [2013] in a Multi-Cloud Environment
    • Architecting [2015] a Persistent and Reliable Configuration Management System
    • Introducing Configuration Management [2016] Capabilities into CloudLab Experiments
    • Active Learning [2016] in Performance Analysis
  • Future Work
    • Cooperative Scheduling with Elastic Cloud Experiments
    • Active Learning in Adaptive Mesh Refinement Problems
  • Summary

Different Ends of the Computing Spectrum

             | Large-Scale Computing                               | Small-Scale Computing
Cost         | Highly subsidized                                   | Unsubsidized
Resources    | Dedicated machines                                  | Limited allocations
Hardware     | Redundant, special-purpose                          | Dual-purposed, commodity
Analysis     | Demonstrate computing at the largest possible scale | Obtain the most knowledge out of available cycles

Thesis Statement and Research Objectives

Thesis Statement:

Cloud Computing offers mechanisms for utilizing on-demand computing resources. Coupled with sophisticated software tools built upon open-source projects, such resources can be efficiently managed and utilized according to user and application requirements.

Research Objectives:

  • design and implement such solutions
  • evaluate solutions and entire computing environments
  • navigate efficient configurations
  • understand applicability

"Flavors" of Computing and Infrastructure

Technology: Overview

Rebalancing [2013]

Problem Statement:

Environments that consist of compute resources provisioned at multiple clouds may need to be periodically rebalanced: some resources need to be terminated and replaced with different ones in order to best satisfy current user needs.

Automatic rebalancing is a non-trivial process.

Rebalancing [2013]

Metrics of interest:

  • Speed of rebalancing
  • Wasted cycles
  • Cost of cloud deployment

Rebalancing [2013] - Opportunistic Policy

Rebalancing [2013] - Force Offline

Rebalancing [2013] - Tradeoffs

Rebalancing [2013] - Summary

  • Created a multi-cloud environment capable of rebalancing
  • Observed rebalancing under HTC workloads
  • Proposed and evaluated rebalancing policies
    • Examined tradeoffs such as cost and workload overhead

Architecting [2015]

Problem Statement:

Configuration management systems must remain operational in the majority of failure scenarios, since system administrators rely on them when performing recovery actions.

Proposal: investigate use of clouds

  • not for additional compute cycles
  • but rather as a source of resources for administrative services

Architecting [2015]

Developed prototype:

Architecting [2015]

Leveraged Technologies:

  • Configuration Management System (CMS): Chef 11
  • Amazon Web Services: EC2, EBS, CloudFormation
  • Openswan 2.6.32, open source IPSec VPN
  • Distributed Replicated Block Device (DRBD)
  • Firewalls and static routing rules

Configuration Management [2016]

Configuration Management [2016]

Testbeds: provide isolated and recreatable environments

Configuration Management [2016]

Experiments: require building software environments

Configuration Management [2016]

Experiments: require building software environments... on many nodes

Configuration Management [2016]: Common Workflows

Configuration Management [2016]

Problem: Explosion of Configurations!

(snapshot- and simple script-based approaches don’t scale)

Configuration Management [2016]

Configuration Management [2016]: Summary

Active Learning [2016]

Paper: Active Learning in Performance Analysis
Presented at: IEEE Cluster 2016, Taipei, Taiwan, 2016

Motivation

Active Learning in Performance Analysis

09/14/2016

Performance Analysis:

  • Take a set of measurements
  • Build a model
  • Understand behavior of a complex system
  • Predict outcomes of future experiments

Main Challenges:

  • Often too many factors
  • Inability to take an equal number of measurements at every configuration
  • Inefficient exploration of the input space

Motivation: Example 1

Each point represents a run of the HPGMG-FE benchmark on a 4-node cluster provisioned on the CloudLab testbed

Motivation: Example 2

Each point represents a run of the HPGMG-FE benchmark on a 4-node cluster provisioned on the CloudLab testbed

Approach: Active Learning

  • Use Active Learning (AL) -- techniques from Machine Learning where a "learner" interacts with a "data source"
    1. Train a model on a small set of measurements
    2. Let the model suggest a point for the next experiment
    3. Run the suggested experiment
    4. Retrain the model with the new measurement
    5. Go back to step 2 or exit
  • Sometimes called: adaptive experiment design and optimal experiment design
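
The five steps above can be sketched as a short loop. This is only an illustration, not the prototype's code: `run_experiment` is a hypothetical stand-in for a real benchmark run, and the "model" is a toy uncertainty proxy (distance to the nearest measured point) used in place of a real regression model such as GPR.

```python
import math


def run_experiment(x):
    """Hypothetical stand-in for a real benchmark run; returns a fake 'runtime'."""
    return 1.0 + math.sin(3.0 * x) ** 2


def uncertainty(x, measured):
    """Toy uncertainty proxy (an assumption): distance to the nearest measured point.
    A real learner would return a predictive standard deviation instead."""
    return min(abs(x - xm) for xm in measured)


def active_learning(candidates, initial, budget):
    # Step 1: "train" on a small initial set of measurements.
    measured = {x: run_experiment(x) for x in initial}
    for _ in range(budget):
        pool = [x for x in candidates if x not in measured]
        if not pool:
            break  # Step 5: exit when nothing is left to measure.
        # Step 2: let the model suggest the most uncertain candidate.
        x_next = max(pool, key=lambda x: uncertainty(x, measured))
        # Steps 3 and 4: run the experiment and update the model's data.
        measured[x_next] = run_experiment(x_next)
    return measured


candidates = [i / 10 for i in range(11)]
result = active_learning(candidates, initial=[0.0, 1.0], budget=3)
print(sorted(result))
```

With this proxy, "retraining" amounts to adding the new point to `measured`; with a real model, step 4 would refit the model on the enlarged dataset.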

Approach: Gaussian Process Regression

  • Use Gaussian Process Regression (GPR) -- a non-parametric, non-linear interpolation technique that provides the best linear unbiased prediction (under suitable assumptions)
    • Build a model for f(x)
    • For every new x, calculate estimates of f(x) and \sigma_f(x)
  • Sometimes called: kriging (in geostatistics) and Wiener–Kolmogorov prediction
  • GPR works in many dimensions
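
For reference, the standard GPR predictive equations are given below. The slides do not spell them out, so the notation is an assumption: a zero-mean prior, training inputs x_1, ..., x_n with observations y, a covariance function k, and noise variance \sigma_n^2.

```latex
\begin{align}
K_{ij} &= k(x_i, x_j), \qquad (k_*)_i = k(x_i, x) \\
\bar{f}(x) &= k_*^\top \left( K + \sigma_n^2 I \right)^{-1} y \\
\sigma_f^2(x) &= k(x, x) - k_*^\top \left( K + \sigma_n^2 I \right)^{-1} k_*
\end{align}
```

The first quantity is the predicted value at a new point x; the second is the predictive variance, whose square root is the uncertainty that an Active Learning strategy can use to rank candidate experiments.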

Approach: Putting it Together

  • Combine AL and GPR into a 2-layer system:

   Upper: AL

   Lower: GPR

  • Optimization problem at each layer:

   Upper: Choose "best" experiment

   Lower: Choose "best" hyperparameters
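
The lower-layer problem can be written explicitly. In GPR, hyperparameters \theta are commonly chosen by maximizing the log marginal likelihood of the n observations y. The form below is the standard textbook expression, under the assumptions of a zero-mean prior and a covariance matrix K_\theta that already includes the noise term.

```latex
\begin{align}
\theta^* &= \arg\max_\theta \, \log p(y \mid X, \theta) \\
\log p(y \mid X, \theta) &= -\tfrac{1}{2} \, y^\top K_\theta^{-1} y
  - \tfrac{1}{2} \log \lvert K_\theta \rvert
  - \tfrac{n}{2} \log 2\pi
\end{align}
```

The first term rewards fit to the data, the second penalizes model complexity, and the third is a constant, so the maximization balances the first two.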

Approach: Details

   Upper: Choose "best" experiment

   Consider strategies:
     • Variance Reduction (VR): x^*_{VR} = \arg\max_x \sigma_f(x)
     • Cost Efficiency (CE): x^*_{CE} = \arg\max_x \sigma_f(x) / f(x)

   Lower: Choose "best" hyperparameters

   Use: Bayesian Model Selection (Marginal Likelihood Maximization) with 3 hyperparameters: noise level, length scale, and amplitude
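
A small sketch may help contrast the two upper-layer strategies: Variance Reduction picks the candidate with the largest predictive uncertainty \sigma_f(x), while Cost Efficiency picks the largest \sigma_f(x) / f(x). The candidate names and numbers below are made up for illustration, and reading f(x) as a positive predicted cost (such as runtime) is an assumption.

```python
def select_vr(candidates):
    """Variance Reduction: choose the candidate with the largest predictive std."""
    return max(candidates, key=lambda c: c["std"])["x"]


def select_ce(candidates):
    """Cost Efficiency: choose the largest std per unit of predicted cost.
    Assumes every predicted cost ('mean') is strictly positive."""
    return max(candidates, key=lambda c: c["std"] / c["mean"])["x"]


# Illustrative (made-up) model predictions for three candidate experiments.
predictions = [
    {"x": "small job", "mean": 2.0, "std": 0.5},
    {"x": "medium job", "mean": 20.0, "std": 1.5},
    {"x": "large job", "mean": 200.0, "std": 2.0},
]

print("VR picks:", select_vr(predictions))
print("CE picks:", select_ce(predictions))
```

In this toy case VR favors the most uncertain (and most expensive) experiment, while CE favors a cheap experiment that still carries noticeable uncertainty.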

Implementation

  • Developed a prototype in Python which supports:
    • single realizations of AL in "offline" mode*
    • batches of realizations for comparison of Variance Reduction and Cost Efficiency strategies 
  • GPR: used code for Gaussian Processes in scikit-learn (0.18.dev0)

* Note: "offline" means that the prototype queries a database of previously collected measurements. Future work: an online mode that runs AL alongside the computation.
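
The prototype relies on scikit-learn for GPR. To show roughly what such a library computes for each prediction, here is a dependency-free sketch for one-dimensional inputs. It is not the prototype's code: the squared-exponential kernel, the function names, and the default hyperparameter values are all assumptions chosen for illustration.

```python
import math


def kernel(a, b, amplitude, length_scale):
    """Squared-exponential covariance (an assumed kernel choice)."""
    return amplitude ** 2 * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_new, amplitude=1.0, length_scale=1.0, noise_std=1e-3):
    """Return the predictive mean and standard deviation at x_new."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j], amplitude, length_scale)
          + (noise_std ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # (K + noise)^-1 y
    k_star = [kernel(x, x_new, amplitude, length_scale) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = kernel(x_new, x_new, amplitude, length_scale) - sum(t * t for t in v)
    return mean, math.sqrt(max(var, 0.0))


xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
print(gp_predict(xs, ys, 1.0))   # at a measured point
print(gp_predict(xs, ys, 10.0))  # far from all measurements
```

The three keyword arguments correspond to the three hyperparameters named in the slides (amplitude, length scale, noise level); here they are fixed by hand rather than fitted by marginal likelihood maximization.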

Analyzed Datasets

  • Measured runtimes and estimated energy consumption for a large set of HPGMG-FE benchmark runs on a cluster provisioned on the CloudLab testbed
  • Organized this data into two datasets:
  • 3D visualizations are available here

Active Learning: 10 Iterations

Shown points represent a subset of measurements in the Performance dataset; runtimes are log-transformed

Active Learning: 100 Iterations

Shown points represent a subset of measurements in the Performance dataset; runtimes are log-transformed

Evaluation: Convergence Analysis

Evaluation: Cost Analysis

Active Learning [2016]: Summary

  • Proposed using Active Learning + Gaussian Process Regression
    for efficient regression learning in performance analysis
  • Demonstrated tradeoffs between two Active Learning algorithms, with and without adjustment for experiment cost
  • Developed a prototype capable of processing diverse datasets

Technology: Recap

Technology: Future Work

Future Work: Cooperative Scheduling

Context: two systems with different loads (clusters, clouds, testbeds, ...)

Goal: they "exchange" resources when possible

Proposed interface: exchange preemption vectors

Cooperative Scheduling: Prototype

(elastic experiment on Apt)

Cooperative Scheduling: Summary

  • Elastic experiments: demonstrate their usefulness and necessary components
  • Preemption vectors: demonstrate how they can be calculated and tuned
  • Evaluation:
    • microbenchmarks
    • scale up and down under HTC and HPC workloads

Potential Targets:

  • Special Issue on Middleware for Multicloud 2017 -- Papers due: early January
  • ScienceCloud 2017 Conference -- Papers due: early February

Future Work: AL and AMR

Adaptive Mesh Refinement (AMR):

  • popular technique used in science and engineering
  • increases resolution where an accurate solution is most needed
    • coarse mesh in most of the domain
    • finer meshes in sensitive regions
  • designed to improve shock-capturing methods
  • reduces computation and storage
  • computational requirements are hard to predict
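
The core idea of refining only where it matters can be illustrated in one dimension. The sketch below is a toy: the refinement criterion (the jump in function value across a cell) and the test function are assumptions chosen for illustration, not the criteria used by any of the AMR packages discussed here.

```python
import math


def refine(cells, f, tol, max_level):
    """Recursively bisect every cell (a, b, level) whose endpoint values differ
    by more than tol, up to max_level levels of refinement."""
    out = []
    for a, b, level in cells:
        if level < max_level and abs(f(b) - f(a)) > tol:
            mid = 0.5 * (a + b)
            children = [(a, mid, level + 1), (mid, b, level + 1)]
            out.extend(refine(children, f, tol, max_level))
        else:
            out.append((a, b, level))
    return out


def steep_front(x):
    """Hypothetical test function with a sharp transition near x = 0.5."""
    return math.tanh(50.0 * (x - 0.5))


# Start from a coarse mesh of four equal cells on [0, 1].
coarse = [(i / 4, (i + 1) / 4, 0) for i in range(4)]
mesh = refine(coarse, steep_front, tol=0.2, max_level=6)

for a, b, level in mesh:
    print(f"level {level}: [{a:.6f}, {b:.6f}]")
```

Cells far from the front stay coarse, while cells near x = 0.5 are split repeatedly. The final cell count depends on where the front lies, which hints at why computational requirements are hard to predict in advance.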

Source: http://math.boisestate.edu/~calhoun/www_personal/research/amr_software

Proposal: apply AL to AMR problems

Common Approaches to AMR

Tree-based

Block-structured (patch-based)

Source: http://www.training.prace-ri.eu/uploads/tx_pracetmo/AMRIntroHNDSCi15.pdf

Examples of AMR Software

 

Clawpack and GeoClaw (based on AMRClaw)

p4est and ForestClaw

Sources:

http://math.boisestate.edu/~calhoun/www_personal/research/amr_software/

https://www.youtube.com/watch?v=DKn9iuD7Ihk

https://arxiv.org/pdf/1308.1472v1.pdf

AL and AMR

Proposal: apply AL to AMR problems

  • AMR performance is sensitive to physical parameters
  • We can include physical parameters in AL-based modeling
  • AL will allow us to model the accuracy of AMR in a performance-aware way
    • automate the process of selecting the most efficient configurations
    • avoid regions with spikes in computational requirements

AL and AMR: Proposed Study

  • Collaborate with Prof. Donna Calhoun (Boise State University)
  • Obtain copies of ForestClaw performance results
  • Run "offline" AL on these datasets
  • Evaluate AL schemes:
    x^*_{VR} = \arg\max_x \sigma_f(x),  x^*_{CE} = \arg\max_x \sigma_f(x) / f(x)
  • Quantify tradeoffs

Potential Targets:

  • HPDC 2017 Conference -- Papers due: January 17
  • ICS 2017 Conference -- Papers due: January 18
  • SC 2017 Conference -- Papers due: early April
Proposed Timeline

Summary

  • Discussed four previous papers
  • Discussed how the last two papers create a foundation for future work
  • Described the motivation and scope of future studies
  • Proposed a timeline and potential deadlines

Other Contributions

  • CloudLab Tutorial at NSFCloud For Everyone Workshop, Nov 2016 
  • CloudLab Tutorial at GENI Regional Workshop, Mar 2016
  • Chef Tutorial, new chapter in the CloudLab online manual
  • Participation in the "Rethinking Experimental Methods in Computing" seminar (Dagstuhl Seminar 16111)
  • Student volunteering at SC'16 and IEEE Cluster'13
  • Participation in the Graduate Peer Mentoring Program, 2016-2017 AY
  • Mentoring a CS undergraduate in the DLA program, 2014-2015 AY

Thank you!

Questions?

Slides for the first discussion of the research proposal by Dmitry Duplyakin - University of Colorado - Fall 2016
