Characterizing Performance and Power Efficiency on CloudLab
Dmitry Duplyakin
University of Colorado at Boulder
dmitry.duplyakin@colorado.edu
Supercomputing 2015, 11/18/2015
About CloudLab
Dmitry Duplyakin, University of Colorado
Characterizing Performance and Power Efficiency on CloudLab
11/18/2015
Outline
Performance and Power Analysis: Questions?
How to provide consistency, transparency, repeatability?
What are the right building blocks?
Platform-wide, experiment-wide, node-wide?
Useful topologies and recommended user practices?
Start with a provided profile or extend an existing profile?
Performance Analysis: Challenges
CloudLab: 3 different platforms
Ivy Bridge, Haswell, ARMv8
Power Analysis: Challenges
Extremely platform-specific
Data collection is prone to failures; no debugging support
Raw data: missing data, noise, unknown granularity
Need mechanisms for validation
From power to energy: need appropriate numerical integration
Need different experiment-wide and platform-wide analysis tools
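The step from power to energy can be sketched with the trapezoidal rule, which tolerates unevenly spaced samples. This is a minimal illustration, not the tooling used in the talk: the function names `energy_joules` and `largest_gap` and the sample trace are hypothetical.

```python
from typing import Sequence


def energy_joules(t_s: Sequence[float], p_w: Sequence[float]) -> float:
    """Integrate power (W) over time (s) with the trapezoidal rule."""
    if len(t_s) != len(p_w):
        raise ValueError("timestamps and power samples must have equal length")
    total = 0.0
    for i in range(1, len(t_s)):
        dt = t_s[i] - t_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (p_w[i] + p_w[i - 1]) * dt
    return total


def largest_gap(t_s: Sequence[float]) -> float:
    """Longest interval between consecutive samples; a hint at missing data."""
    return max((b - a for a, b in zip(t_s, t_s[1:])), default=0.0)


# Hypothetical trace: nominally one sample per second, sample at t=3 missing.
timestamps = [0.0, 1.0, 2.0, 4.0, 5.0]
power = [40.0, 40.0, 60.0, 80.0, 80.0]

print("energy (J):", energy_joules(timestamps, power))
print("largest gap (s):", largest_gap(timestamps))
```

Reporting the largest gap alongside the energy is one simple way to flag traces where missing samples may make the estimate unreliable.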
Performance and Power in the Context of CloudLab
Key Proposals
Benefits of Using Chef
Configuration Management with Chef: Architecture
Chef Terminology
Chef: Closer Look
Demo:
Chef: Push Jobs
# Chef role (get_power.rb): whitelists a push job that downloads power data
name "get_power"
description "Role applied to nodes that need to download power data"
override_attributes(
  "push_jobs" => {
    "whitelist" => {
      # Job name => shell command the node is allowed to run
      "get_power" => "cd /tmp ; git clone https://github.com/dmdu/power-client.git ;\
        /bin/bash -x power-client/power-client.sh -s clemson -l 12h"
    }
  }
)
run_list [ "push-jobs" ]
Submit role:
# knife role from file get_power.rb
Assign to a node:
# knife node run_list add head "role[get_power]"
Run the job:
# knife job start get_power head
Started. Job ID: 3f0ae42b88ea60365f7d07c64e30ff54
Running (1/1 in progress) ...
Running (0/1 in progress) ...
Big Picture with Chef
chef-client -z -o <name of the cookbook>
Evolution of Chef on CloudLab
Power Analysis
Site, Hardware | CPU | Power, Frequency |
---|---|---|
Wisconsin, Cisco UCS C220 M4 | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T) | TDP: 85 W; Turbo: 3.2 GHz |
Clemson, Dell PowerEdge C8220 | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge) | TDP: 95 W; Turbo: 3.0 GHz |
Utah, HP ProLiant m400 | One ARMv8 64-bit (Atlas/A57) 8-core CPU at 2.4 GHz (APM X-GENE) | TDP: ?; no frequency scaling |
Sources of Power Data
IOUT_hex = ipmi-raw --no-probing --driver-type=SSIF \
--driver-address=0x10 --driver-device=/dev/i2c-0 \
0 6 0x52 0x05 0x40 0x02 0x8C
VIN_hex = ipmi-raw --no-probing --driver-type=SSIF \
--driver-address=0x10 --driver-device=/dev/i2c-0 \
0 6 0x52 0x05 0x40 0x02 0x88
IOUT = int(IOUT_hex,16) * 0.01239 - 25.3717
VIN = int(VIN_hex,16) * 0.005208
POWER = VIN*IOUT
power-client.sh -s clemson -l 12h
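The conversion formulas on this slide can be written as a small Python sketch. It assumes the hexadecimal readings have already been extracted from the `ipmi-raw` response as plain hex strings in the right byte order; that parsing step is not shown in the slides and is not attempted here. The function names and the example readings are hypothetical.

```python
def iout_amps(raw_hex: str) -> float:
    """Convert a raw IOUT reading (hex string) to amperes, per the slide's formula."""
    return int(raw_hex, 16) * 0.01239 - 25.3717


def vin_volts(raw_hex: str) -> float:
    """Convert a raw VIN reading (hex string) to volts, per the slide's formula."""
    return int(raw_hex, 16) * 0.005208


def power_watts(iout_hex: str, vin_hex: str) -> float:
    """Instantaneous power as VIN * IOUT."""
    return vin_volts(vin_hex) * iout_amps(iout_hex)


# Hypothetical raw readings, chosen only to exercise the arithmetic.
example_iout_hex = "0900"
example_vin_hex = "0900"

print(f"IOUT = {iout_amps(example_iout_hex):.3f} A")
print(f"VIN  = {vin_volts(example_vin_hex):.3f} V")
print(f"P    = {power_watts(example_iout_hex, example_vin_hex):.1f} W")
```

Keeping the scale and offset constants in one place makes it easier to validate them against an independent power source before trusting the derived values.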
Power Analysis on ARM - Workload 1
Workload 1 - Gradual Load
Benchmark: HPGMG-FE
Scaling: idle to 8 cores incrementally
Blue: on-node power
Green: CM power
Observations:
Power Analysis on ARM - Workload 2
Workload 2 - Abrupt Load
Benchmark: HPGMG-FE
Scaling: idle to 8 cores
Blue: on-node power
Green: CM power
Observations:
Sampling On-Node Power Draw at Different Rates
Workload 1
Workload 2
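One way to explore how the sampling rate affects an energy estimate is to subsample a trace and re-integrate it. The sketch below uses a synthetic abrupt-load trace, loosely in the spirit of Workload 2; the numbers are hypothetical and are not measurements from the talk, and the helper names are illustrative.

```python
def trapezoid_energy(t_s, p_w):
    """Energy in joules from power samples (W) at timestamps (s)."""
    return sum(
        0.5 * (p_w[i] + p_w[i - 1]) * (t_s[i] - t_s[i - 1])
        for i in range(1, len(t_s))
    )


def subsample(t_s, p_w, every):
    """Keep every `every`-th sample, emulating a lower sampling rate."""
    return t_s[::every], p_w[::every]


# Synthetic 1 Hz trace: 40 W idle for t = 0..4 s, then 90 W for t = 5..10 s.
t = [float(i) for i in range(11)]
p = [40.0] * 5 + [90.0] * 6

for every in (1, 2, 5):
    ts, ps = subsample(t, p, every)
    print(f"one sample per {every} s: {trapezoid_energy(ts, ps):.1f} J")
```

With a sharp step in the load, coarser sampling can either over- or underestimate the energy depending on where the samples fall relative to the step, which is one reason to compare estimates taken at several rates.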
Comparing Different Energy Estimates
Workload 1
Workload 2
Interactive Analysis of Power Data
powervis
https://github.com/emulab/shiny-server
Power: Summary
Single Node Power Draw | Min, W | Mean, W | Max, W | Variance (sigma^2), W^2 |
---|---|---|---|---|
Wisconsin, Idle | 104.0 | 107.1 | 106.0 | 73.2 |
Wisconsin, 32 Threads BLIS | 240.0 | 344.4 | 384.0 | 1089.24 |
Clemson, Idle | 52.0 | 72.8 | 106.0 | 107.1 |
Clemson, 20 Threads BLIS | 254.0 | 257.5 | 262.0 | 2.5 |
Utah, Idle | 38 | 38.5 | 39.0 | 0.25 |
Utah, 8 Threads BLIS | 64 | 88.6 | 97 | 84.15 |
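The summary columns above can be computed from a list of power samples as sketched below. Whether the slide reports population or sample variance is not stated; this sketch assumes population variance. The sample values are hypothetical, picked so the result matches the shape of the Utah idle row; they are not the measured trace.

```python
import statistics


def summarize_power(samples_w):
    """Min, mean, max, and population variance of power samples in watts."""
    return {
        "min": min(samples_w),
        "mean": statistics.fmean(samples_w),
        "max": max(samples_w),
        # Assumption: population variance (divide by N, not N - 1).
        "variance": statistics.pvariance(samples_w),
    }


# Hypothetical idle trace alternating between two readings.
idle_samples = [38.0, 39.0, 38.0, 39.0]

summary = summarize_power(idle_samples)
print(summary)
```

A quick sanity check worth adding to any such summary is that min <= mean <= max holds for every row.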
Performance Analysis
Performance of CloudLab Resources
09/30/2015
Site, Hardware | CPU | Clock Rate |
---|---|---|
Wisconsin, Cisco UCS C220 M4 | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T) | Normal: 2.4 GHz; Turbo: 3.2 GHz; AVX Normal: 2.1 GHz; AVX Turbo: 3.2 GHz |
Clemson, Dell PowerEdge C8220 | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge) | Normal: 2.2 GHz; Turbo: 3.0 GHz |
Utah, HP ProLiant m400 | One ARMv8 64-bit (Atlas/A57) 8-core CPU at 2.4 GHz (APM X-GENE) | 2.4 GHz; no frequency scaling |
BLIS DGEMM: Single Core
BLIS DGEMM: 41.70 GF; single-core theoretical peak in DP: 51.2 GF
BLIS DGEMM: 16.23 GF; single-core theoretical peak in DP: 24.0 GF
BLIS DGEMM: 3.17 GF; single-core theoretical peak in DP: 4.8 GF
BLIS DGEMM: CPU and Node
Site, Hardware | Theoretical Peak | BLIS DGEMM Performance | BLIS DGEMM Energy Efficiency |
---|---|---|---|
Wisconsin, Cisco UCS C220 M4, 2 Intel E5-2630 v3 8-core Haswell CPUs at 2.40 GHz | 8 cores; 2.1 GHz (AVX, Normal); 2 FMAs per cycle; 2 flops in FMA; 4 doubles in vector units; Total CPU: 268.8 GF; Total node (2 CPUs): 537.6 GF | 32 threads: 466.5 GF (87% of peak) | 466.5 GF / 344.4 W = 1.35 GF/W |
Clemson, Dell PowerEdge C8220, 2 Intel E5-2660 v2 10-core Ivy Bridge CPUs at 2.20 GHz | 10 cores; 2.2 GHz (Normal); 1 FMA per cycle; 2 flops in FMA; 4 doubles in vector units; Total CPU: 176.0 GF; Total node (2 CPUs): 352.0 GF | 20 threads: 313.4 GF (89% of peak) | 313.4 GF / 275.5 W = 1.14 GF/W |
Utah, HP ProLiant m400, 1 ARMv8 64-bit (Atlas/A57) 8-core APM X-GENE CPU at 2.4 GHz | 8 cores; 2.4 GHz (Normal); 2 cycles per FMA; 2 flops in FMA; 2 doubles in vector units; Total CPU/node: 38.4 GF | 8 threads: 22.6 GF (58% of peak) | 22.6 GF / 88.6 W = 0.26 GF/W |
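The theoretical-peak arithmetic in the table above is a product of per-core factors, and the efficiency figures are simple ratios. The sketch below reproduces that arithmetic with the parameters from the table; the function and variable names are illustrative, and "2 cycles per FMA" is expressed as 0.5 FMA per cycle.

```python
def peak_gflops(cores, ghz, fma_per_cycle, flops_per_fma, doubles_per_vector, cpus=1):
    """Theoretical double-precision peak of a node, in GF."""
    return cores * ghz * fma_per_cycle * flops_per_fma * doubles_per_vector * cpus


def fraction_of_peak(measured_gf, peak_gf):
    """Measured performance as a fraction of theoretical peak."""
    return measured_gf / peak_gf


def gflops_per_watt(measured_gf, mean_power_w):
    """Energy efficiency as measured GF divided by mean power draw in W."""
    return measured_gf / mean_power_w


# Parameters copied from the table above.
wisconsin_peak = peak_gflops(8, 2.1, 2, 2, 4, cpus=2)
clemson_peak = peak_gflops(10, 2.2, 1, 2, 4, cpus=2)
utah_peak = peak_gflops(8, 2.4, 0.5, 2, 2)  # 2 cycles per FMA -> 0.5 FMA/cycle

for site, peak, measured, watts in [
    ("Wisconsin", wisconsin_peak, 466.5, 344.4),
    ("Clemson", clemson_peak, 313.4, 275.5),
    ("Utah", utah_peak, 22.6, 88.6),
]:
    print(
        f"{site}: peak {peak:.1f} GF, "
        f"{100 * fraction_of_peak(measured, peak):.1f}% of peak, "
        f"{gflops_per_watt(measured, watts):.2f} GF/W"
    )
```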
HPGMG-FE
Site, Hardware | Theoretical Peak | HPGMG-FE | HPGMG-FE Energy Efficiency |
---|---|---|---|
Wisconsin, Cisco UCS C220 M4, 2 Intel E5-2630 v3 8-core Haswell CPUs at 2.40 GHz | 8 cores; 2.1 GHz (AVX, Normal); 2 FMAs per cycle; 2 flops in FMA; 4 doubles in vector units; Total CPU: 268.8 GF; Total node (2 CPUs): 537.6 GF | 32 threads: 93.57 GF (17.4% of peak, 20.1% of DGEMM) | 93.57 GF / 302.2 W = 0.31 GF/W |
Clemson, Dell PowerEdge C8220, 2 Intel E5-2660 v2 10-core Ivy Bridge CPUs at 2.20 GHz | 10 cores; 2.2 GHz (Normal); 1 FMA per cycle; 2 flops in FMA; 4 doubles in vector units; Total CPU: 176.0 GF; Total node (2 CPUs): 352.0 GF | Estimated at: 20 threads: 73 GF (20.1% of peak, 23.3% of DGEMM) | Estimated at: 73 GF / 217 W = 0.34 GF/W |
Utah, HP ProLiant m400, 1 ARMv8 64-bit (Atlas/A57) 8-core APM X-GENE CPU at 2.4 GHz | 8 cores; 2.4 GHz (Normal); 2 cycles per FMA; 2 flops in FMA; 2 doubles in vector units; Total CPU/node: 38.4 GF | 8 threads: 10.1 GF (26% of peak, 44.7% of DGEMM) | 10.1 GF / 70 W = 0.14 GF/W |
Measuring Performance of BLIS DGEMM on ARMv8
Running HPGMG-FE at Different Sites
Performance: Summary
Topology: Deployed Chef Client on ARM
Topology: Desired Configuration for Benchmarking
Summary
Thank you!
Questions?
dmitry.duplyakin@colorado.edu