Evaluation of Elastic Experiments
Demo of Initial Results
Dmitry Duplyakin, 01/13/2017
HTC: 50 jobs ran to completion in less than 2h
(Experiment directory: 20161205-213258)
Halfway through the experiment, some nodes finish their jobs
and remain idle until the end of the experiment; LIFO chooses these idle nodes for preemption (therefore WC goes to 0)
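One plausible reading of this behavior, sketched in Python (illustrative only; the node fields, the LIFO ordering by allocation time, and the WC accounting are assumptions, not the simulator's actual code):

```python
# Sketch: a LIFO policy preempts the most recently allocated nodes first.
# Nodes whose jobs have already finished are idle, so preempting them
# wastes no computation (WC contribution is 0).

def lifo_preempt(nodes, n_needed):
    """Pick n_needed nodes for preemption, newest allocation first."""
    # Sort by allocation time, most recent first (LIFO order).
    candidates = sorted(nodes, key=lambda n: n["alloc_time"], reverse=True)
    chosen = candidates[:n_needed]
    # WC lost: elapsed work is wasted only if a job is still running.
    wc = sum(n["elapsed"] for n in chosen if n["running"])
    return chosen, wc

nodes = [
    {"id": 0, "alloc_time": 1, "running": True,  "elapsed": 50},
    {"id": 1, "alloc_time": 2, "running": False, "elapsed": 0},  # idle
    {"id": 2, "alloc_time": 3, "running": False, "elapsed": 0},  # idle
]
chosen, wc = lifo_preempt(nodes, 2)  # picks the two idle nodes; wc == 0
```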
Observations:
Runtime:
Node count:
Simulating 8h of execution of a subset of PEREGRINE jobs
Visualizing preemption vectors with heatmaps (the darker, the more valuable)
(Experiment directory: yass/preserved/hpc-20161219-131501)
FIFO:
Conclusion: need more data (longer simulation)
Back to HTC
To run a longer experiment: combine 10 shuffled copies of the original workload
Simulated time: ~68h on 20 nodes
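A minimal sketch of this trace-extension step, assuming the workload is a plain Python list of job records (the function name and seed handling are made up for illustration):

```python
import random

# Sketch: build a longer trace by concatenating several independently
# shuffled copies of the original job list.
def extend_workload(jobs, copies=10, seed=0):
    rng = random.Random(seed)       # fixed seed for a reproducible trace
    extended = []
    for _ in range(copies):
        batch = jobs[:]             # shuffle a copy, keep the original intact
        rng.shuffle(batch)
        extended.extend(batch)
    return extended

jobs = list(range(50))              # stand-in for the 50 HTC jobs
longer = extend_workload(jobs)      # 500 jobs total
```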
(Experiment directory: yass/preserved/htc-20170106-0958061)
PAP performs the same as LIFO
Simulating full PEREGRINE workload
(Experiment directory: yass/preserved/peregrine-20170109-094539)
Observation: GP does not play a significant role
Observation: PAP's performance is the same as LIFO's
Simulating PEREGRINE workload on a larger cluster:
(Experiment directory: yass/preserved/peregrine-20170112-114539)
Observation: PAP performs slightly worse than LIFO
Example of a scenario where PAP makes a worse decision than LIFO
HTC:
PEREGRINE (HPC):
Maybe preemption policies should consider wall clock times requested by users?
The accuracy of such estimates is extremely low:
for PEREGRINE, for over 50% of the jobs the requested wall clock exceeds the actual wall clock by a factor of 19.2.
In the "HPC System Lifetime Story..." paper on the analysis of NERSC HPC systems,
the authors mention a similar level of accuracy:
The wall clock accuracy is calculated as real/estimated wall clock time....
For Carver... In 2014, the median is under 0.1 and for the last quartile it is under 0.2.
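A one-line illustration of the accuracy definition quoted above (the 48h request and 2.5h runtime are a made-up example):

```python
# Sketch: wall clock accuracy = actual / requested wall clock time,
# as in the cited NERSC analysis. A median accuracy near 1/19.2 means
# half the jobs request ~19x more time than they actually use.
def wallclock_accuracy(actual_s, requested_s):
    return actual_s / requested_s

# Hypothetical job: requests 48h, runs 2.5h
acc = wallclock_accuracy(2.5 * 3600, 48 * 3600)  # ~0.052
```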
PEREGRINE workload:
PEREGRINE workload: distribution of 7275 jobs (number and total node-seconds) by job duration
PEREGRINE workload:
Breakdown of jobs by queue:
  batch    6047
  short     922
  debug     143
  bigmem     76
  long       44
  phi        42
  large       1
Proposal: treat the 76 bigmem jobs (~1% of all jobs) as high-priority jobs
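The queue breakdown and the proposed priority flag can be reproduced with a simple frequency count (the 10.0 weight for bigmem is an assumption for illustration; it mirrors the high_job_priority value used in the gaussian experiment later in these notes):

```python
from collections import Counter

# Sketch: the queue breakdown is a frequency count over the job records;
# bigmem jobs get an assumed high-priority weight, everything else 1.0.
queues = (["batch"] * 6047 + ["short"] * 922 + ["debug"] * 143 +
          ["bigmem"] * 76 + ["long"] * 44 + ["phi"] * 42 + ["large"])
counts = Counter(queues)                              # same numbers as the table
priority = {q: 10.0 if q == "bigmem" else 1.0 for q in counts}
```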
Goal: improve the PAP policy so that it minimizes WC across all jobs and also tries to
preserve high-priority jobs (i.e., reduces WC for high-priority jobs when possible)
Implementation:
PAP policy:
preemption_vector = scale(job_runtime * job_nodecount),
where scale() converts vector values to the [0,1] range
New PAP+ policy:
preemption_vector = scale(job_runtime * job_nodecount * job_priority)
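A sketch of the two vectors as defined above; min-max scaling to [0,1] is assumed for scale(), and the job values are made up (high vector value = more valuable = costlier to preempt):

```python
# Sketch of the PAP and PAP+ preemption vectors defined above.
def scale(v):
    """Min-max scale a vector to the [0,1] range (assumed for scale())."""
    lo, hi = min(v), max(v)
    if hi == lo:
        return [0.0 for _ in v]
    return [(x - lo) / (hi - lo) for x in v]

def pap_vector(runtime, nodecount):
    return scale([r * n for r, n in zip(runtime, nodecount)])

def pap_plus_vector(runtime, nodecount, priority):
    return scale([r * n * p for r, n, p in zip(runtime, nodecount, priority)])

runtime   = [100.0, 400.0, 50.0]   # seconds elapsed per job (made up)
nodecount = [2,     1,     4]
priority  = [1.0,   1.0,   10.0]   # third job is high priority
pap  = pap_vector(runtime, nodecount)                     # [0.0, 1.0, 0.0]
papp = pap_plus_vector(runtime, nodecount, priority)      # third job now tops
```

Under PAP+ the high-priority job dominates the vector even though its node-seconds are modest, which is exactly the intended change.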
Proposed Experiment:
Comparing PAP and PAP+ based on WC for default- and high-priority jobs
Selected metric: cumulative WC, summed across all samples recorded during the simulation of entire PEREGRINE workload
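The metric can be illustrated as a running sum over WC samples (the sample values below are made up):

```python
from itertools import accumulate

# Sketch: cumulative WC = running sum of the wasted-computation samples
# recorded during the simulation; policies are compared by the final value.
wc_samples = [0.0, 120.0, 0.0, 300.0, 80.0]   # illustrative WC per sample
cumulative_wc = list(accumulate(wc_samples))
final_wc = cumulative_wc[-1]                  # the value compared across policies
```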
(Experiment directory: yass/preserved/peregrine-20170126-105500)
Judging by the cumulative WC at the end of the experiment:
Results for the rest of the Grace Period values look similar:
(more graphs below)
Preemption Vectors visualized
for PAP and PAP+:
(Experiment directory: yass/preserved/peregrine-20170127-104259)
Prioritizing jobs that represent a particular application:
app_name = gaussian (job count = 341, ~4.7% of total)
high_job_priority = 10.0
(Experiment directory: yass/preserved/peregrine-20170127-091920)
Grace Period = 60
  Max for PAP for default-priority jobs:  3.625e+06
  Max for PAP+ for default-priority jobs: 3.8831e+06 (relative difference: +7.120%)
  Max for PAP for high-priority jobs:  1.1951e+05
  Max for PAP+ for high-priority jobs: 3267.4 (relative difference: -97.266%)
Grace Period = 120
  Max for PAP for default-priority jobs:  3.6377e+06
  Max for PAP+ for default-priority jobs: 3.8961e+06 (relative difference: +7.103%)
  Max for PAP for high-priority jobs:  1.2028e+05
  Max for PAP+ for high-priority jobs: 3468.4 (relative difference: -97.116%)
Grace Period = 1200
  Max for PAP for default-priority jobs:  3.8308e+06
  Max for PAP+ for default-priority jobs: 4.0929e+06 (relative difference: +6.842%)
  Max for PAP for high-priority jobs:  1.3275e+05
  Max for PAP+ for high-priority jobs: 6679.6 (relative difference: -94.968%)
Grace Period = 1800
  Max for PAP for default-priority jobs:  3.9184e+06
  Max for PAP+ for default-priority jobs: 4.1819e+06 (relative difference: +6.724%)
  Max for PAP for high-priority jobs:  1.3867e+05
  Max for PAP+ for high-priority jobs: 8182.7 (relative difference: -94.099%)
What about larger Grace Periods? Trying: 1h to 6h
What about larger Grace Periods? Trying: 30m to 6h with 30m increments
(Experiment directory: yass/preserved/peregrine-20170206-134326)
Summary: