State of the art in benchmarking BigData systems

Frans Ojala: frans.ojala@helsinki.fi

Helsinki University

A standard or point of reference against which things may be compared.

 

A problem designed to evaluate the performance of a computer system.

Benchmark

/ˈbɛn(t)ʃmɑːk/

- Oxford dictionary

Benchmark

Premise 1: we need compute power, storage, network bandwidth etc. and it costs.

        - How can we get the best system for the capital we have?

 

Premise 2: different vendors have different systems that have different properties.

        - How can we compare the systems?

Why should we benchmark?

- CPU speed

- Disk I/O

- Memory I/O

- Network speed, latency

- Database throughput

- Application throughput

- Holistic system performance on load

Benchmark

What should be benchmarked?

Easy! Build an application that mimics the desired load and build a scoreboard!

Benchmark

How should we benchmark?

  1. Introduction and motivation

  2. A brief history of benchmarking

  3. Case example in Cloud benchmarking: BigBench

  4. Case example in ranking service providers: SMICloud

  5. Summary of key concepts

  6. Inudstry standards

Overview

A brief history of benchmarking

HPC

benchmarks

  • HPCC (HPC Challenge) benchmark suite
    • Floating point rate, double precision rate, memory bandwidth and latency, network capacity, random memory updates, discrete Fourier transform. LINPACK.
    • Metrics are MFLOPS, Gb/s etc.
  • SPEC benchmark suite
    • A newer perspective that fixes some problems with HPCC.
    • Measures elapsed time, claims to be fairer overall.
    • ​Large set of programs to measure different aspects of HPC systems.
  • TPC benchmarks
    • Cover many aspects of traditional computing, such as typical business order-entry environment, data integration, decision support and others.

Database

benchmarks

  • TPC-benchmarks (-A, -C, -D)
    • Many solutions for business and e-commerce
    • Mainly geared toward databases and transactions
  • Wisconsin benchmark
    • 'De facto' database benchmark for a long time
    • Structured data
  • XGDBench
    • New benchmark for graph stores
    • Addresses exascale cloud problems
    • MAG: simulates real-world data

Cloud benchmarks

BigBench

  • A result of the first Workshop for Big Data benchmarking.
  • TPCx-BB uses BigBench for its BigData benchmark.
  • 3 V's: Volume, Variety, Velocity;
    • Arguably also 'Value'.
  • End-to-end benchmark system:
    • Fictious retailer as basis.
    • Data model (3V).
    • Data generator.
    • Workload description (in english).
  • Requires more work and industry input.

BigBench

Data Model

Ghazal, Ahmad, et al. "BigBench: towards an industry standard benchmark for big data analytics." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.

  • Structured:
    • TPC-DS adaption
    • Additionally a table for current competitor prices
    • Relational data
  • Semi-structured:
    • Web-page navigation sequence
    • Repeat customers and guests
    • Key-value data
  • Unstructured:
    • User product reviews

BigBench

Data Generator

Rabl, Tilmann, et al. "A data generator for cloud-scale benchmarking." Performance Evaluation, Measurement and Characterization of Complex Systems. Springer Berlin Heidelberg, 2010. 41-56.

  • Based on PDGF and its extension:
    • Can generate large amount of data for arbitrary schema.
    • Structured data from original PDGF.
    • Semi-structured and unstructured data needs extensions.
  • TextGen:
    • Uses Markov chains to extract key words from input text.
    • Builds a dictionary and uses its contents to build synthetic reviews.

BigBench

Workload queries

  • Workloads constructed from the enterprise point of view:
    • Marketing
    • Merchandising
    • Operations
    • Supply chain
    • New businessa models
  • ​Also from technical point of view:
    • ​Use mutliple data types in queries
    • Declarative (SQL) and procedural (MR)
    • Algorithmic data processing

Ghazal, Ahmad, et al. "BigBench: towards an industry standard benchmark for big data analytics." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.

Proposed metric:

BigBench

Ghazal, Ahmad, et al. "BigBench: towards an industry standard benchmark for big data analytics."

Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.

TPCx-BB: http://www.tpc.org/tpcx-bb/default.asp

Ranking service providers

SMICloud

  • Framework proposed by Cloud Service Measurment Index Consortium to compare cloud providers.
  • Offers evaluation based on user QoS requirements.
    • How to measure various SMI attributes?
    • How to rank service providers based on SMI attributes?
  • Some SMI not easily quantified.
  • What SMI to include?
    • More SMI complicates Multi-Criteria Decision Making.
    • Analytical Hierarchical Process to assign weights to SMI.
  • Decision support tool!

SMICloud

Service Measurment Index

http://www.csmic.org

  • Based on ISO standards' business-relevant KPI.
  • Holistic view of the QoS requirements of a cloud customer.
  • Catalogues service providers
  • Monitors service providers
  • Brokers with cloud customers

SMICloud

Key Perfomance Indicators

  • Service response time
  • Sustainability
  • Suitability
  • Accuracy
  • Transparency
  • Interoperability
  • Availability
  • Reilability
  • Stability
  • Cost
  • Adaptability
  • Elasticity
  • Usability
  • Throughput and efficiency
  • Scalability

SMICloud

Garg, Saurabh Kumar, Steve Versteeg, and Rajkumar Buyya. "A framework for ranking of cloud computing services"

Future Generation Computer Systems 29.4 (2013): 1012-1023.

SMICloud: http://www.csmic.org

Summary of key concepts

Data generation

  • Data used in benchmarking must be generated
    • Network bandwidth insufficient
    • Storage systems insufficient
    • Empirical data not tailorable
  • Methods
    • On-the-fly
    • A-priori
    • Hybrid
  • Generator must support wide data dynamics
    • Able to address "4 V's"

Workloads

  • Multiple types of workloads must be supported
    • Premade suites
    • Custom designed
    • Platform independent
  • Should support at least the most common APIs
    • MapReduce, Spark, MongoDB, Cassandra...
    • Enables more workloads
  • Possible to holistically evaluate system

Metrics

  • Systems have multiple dimensions
    • Dimensions vary between applications
    • Must support reporting of various properties
    • No single numer can represent a system
  • Users have different needs
    • Need to be able to rank providers accordingly
    • Needs multidimensional metrics
    • Quality of Service is a important in the Cloud
  • No single number can represent a system holistically

Differences between benchmarks

Hwang, Kai, et al. "Cloud Performance Modeling with Benchmark Evaluation of Elastic Scaling Strategies." Parallel and Distributed Systems, IEEE Transactions on 27.1 (2016): 130-143.

  • Different benchmarks can achieve varying results
    • Implementation details
    • Programming model
    • Programming language
    • Data
    • Workloads

Industry standard

(Does not exist yet)

Industry standards

  • TCPx-HS, BB
    • TPC has had several industry standard benchmarks
  • PARSEC 2.0
    • Version 1.0 was a de facto benchmark, now obsolete
    • Has major overhauls, BigData approach, but still lacking
  • ​CloudHarmony​
    • ​Provides ranking services of different cloud providers
    • Basis on CloudBench, CloudProbe
  • CloudSpectator
    • Basis on UnixBench, Phoronix test suite, others
    • Current top google result for 'Cloud benchmarking'

Sources

Sources

  • BigBench
    • Ghazal, Ahmad, et al. "BigBench: towards an industry standard benchmark for big data analytics." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.
  • ​SMICloud
    • ​Garg, Saurabh Kumar, Steve Versteeg, and Rajkumar Buyya. "SMICloud: a framework for comparing and ranking cloud services." Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on. IEEE, 2011.
  • ​TPC, see associated documentation
    • http://www.tpc.org
  • Others: ask, and I shall provide

Questions? 

Comments?

More details on benchmarks

HPCC

  • Linpack TPP benchmark measures floating point rate of execution for linear equations.
  • Floating point rate of execution of double precision real matrix-matrix multiplication.
  • Synthetic benchmark program to measure sustainable memory bandwidth and the corresponding computation rate for simple vector kernel.
  • Parallel matrix transpose for measuring network capacity.
  • Measures the rate of integer random updates of memory (GUPS).
  • Measures the floating point rate of execution of double precision complex one-dimensional Discrete Fourier Transform (DFT).
  • Communication bandwidth and latency.

SPEC CPU

Dixit, Kaivalya M. "Overview of the SPEC Benchmarks." (1993): 489-521.

LINPACK

  • Collection of subroutines written in Fortran to solve systems of linear equations (matrix operations).
  • Measures double precision floating point performance.
  • Widely used and is the standard benchmark in ranking "top500" supercomputers.

LINPACK problems

  • Small, fit in cache

  • Obsolete instruction mix

  • Uncontrolled source code

  • Prone to compiler tricks

  • Short runtimes on modern machines

  • Single-number performance characterization with a single benchmark

  • Difficult to reproduce results (short runtime and low-precision UNIX timer) 

NAS parallel benchmarks

  • Small set of programs designed to help evaluate the performance of parallel supercomputers.
  • Basis in computational fluid dynamics (CFD) applications and consist of five kernels and three pseudo-applications in the original "pencil-and-paper" specification (NPB 1).
  • Has been extended to include new benchmarks for unstructured adaptive mesh, parallel I/O, multi-zone applications, and computational grids.

Courtesy of NASA, NAS: https://www.nas.nasa.gov/publications/gallery.html

More cloud benchmarks

  • AMP benchmarks
  • LinkBench
  • Cloud Suite
  • CloudCmp
  • CloudBench
  • Cloudstone
  • C-SMART
  • ... the list goes on
Made with Slides.com