The Grid

YARN & Mesos

Avishai Ish-Shalom (@nukemberg)

Fewbytes

What's a Grid

  • A collection of computer nodes
  • Shared workload
  • General purpose

(Short) History of Grids

  • Appeared in the mid-1990s
  • Commonly used in scientific computations
  • "Distributed Supercomputer"
  • BOINC
  • Beowulf

Grid architecture

  • Scheduler assigns tasks to nodes
  • Assignment based on attributes (architecture) and load
  • Unassigned tasks wait in queue
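
A minimal sketch of the scheduling loop described above, assuming a single attribute (architecture) and load measured as running tasks over capacity. The names `Node`, `Task`, and `schedule` are illustrative only and do not belong to any real grid product.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    arch: str          # attribute used for matching, e.g. "x86_64"
    capacity: int      # how many tasks the node can run at once
    running: list = field(default_factory=list)

    @property
    def load(self) -> float:
        return len(self.running) / self.capacity


@dataclass
class Task:
    name: str
    arch: str          # node attribute the task requires


def schedule(queue: deque, nodes: list) -> deque:
    """Assign each queued task to the least loaded matching node.

    Returns the tasks that could not be placed; they keep waiting.
    """
    waiting = deque()
    while queue:
        task = queue.popleft()
        candidates = [n for n in nodes if n.arch == task.arch and n.load < 1.0]
        if not candidates:
            waiting.append(task)   # no matching node with room: stay queued
            continue
        target = min(candidates, key=lambda n: n.load)
        target.running.append(task.name)
    return waiting


nodes = [Node("n1", "x86_64", 2), Node("n2", "x86_64", 1), Node("n3", "arm64", 1)]
queue = deque([Task("a", "x86_64"), Task("b", "x86_64"), Task("c", "arm64"),
               Task("d", "arm64"), Task("e", "x86_64"), Task("f", "x86_64")])
waiting = schedule(queue, nodes)
print([t.name for t in waiting])
```

Tasks "d" and "f" stay in the queue here because every matching node is full. A real scheduler would re-run the loop whenever a task finishes or a node joins.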

Grid responsibilities

  • Schedule and distribute tasks to nodes
  • Load balance nodes
  • Retry failed tasks
  • Monitor nodes
  • Collect stdout, stderr
  • Maintain job history
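
Three of these responsibilities (retry failed tasks, collect stdout and stderr, keep job history) can be sketched on a single machine with the standard library. `run_with_retry` is a hypothetical helper, and treating any non-zero exit code as a failure is an assumption of this sketch.

```python
import subprocess
import sys


def run_with_retry(cmd, max_attempts=3):
    """Run cmd, retrying on a non-zero exit; return the per-attempt history."""
    history = []
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        history.append({
            "attempt": attempt,
            "exit_code": proc.returncode,
            "stdout": proc.stdout,   # collected output
            "stderr": proc.stderr,
        })
        if proc.returncode == 0:
            break                    # success: stop retrying
    return history


ok = run_with_retry([sys.executable, "-c", "print('done')"])
bad = run_with_retry([sys.executable, "-c", "import sys; sys.exit('boom')"])
print(len(ok), len(bad))
```

A grid does the same thing across many nodes, which is why it also has to monitor the nodes themselves.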

HPC Grids

  • Offline batch jobs
  • No API
  • Simple, generic task placement
  • Non-preemptive
  • Slot-based resource assignment
  • No code distribution mechanism
  • No isolation/containment

And then...

Google Borg

  • In production circa 2003
  • Online apps & offline batch tasks
  • Granular resource control
  • Dense
  • API, sophisticated scheduling logic
  • Application-specific logic
  • Preemptive
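
To illustrate why granular resource control gives denser packing than the slot-based assignment of HPC grids, here is a rough sketch for one node. The node size (8 CPUs, 32 GB), the four slots, and the first-fit rule are all assumptions for illustration, not a description of how Borg actually places tasks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpus: float
    mem_gb: float


NODE_CPUS, NODE_MEM_GB = 8.0, 32.0   # assumed node size


def slot_based(tasks, slots_per_node=4):
    """Every task takes one fixed slot, regardless of what it needs."""
    return tasks[:slots_per_node]


def granular(tasks):
    """First-fit by declared CPU and memory: place a task if both still fit."""
    free_cpus, free_mem = NODE_CPUS, NODE_MEM_GB
    placed = []
    for t in tasks:
        if t.cpus <= free_cpus and t.mem_gb <= free_mem:
            placed.append(t)
            free_cpus -= t.cpus
            free_mem -= t.mem_gb
    return placed


tasks = [Task(f"small-{i}", cpus=0.5, mem_gb=2.0) for i in range(20)]
print(len(slot_based(tasks)), len(granular(tasks)))  # prints "4 16"
```

With small tasks the slot model leaves most of the node idle, while the granular model fills it.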

And everyone followed

  • Facebook - Tupperware
  • Twitter - Mesos
  • Yahoo - YARN

It ain't easy

Input, output

  • Binaries, data
  • Copy to local
  • Logs

Elastic workload

  • Add/release resources dynamically
  • Resume dead tasks
  • E.g. MapReduce, stateless web apps

Rigid workload

  • Can't change number of tasks
  • Task termination halts job
  • Partitioned data/state
  • E.g. MPI, Sharded database
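
The difference between elastic and rigid workloads shows up in how a grid reacts to a lost task. A small sketch, with `Job`, `on_task_lost`, and `scale` as made-up names:

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    elastic: bool
    tasks: set = field(default_factory=set)
    halted: bool = False


def on_task_lost(job: Job, task_id: str) -> None:
    """React to a dead task according to the workload type."""
    job.tasks.discard(task_id)
    if job.elastic:
        # Elastic: any replacement will do, so start a fresh copy.
        job.tasks.add(task_id + "-retry")
    else:
        # Rigid: the lost task owned a partition, so the whole job stops.
        job.halted = True


def scale(job: Job, new_task_ids) -> bool:
    """Add tasks dynamically; only elastic jobs allow it."""
    if not job.elastic:
        return False
    job.tasks.update(new_task_ids)
    return True


web = Job("web", elastic=True, tasks={"web-0", "web-1"})
mpi = Job("mpi", elastic=False, tasks={"rank-0", "rank-1"})
on_task_lost(web, "web-1")
on_task_lost(mpi, "rank-1")
print(sorted(web.tasks), mpi.halted)
```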

Task lifespan

  • Short tasks easier to balance
  • Preemption (!!!!)
  • Task creation overhead

Automation

  • Deployment*
  • Process management
  • Supervision
  • Server management

Density

  • Shared resources
  • Complementary workloads
  • Priorities
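
Sharing a node between workloads of different priority usually implies preemption. The sketch below assumes CPU is the only resource and that only strictly lower-priority tasks may be evicted. `place_with_preemption` is a hypothetical function, not an API of any scheduler.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpus: float
    priority: int   # higher number wins


def place_with_preemption(running, new_task, capacity):
    """Return (running_after, evicted).

    Evicts lowest-priority tasks first, and only tasks whose priority is
    strictly lower than new_task's. If the task still cannot fit, nothing
    is changed.
    """
    free = capacity - sum(t.cpus for t in running)
    keep = sorted(running, key=lambda t: t.priority)   # lowest priority first
    evicted = []
    while free < new_task.cpus and keep and keep[0].priority < new_task.priority:
        victim = keep.pop(0)
        evicted.append(victim)
        free += victim.cpus
    if free < new_task.cpus:
        return running, []      # cannot fit: leave the node untouched
    return keep + [new_task], evicted


running = [Task("batch-1", 2, 1), Task("batch-2", 2, 1), Task("web", 3, 10)]
after, evicted = place_with_preemption(running, Task("api", 3, 10), capacity=8)
print([t.name for t in after], [t.name for t in evicted])
```

Here one batch task is evicted to make room for the online task. The evicted batch task would go back to the queue, which is cheap only if batch tasks are short or restartable.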

Abstraction

Why do you care about

  • IP addresses
  • Servers
  • Disks
  • Racks

YARN

One grid to rule them all

One grid to find them

Motivation

  • MRv1 didn't scale
  • Bad resource utilization
  • Multitenancy
  • Not only MR

History

  • Hadoop On Demand successor
  • Development started 2011
  • Released in Hadoop 2.2

Architecture

Architecture (summary)

  • Monolithic scheduler
  • App-specific callbacks (app master)
  • Single (pluggable) executor
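
A toy model of this split, assuming containers are interchangeable and counted as plain integers. The classes below only borrow YARN's vocabulary; they are not the YARN client API, and `heartbeat` and `allocate` are simplified stand-ins.

```python
class ResourceManager:
    """Toy central scheduler: it alone decides who gets containers."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, requested):
        granted = min(requested, self.free)
        self.free -= granted
        return granted


class AppMaster:
    """Toy per-application master: holds the app-specific logic."""

    def __init__(self, name, tasks):
        self.name = name
        self.pending = list(tasks)
        self.launched = []

    def heartbeat(self, rm):
        """Ask the central scheduler for containers, then launch tasks in them."""
        granted = rm.allocate(len(self.pending))
        for _ in range(granted):
            self.launched.append(self.pending.pop(0))
        return granted


rm = ResourceManager(total_containers=3)
mr = AppMaster("mapreduce", ["map-0", "map-1"])
spark = AppMaster("spark", ["exec-0", "exec-1"])
mr.heartbeat(rm)
spark.heartbeat(rm)
print(mr.launched, spark.launched, spark.pending)
```

The application decides what to run in a container, but every placement decision goes through the one scheduler.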

Mesos

One grid to bring them all

and in the darkness bind them

History

  • Berkeley research project (2008)
  • Lots of input from the Google Borg guys
  • Early adoption by Twitter and Airbnb (2010)
  • Apache project since 2013

Architecture (cont)

Architecture (summary)

  • Two-stage scheduler
  • Frameworks, resource offers
  • Multiple executors
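
The offer-based, two-stage flow can be sketched as follows. This is a simplified model: the master here offers resources to frameworks in a fixed order, whereas a real Mesos master uses an allocation policy to pick the framework. `Offer`, `Framework`, and `master_round` are illustrative names, not the Mesos scheduler API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_gb: float


class Framework:
    """Toy framework scheduler (second stage): decides which offers to use."""

    def __init__(self, name, task_cpus, task_mem_gb, wanted):
        self.name = name
        self.task_cpus = task_cpus
        self.task_mem_gb = task_mem_gb
        self.wanted = wanted         # tasks still to launch
        self.launched = []           # agents we launched tasks on

    def resource_offer(self, offer):
        """Return True to accept the offer and launch one task, False to decline."""
        fits = offer.cpus >= self.task_cpus and offer.mem_gb >= self.task_mem_gb
        if self.wanted > 0 and fits:
            self.wanted -= 1
            self.launched.append(offer.agent)
            return True
        return False


def master_round(offers, frameworks):
    """First stage: offer each agent's free resources to the frameworks in turn."""
    for offer in offers:
        for fw in frameworks:
            if fw.resource_offer(offer):
                # Whatever the framework used is removed before the next offer.
                offer.cpus -= fw.task_cpus
                offer.mem_gb -= fw.task_mem_gb


marathon = Framework("marathon", task_cpus=1, task_mem_gb=2, wanted=2)
spark = Framework("spark", task_cpus=2, task_mem_gb=8, wanted=2)
offers = [Offer("agent-1", 4, 16), Offer("agent-2", 2, 4)]
master_round(offers, [marathon, spark])
print(marathon.launched, spark.launched, spark.wanted)
```

The master never needs to know what a framework runs; it only tracks which resources were accepted, which is what keeps the design easy to extend with new frameworks.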

Fight!

YARN

  • Data processing workloads
  • Many data apps already support it
  • Quirky Docker support
  • Generic apps (Apache Slider)

Mesos

  • Generic workloads
  • Hadoop (MRv1)
  • Docker support
  • Spark, Storm
  • Plenty of frameworks
  • But some apps need extra coding to work
  • No file distribution mechanism*

Workload

YARN

  • Hadoop sidekick
  • Partial docs
  • Apache Slider

Mesos

  • Independent product
  • Mesosphere DCOS
  • Good docs
  • Aurora, Marathon, Chronos
  • Project Myriad*

Ecosystem

YARN

  • Quirky API
  • Not well documented
  • Monolithic

Mesos

  • Good API
  • Built to be extended

Extending

YARN

  • You already have it ;-)
  • Almost all big data apps support it out of the box
  • Works pretty well for Hadoop

Mesos

  • Generic workload
  • Easy to extend
  • Production workhorse

Bottom line

Mesos: right choice if you want a grid for everything

YARN: right choice if you want a grid for data processing

To be continued...

  • APIs
  • Containers
  • Behavior
  • Demos
  • Alternatives (?)