Warehouse-scale computing

with Mesos and Singularity


PaaS Infrastructure automation & Sustainable Development Velocity

Grigorios Chomatas 
@gchomatas

I work in the PaaS Team


We implement & maintain:


the PaaS platform (Mesos/Singularity clusters) 

the deploy & build tools (Orion / fabric)

load balancer tools (Baragon) 

logging infrastructure

Internet Scale Services / SaaS


super scalable & elastic

highly available

resilient

 

easy to develop & deploy several times per day
1000s of micro-services, jobs & workers

Polyglot persistence

microservices

"...the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery" - Martin Fowler


POLYGLOT PERSISTENCE


a variety of different data storage technologies for different kinds of data

THE TWELVE-FACTOR APP


build software-as-a-service (SaaS) apps that:

Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration

Minimize divergence between development and production, enabling continuous deployment for maximum agility

Can scale up without significant changes to tooling, architecture, or development practices




The HubSpot way

Speed wins → Speed up product development


Increase change rate → Remove friction + Reduce size, cost, risk of change:

small teams, high trust, low process
freedom and responsibility culture

embrace the 12-factor app template
micro services
libs & cross cutting APIs to simplify coding 
automate deployment by tooling


Teams X Microservices X Data Stores  → a ton of servers / VMs

welcome to the cluster management pains 


poor utilisation & elasticity
higher rate of failures
higher complexity
high operational overhead

The HubSpot case - pre-Mesos


100+ engineers in 3-4 person teams
several micro-services & jobs per team

843 product components / deployable Items

219 WEB SERVICES
246 WORKERS
205 SCHEDULED JOBS
173 ON-DEMAND


~1150 small to medium size AWS servers 

and lots of productivity lost in provisioning & operations

statically partitioning your cluster is bad


and can cost a lot...



Mesos to the rescue


a domain agnostic computing resource sharing platform 

a computer cluster / data center operating system

a meta scheduler with reusable primitives for building domain specific cluster schedulers (a.k.a frameworks)

Mesos key features


fine-grained multiple resource sharing
diverse domains
containerized deployment
data locality
simple / efficient / scalable
fault tolerant
highly available
java, c++, python, go language bindings

Grid computing & mesos


Grid computing 
 
Borg (Google)  
 
Omega (Google)

Mesos (for all of us!!!)

Mesos architecture


mesos resource offers


mesos primitives

framework [scheduler, executor, roles]

scheduler [registered, reregistered, resourceOffers, offerRescinded, statusUpdate, frameworkMessage, slaveLost, executorLost, error]

Executor [registered, reregistered, disconnected, launchTask, killTask, frameworkMessage, shutdown, error]

slave resources [cpu, mem, disk, ports] & attributes
--resources="cpus(prod):20; cpus(stage):4; mem(*):24000;
  disk(*):128000; ports(*):[30000-32000]; disks(*):{1,2,3,4}"
--attributes="disktype:ssd; dc:1; rack:2; netcard:10G; os:ubuntu; jvm:8u45" 

modules [allocator, hook, isolator, auth]
persistent & dynamically allocated resources
resize tasks
optimistic offers
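
The resource-offer idea behind these primitives can be illustrated with a toy model. This is a minimal sketch, not the Mesos API: the `Offer` and `Task` classes, the `place` function and the slave names are all hypothetical, and the placement policy is a plain first-fit chosen only for illustration. Each offer carries the resources and attributes a slave advertises, and the scheduler accepts the first offer that both fits the task and satisfies its attribute constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Simplified stand-in for a resource offer from one slave (hypothetical type)."""
    slave: str
    cpus: float
    mem: int  # megabytes
    attributes: dict = field(default_factory=dict)


@dataclass
class Task:
    """A task to place, with optional attribute constraints (hypothetical type)."""
    name: str
    cpus: float
    mem: int  # megabytes
    required: dict = field(default_factory=dict)


def place(tasks, offers):
    """First-fit placement: map task name -> slave and shrink the accepted offer."""
    placement = {}
    for task in tasks:
        for offer in offers:
            fits = offer.cpus >= task.cpus and offer.mem >= task.mem
            matches = all(offer.attributes.get(k) == v
                          for k, v in task.required.items())
            if fits and matches:
                offer.cpus -= task.cpus
                offer.mem -= task.mem
                placement[task.name] = offer.slave
                break  # task placed, move on to the next one
    return placement


offers = [
    Offer("slave-1", cpus=4, mem=8000, attributes={"disktype": "hdd", "rack": "1"}),
    Offer("slave-2", cpus=20, mem=24000, attributes={"disktype": "ssd", "rack": "2"}),
]
tasks = [
    Task("web", cpus=2, mem=1024),
    Task("db", cpus=4, mem=4096, required={"disktype": "ssd"}),
    Task("huge", cpus=64, mem=1024),  # larger than any offer, stays unplaced
]
placement = place(tasks, offers)
print(placement)
```

A real framework receives offers through the scheduler's resourceOffers callback and declines what it does not use; the sketch only shows the matching step.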


mesos strong ecosystem


77 organisations, more than 20 frameworks

Airbnb Apple Atlassian Cisco Coursera eBay Ericsson Foursquare Groupon 
HubSpot Netflix OpenTable PayPal Qubit Time Warner Cable Twitter Uber...


An Essential Singularity exp(1/z)

HubSpot Singularity



PRE & POST-MESOS: SOME FACTS & NUMBERS



pre-mesos:
QA   → 843 components in ~400 small & medium size servers
PROD → 843 components in ~750 small & medium size servers

post-mesos:
QA   → 1504 components in 39 large size mesos slaves
PROD → 1303 components in 104 large size mesos slaves
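
The consolidation in these numbers can be made concrete with a little arithmetic. The sketch below only divides the component counts by the server counts quoted on this slide; the `density` helper is a hypothetical name. The pre-Mesos servers were small and medium instances while the Mesos slaves are large ones, so this shows packing density per machine, not a like-for-like utilisation comparison.

```python
# (components, servers) taken from the slide; pre-Mesos server counts are approximate.
pre = {"QA": (843, 400), "PROD": (843, 750)}
post = {"QA": (1504, 39), "PROD": (1303, 104)}


def density(components, servers):
    """Average number of deployable components per machine."""
    return components / servers


for env in ("QA", "PROD"):
    before = density(*pre[env])
    after = density(*post[env])
    print(f"{env}: {before:.1f} -> {after:.1f} components per machine")
```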

400 deploys per day


HUBSPOT MESOS QA CLUSTER

(Singularity DASHBOARD)


HUBSPOT MESOS PROD CLUSTER 

(SINGULARITY DASHBOARD)


Why we built Singularity



provide best possible user experience 
for 100+ HubSpot product developers

deploy entire HubSpot platform onto Singularity


Why we built Singularity


early adopter (2012) / immature frameworks
unified API for all deployables

mission critical / strategic tool - we wanted to control:
priority and delivery of bug fixes
features and integrations 
the overall roadmap

we had the resources to implement & maintain a highly complex piece of software
 

HUBSPOT PAAS


DEPLOY CONFIGURATION

name: Orion_All_Deployable_Types_In_One_Config
buildName: MesosDeployIntegrationTestsProject
type: procfile

owners:
- user@hubspot.com

appRoot: /testservice/v1
loadBalancers: 
  - test

env:
  all:
    JOB_JAR: TestJob.jar

procfile:
  webService:
    cmd: java $JVM_DEFAULT_OPTS -jar TestService.jar server $CONFIG_YAML
    instances: 2
    cpus: 2
    memory: 1024
    numRetriesOnFailure: 5
  scheduledJob:
    cmd: java $JVM_DEFAULT_OPTS -jar $JOB_JAR -testjob
    schedule: '*/3 * * * *'
    numRetriesOnFailure: 5
    healthcheckIntervalSeconds: 40
    healthcheckTimeoutSeconds: 40
  worker:
    cmd: java $JVM_DEFAULT_OPTS -jar TestDaemon.jar
    daemon: true
  onDemand:
    cmd: java $JVM_DEFAULT_OPTS -jar TestsDaemon.jar
    daemon: false

servers:
  qa:
  - mesos
  prod:
  - mesos

 

SINGULARITY COMPONENTS


singularity Scheduler


A DEPLOY-CENTRIC REST API to: 


  • register deployable items
  • execute their deploys
  • view sandbox files
  • get metadata / historical data

SINGULARITY SCHEDULER Advanced features


Native Docker Support

Health Checking at the process and the service endpoint level

Automatic cool-down of repeatedly failing services

Load balancing   of service instances  (LB API)

Automatic Rollback
  of failed deploys

Decommissioning of Slaves & Racks

Emails to service owners on failures

Webhooks

Key Singularity abstractions


Deployable (SINGULARITY REQUEST object)

{
    "id": "TestService",
    "owners": [
        "feature_x_team@mycompany.com",
        "developer@mycompany.com"
    ],
    "type": "service",
    "instances": 3,
    "rackSensitive": true,
    "loadBalanced": true
}

KEY SINGULARITY ABSTRACTIONS


singularity deploy object


RESOURCES:  Memory, CPUs, network ports
HEALTH CHECKS: Timeouts and URLs
LOAD BALANCING of web service instances (LB groups, api base path)
EXECUTOR INFORMATION: execution environment, executable artifacts, configuration files, command to execute, executor to use, etc.
{
    "requestId": "MDS_TestService",
    "id": "71_7",
    "customExecutorCmd": ".../singularity-executor",
    "resources": {
        "cpus": 1,
        "memoryMb": 896,
        "numPorts": 3
    },
    "env": {
        "DEPLOY_MEM": "768",
        "JVM_MAX_HEAP": "384m"
    },
    "executorData": {
        "cmd": "java -Xmx$JVM_MAX_HEAP -jar .../TestService.jar server $CONFIG_YAML",
        "embeddedArtifacts": [
            {
                "name": "rawDeployConfig",
                "filename": "TestService.yaml",
                "content": "bmFtZT..."
            }
        ],
        "externalArtifacts": [],
        "s3Artifacts": [
            {
                "name": "executableSlug",
                "filename": "TestService.tar.gz",
                "md5sum": "313be85c5979a1c652ec93e305eb25e9",
                "filesize": 81055833,
                "s3Bucket": "hubspot.com",
                "s3ObjectKey": "build_artifacts/.../TestService.tar.gz"
            }
        ]
    }
}

Singularity Executor


Log Rotation

Task Sandbox Cleanup

Graceful Task Killing with configurable timeout

Environment Setup

Task Runner Script

"Embedded" artifacts

advanced Slave services


Log Watcher: Forward / Stream Logs

S3 uploader: Archive logs with AWS S3 Service

Executor Cleanup: Clean failed executor tasks

SINGULARITY UI


SINGULARITY UI - Deployable item list



SINGULARITY UI - DEPLOYABLE ITEM

SINGULARITY UI - historical TASK

A Mesos/Singularity cluster on your laptop


install boot2docker & docker-compose

% git clone https://github.com/HubSpot/Singularity.git

% cd Singularity

% docker-compose up

% boot2docker ip
192.168.59.103

point your browser to http://192.168.59.103:7099/singularity/


Example Singularity request API calls


check registered deployables (singularity requests):
http GET http://192.168.59.103:7099/singularity/api/requests
register the "TestWorker" deployable (create a new singularity request):
http POST http://192.168.59.103:7099/singularity/api/requests id=TestWorker owners:='["gchomatas@hubspot.com"]' requestType=worker 
deploy the "TestWorker" deployable
http POST http://192.168.59.103:7099/singularity/api/deploys deploy:='{"requestId":"TestWorker", "id":"1", "command":"while true; do echo \"Spending cycles for nothing\"; sleep 2; done", "resources": {"cpus":0.1, "memoryMb":128, "numPorts":0}}' 
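
The same register and deploy calls can be sketched in Python using only the standard library. The `json_post` and `send` helpers are hypothetical names introduced for this example; the URLs and JSON bodies mirror the httpie commands above (httpie's `key=value` and `key:=json` arguments become fields of one JSON body). The requests are only built here; `send` is defined but not called, because it needs the local cluster to be running.

```python
import json
import urllib.request

# boot2docker address used in the commands above
BASE = "http://192.168.59.103:7099/singularity/api"


def json_post(path, body):
    """Build (but do not send) a JSON POST request for the Singularity API."""
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a prepared request; requires the local cluster to be up."""
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")


# Register the "TestWorker" deployable (a Singularity request).
register = json_post("/requests", {
    "id": "TestWorker",
    "owners": ["gchomatas@hubspot.com"],
    "requestType": "worker",
})

# Deploy it with the same shell loop as in the httpie example.
deploy = json_post("/deploys", {
    "deploy": {
        "requestId": "TestWorker",
        "id": "1",
        "command": 'while true; do echo "Spending cycles for nothing"; sleep 2; done',
        "resources": {"cpus": 0.1, "memoryMb": 128, "numPorts": 0},
    }
})

print(register.get_method(), register.full_url)
print(deploy.get_method(), deploy.full_url)
```

With the cluster running, `send(register)` followed by `send(deploy)` would perform the two calls in order.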

DEPLOY A JOB WITH DEFAULT EXECUTOR


http POST http://192.168.59.103:7099/singularity/api/requests id=TestJob owners:='["gchomatas@hubspot.com"]' requestType=scheduled schedule='* 0/5 * * * ?'

http POST http://192.168.59.103:7099/singularity/api/deploys deploy:='{"requestId":"TestJob", "id":"1", "command":"echo \"Spending cycles for nothing\"", "resources": {"cpus":0.1, "memoryMb":128}}' 
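
The trailing `?` in the schedule suggests a Quartz-style cron expression, which has a leading seconds field that standard five-field cron lacks. Assuming that interpretation, the sketch below labels each field so an expression is easier to read; `label_quartz` is a hypothetical helper, not part of Singularity. Under Quartz semantics a `*` in the seconds position matches every second of each matching minute, whereas `0` there fires once per matching minute.

```python
# Field order of a Quartz cron expression; the 7th field (year) is optional.
QUARTZ_FIELDS = ("seconds", "minutes", "hours",
                 "day_of_month", "month", "day_of_week", "year")


def label_quartz(expr):
    """Split a Quartz-style cron expression into named fields."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so 'year' is omitted when absent
    return dict(zip(QUARTZ_FIELDS, parts))


every_second = label_quartz("* 0/5 * * * ?")
once_per_slot = label_quartz("0 0/5 * * * ?")
for name, value in once_per_slot.items():
    print(f"{name:>12}: {value}")
```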

Deploy a service with default executor


http POST http://192.168.59.103:7099/singularity/api/requests id=TestService owners:='["gchomatas@hubspot.com"]' requestType=service


http POST http://192.168.59.103:7099/singularity/api/deploys deploy:='{"requestId":"TestService", "id":"1", "command":"java -Ddw.server.applicationConnectors[0].port=$PORT1 -Ddw.server.adminConnectors[0].port=$PORT0 -jar helloworld-1.0-SNAPSHOT.jar server example.yml", "resources": {"cpus":0.1, "memoryMb":128, "numPorts":2}, "uris":["https://github.com/micktwomey/docker-sample-dropwizard-service/releases/download/1.0/helloworld-1.0-SNAPSHOT.jar", "https://github.com/micktwomey/docker-sample-dropwizard-service/releases/download/1.0/example.yml"], "healthcheckUri": "/healthcheck"}' 

DEPLOY A Docker SERVICE WITH Docker EXECUTOR


http POST http://192.168.59.103:7099/singularity/api/requests id=TestDockerService owners:='["gchomatas@hubspot.com"]' requestType=service

http POST http://192.168.59.103:7099/singularity/api/deploys deploy:='{"requestId":"TestDockerService", "id":"1", "resources": {"cpus":0.1, "memoryMb":128, "numPorts":2}, "healthcheckUri": "/healthcheck", "containerInfo":{"type": "DOCKER", "docker":{"network": "BRIDGE", "image": "micktwomey/sample-dropwizard-service:1.0", "portMappings":[{"containerPortType": "LITERAL", "containerPort": 8081, "hostPortType": "FROM_OFFER", "hostPort": 0, "protocol": "tcp"}, {"containerPortType": "LITERAL", "containerPort": 8080, "hostPortType": "FROM_OFFER", "hostPort": 1, "protocol": "tcp"}]}}}' 


http POST http://192.168.59.103:7099/singularity/api/deploys deploy:='{"requestId":"TestDockerService", "id":"2", "resources": {"cpus":0.1, "memoryMb":128, "numPorts":2}, "healthcheckUri": "/healthcheck", "containerInfo":{"type": "DOCKER", "docker":{"network": "BRIDGE", "image": "micktwomey/docker-sample-web-service", "portMappings":[{"containerPortType": "LITERAL", "containerPort": 8081, "hostPortType": "FROM_OFFER", "hostPort": 0, "protocol": "tcp"}, {"containerPortType": "LITERAL", "containerPort": 8080, "hostPortType": "FROM_OFFER", "hostPort": 1, "protocol": "tcp"}]}}}'
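
The port mappings in these deploys are easier to follow with a small example. The sketch assumes that, for `"hostPortType": "FROM_OFFER"`, the `hostPort` value is an index into the ports allocated to the task from the Mesos offer (which is why the deploy asks for `numPorts: 2`), while a `LITERAL` value is used as given. The `resolve_port_mappings` function and the offered port numbers are hypothetical.

```python
def resolve_port_mappings(port_mappings, offered_ports):
    """Return (host_port, container_port, protocol) tuples for each mapping."""
    resolved = []
    for mapping in port_mappings:
        if mapping["hostPortType"] == "FROM_OFFER":
            # Assumption: hostPort is an index into the ports from the offer.
            host_port = offered_ports[mapping["hostPort"]]
        else:
            # LITERAL: use the port number exactly as written.
            host_port = mapping["hostPort"]
        resolved.append((host_port, mapping["containerPort"], mapping["protocol"]))
    return resolved


# Port mappings copied from the Docker deploy above.
mappings = [
    {"containerPortType": "LITERAL", "containerPort": 8081,
     "hostPortType": "FROM_OFFER", "hostPort": 0, "protocol": "tcp"},
    {"containerPortType": "LITERAL", "containerPort": 8080,
     "hostPortType": "FROM_OFFER", "hostPort": 1, "protocol": "tcp"},
]

# Hypothetical ports handed to the task, within the slave's offered port range.
offered = [31005, 31006]

resolved = resolve_port_mappings(mappings, offered)
for host_port, container_port, protocol in resolved:
    print(f"host {host_port} -> container {container_port}/{protocol}")
```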

DEPLOY A Load Balanced SERVICE with Default executor


http POST http://192.168.59.103:7099/singularity/api/requests id=TestLoadBalancedService owners:='["gchomatas@hubspot.com"]' requestType=service loadBalanced=true


http POST http://192.168.59.103:7099/singularity/api/deploys deploy:='{"requestId":"TestLoadBalancedService", "id":"1", "command":"java -Ddw.server.applicationConnectors[0].port=$PORT0 -Ddw.server.adminConnectors[0].port=$PORT1 -jar helloworld-1.0-SNAPSHOT.jar server example.yml", "resources": {"cpus":0.1, "memoryMb":128, "numPorts":2}, "uris":["https://github.com/micktwomey/docker-sample-dropwizard-service/releases/download/1.0/helloworld-1.0-SNAPSHOT.jar", "https://github.com/micktwomey/docker-sample-dropwizard-service/releases/download/1.0/example.yml"], "healthcheckUri": "/", "serviceBasePath":"/", "loadBalancerGroups":["test"]}'

Develop with singularity

java 8
guice

dropwizard 
(jersey, jackson, liquibase)

maven

backbone
nodejs
brunch

useful links


API Reference

Examples of using the API
Try it out on your laptop!

MesosCon 2014

MesosCon 2014 Continuous Deployment with Singularity

Twitter University: Mesos, HubSpot, and the Singularity
Mesosphere Case Studies: HubSpot Experiences with Apache Mesos




APPENDIX

SINGULARITY API


Manage Deployable items  


ENDPOINT: /requests


register / update / unregister an item 

get info about an item

list items in  active | paused | cool-down state

run / restart / pause / un-pause an item

SINGULARITY API


Deploy the Deployable Items 


ENDPOINT: /deploys


deploy an already registered item 

cancel a pending deploy




SINGULARITY API


Manage Deployable item Instances (TASKS) 

ENDPOINT: /tasks


get the list of all scheduled tasks (not yet active) 

get scheduled tasks for  a specific item

list tasks in active | cleaning | lbcleanup state

info about a specific task

active tasks in a slave

Kill a task

SINGULARITY API


Historical Information about deployable items & their tasks

ENDPOINT: /history


a single task history

 tasks that have run in the past

all previous item updates

search for historical items by item id

all item deploys

a specific item deploy

SINGULARITY API


List & Download files in Active Task Sandbox


ENDPOINT: /sandbox


list all task files 

read file chunks

download a file

SINGULARITY API

Cluster STATE Information 

ENDPOINT: /state

{
  activeTasks: 567,
  activeRequests: 843,
  cooldownRequests: 1,
  scheduledTasks: 142,
  pendingRequests: 0,
  lbCleanupTasks: 1,
  activeSlaves: 21,
  deadSlaves: 0,
  decomissioningSlaves: 0,
  activeRacks: 3,
  deadRacks: 0,
  futureTasks: 142,
  maxTaskLag: 0,
  overProvisionedRequests: 0,
  underProvisionedRequests: 0,
  allRequests: 844
}
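
A state document like this lends itself to a simple cluster check. The sketch below is only an illustration: the `nonzero_alerts` helper and the choice of watched counters are assumptions, not something Singularity prescribes. It takes a few values from the sample above and reports which watched counters are non-zero.

```python
# Values copied from the /state sample above (only the fields used here).
state = {
    "activeSlaves": 21,
    "activeRacks": 3,
    "deadSlaves": 0,
    "deadRacks": 0,
    "cooldownRequests": 1,
    "underProvisionedRequests": 0,
    "maxTaskLag": 0,
}

# Hypothetical choice of counters that are worth a look when above zero.
WATCHED = ("deadSlaves", "deadRacks", "cooldownRequests",
           "underProvisionedRequests", "maxTaskLag")


def nonzero_alerts(state, watched=WATCHED):
    """Return the watched counter names whose value is greater than zero."""
    return [name for name in watched if state.get(name, 0) > 0]


alerts = nonzero_alerts(state)
print(alerts)
```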

SINGULARITY UI - DEPLOYABLE ITEM task

SINGULARITY ui - dashboard



SINGULARITY UI - racks & slaves




Warehouse-scale computing with Mesos and Singularity: PaaS Infrastructure Automation and sustainable development velocity

By Gregory Chomatas

Use mesos and singularity to automate infrastructure, remove service deployment frictions and sustain the development velocity of your product
