CONTINUOUS DEPLOYMENT WITH SINGULARITY


Large Scale Mission-Critical Service and Job Deployment  


Gregory Chomatas 
@gchomatas

PAAS TEAM

Implement & maintain: 
  • the deploy & build tools
  • the PaaS platform (Mesos clusters)
  • load balancer tools
  • logging infrastructure

Boston: Whitney Sorenson, Tom Petr, Tim Finley
  
Dublin: Gregory Chomatas, Kieran Manning
 

An Essential Singularity exp(1/z)

HubSpot Singularity



The way to Mesos

Speed wins -> speed up product development


Increase the rate of change -> remove friction + reduce the size, cost and risk of each change:

  • small teams, high trust, low process
  • freedom and responsibility culture

  • microservices
  • libs & cross-cutting APIs to simplify coding
  • automated deployment tooling


some facts & numbers


3-4 person teams
several microservices & jobs per team (full operational ownership)
1 or more services per developer

All of QA runs on Mesos / PROD migration is ongoing now

400+ deploys / day - 843 deployable items:
219 WEB SERVICES (long-running, with an API)
246 WORKERS (long-running, no API)
205 SCHEDULED JOBS (CRON schedule)
173 ON-DEMAND
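The four categories account for the whole total. A quick tally of the counts above (plain Python, numbers copied from this slide; the dictionary keys are just labels chosen for this sketch) confirms 219 + 246 + 205 + 173 = 843 and shows each category's share.

```python
# Deployable item counts copied from the slide above.
DEPLOYABLE_ITEMS = {
    "web services": 219,    # long-running, with an API
    "workers": 246,         # long-running, no API
    "scheduled jobs": 205,  # CRON schedule
    "on-demand": 173,       # run when triggered
}


def shares(counts):
    """Return each category's share of the total as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


total = sum(DEPLOYABLE_ITEMS.values())
print(total)                      # 843
print(shares(DEPLOYABLE_ITEMS))   # {'web services': 26.0, 'workers': 29.2, 'scheduled jobs': 24.3, 'on-demand': 20.5}
```

The mix is fairly even: no single kind of deployable dominates, which is one reason a single API for all of them pays off.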

SOME FACTS & NUMBERS


QA Environment

pre-mesos: 
400 small & medium size servers (c1.xlarge)

post-mesos: 
20 big servers (c3.8xlarge)
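To put the consolidation in perspective, here is a back-of-the-envelope comparison of raw capacity. This sketch assumes all 400 pre-Mesos servers were c1.xlarge. It also uses AWS's published specs for the two instance types (c1.xlarge: 8 vCPUs, 7 GiB RAM; c3.8xlarge: 32 vCPUs, 60 GiB RAM), which should be checked against current AWS documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float


# Assumed specs: AWS published figures for these instance types.
C1_XLARGE = InstanceType("c1.xlarge", vcpus=8, memory_gib=7.0)
C3_8XLARGE = InstanceType("c3.8xlarge", vcpus=32, memory_gib=60.0)


def fleet_capacity(instance: InstanceType, count: int):
    """Total (vCPUs, GiB of RAM) for `count` servers of a single instance type."""
    return instance.vcpus * count, instance.memory_gib * count


pre_cpu, pre_mem = fleet_capacity(C1_XLARGE, 400)    # pre-Mesos QA
post_cpu, post_mem = fleet_capacity(C3_8XLARGE, 20)  # post-Mesos QA
print(pre_cpu, pre_mem)    # 3200 2800.0
print(post_cpu, post_mem)  # 640 1200.0
print(f"{post_cpu / pre_cpu:.0%} of the vCPUs, {post_mem / pre_mem:.0%} of the RAM")
# 20% of the vCPUs, 43% of the RAM
```

Under these assumptions QA runs on about a fifth of the vCPUs and less than half the RAM. That is consistent with packing many small tasks onto a few large hosts instead of dedicating mostly idle servers to individual services.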



Why singularity



provide the best possible user experience
for 100+ HubSpot product developers

deploy the entire HubSpot platform onto Singularity


why singularity


early adopter of Mesos (2012): the existing frameworks were immature
unified API for all deployables

mission-critical / strategic tool, so we want control over:
  • the priority and delivery of bug fixes
  • features and integrations
  • the overall roadmap

we have the resources to implement & maintain a highly complex piece of software
 

DEPLOY WITH HUBSPOT PAAS


 

SINGULARITY COMPONENTS


singularity Scheduler


A DEPLOY-CENTRIC REST API to: 


  • register deployable items
  • execute their deploys
  • view sandbox files
  • get metadata / historical data
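Because the API is plain JSON over HTTP, a client can stay very thin. The sketch below uses only Python's standard library. The `SingularityClient` class, the example host `singularity.example.com`, and the choice of HTTP method per operation are illustrative assumptions. Check paths and methods against the Singularity API reference listed under useful links.

```python
import json
import urllib.request
from typing import Any, Optional


class SingularityClient:
    """Minimal JSON-over-HTTP client sketch (hypothetical, not an official client)."""

    def __init__(self, base_url: str, timeout_seconds: float = 10.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout_seconds = timeout_seconds

    def build_request(self, method: str, path: str,
                      body: Optional[Any] = None) -> urllib.request.Request:
        """Create (but do not send) a request, JSON-encoding the body if there is one."""
        data = None if body is None else json.dumps(body).encode("utf-8")
        headers = {"Accept": "application/json"}
        if data is not None:
            headers["Content-Type"] = "application/json"
        url = self.base_url + "/" + path.lstrip("/")
        return urllib.request.Request(url, data=data, headers=headers, method=method)

    def call(self, method: str, path: str, body: Optional[Any] = None) -> Any:
        """Send the request and decode the JSON response (None for an empty body)."""
        req = self.build_request(method, path, body)
        with urllib.request.urlopen(req, timeout=self.timeout_seconds) as resp:
            payload = resp.read()
        return json.loads(payload) if payload else None
```

Under those assumptions, registering an item could look like `client.call("POST", "/requests", {"id": "TestService", ...})`, and reading cluster state like `client.call("GET", "/state")`. Keeping `build_request` separate from `call` makes it possible to inspect requests without a live cluster.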

SINGULARITY UI


SINGULARITY UI - Deployable item list



SINGULARITY UI - DEPLOYABLE ITEM

SINGULARITY UI - historical TASK

SINGULARITY SCHEDULER


Health Checking at the process and the service endpoint level

Automatic cool-down of repeatedly failing services

Load balancing of service instances (LB API)

Automatic Rollback of failed deploys

Decommissioning of Slaves & Racks

Emails to service owners on failures
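Automatic cool-down can be pictured as a per-item failure counter. After N consecutive failures, the item is not relaunched immediately until a cooldown window expires, and a success resets the counter. The sketch below is a simplified model. `CooldownTracker`, `failures_before_cooldown` and `cooldown_seconds` are hypothetical names and defaults, not Singularity's configuration.

```python
import time
from typing import Callable, Dict


class CooldownTracker:
    """Simplified cool-down model for repeatedly failing items (hypothetical sketch)."""

    def __init__(self, failures_before_cooldown: int = 3, cooldown_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failures_before_cooldown = failures_before_cooldown
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so tests can control time
        self._failures: Dict[str, int] = {}          # consecutive failures per item
        self._cooldown_until: Dict[str, float] = {}  # cooldown expiry per item

    def record_success(self, item_id: str) -> None:
        """A successful run clears the failure streak and any cooldown."""
        self._failures.pop(item_id, None)
        self._cooldown_until.pop(item_id, None)

    def record_failure(self, item_id: str) -> None:
        """Count a failure; at the threshold (and on every later failure) restart the window."""
        self._failures[item_id] = self._failures.get(item_id, 0) + 1
        if self._failures[item_id] >= self.failures_before_cooldown:
            self._cooldown_until[item_id] = self.clock() + self.cooldown_seconds

    def in_cooldown(self, item_id: str) -> bool:
        """True while the scheduler should hold off relaunching the item."""
        until = self._cooldown_until.get(item_id)
        if until is None:
            return False
        if self.clock() >= until:
            # Window expired: forget the streak so the item gets a fresh start.
            self._cooldown_until.pop(item_id, None)
            self._failures.pop(item_id, None)
            return False
        return True
```

A scheduler loop would check `in_cooldown(item_id)` before relaunching a failed task and send the owner email when an item first enters cooldown.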

Singularity Executor


Log Rotation

Task Sandbox Cleanup

Graceful Task Killing with configurable timeout

Environment Setup

Task Runner Script
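Graceful task killing with a configurable timeout usually follows a two-step pattern. First, send SIGTERM so the process can finish in-flight work and exit cleanly. Then escalate to SIGKILL if it is still running when the timeout expires. Below is a minimal Python sketch of that pattern. It assumes POSIX signal semantics, and `graceful_kill` is a hypothetical helper, not the Singularity Executor's code.

```python
import subprocess


def graceful_kill(proc: subprocess.Popen, timeout_seconds: float) -> int:
    """Stop proc politely with SIGTERM, escalating to SIGKILL after timeout_seconds.

    Returns the exit status; on POSIX a negative value -N means "killed by signal N".
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to do
    proc.terminate()  # SIGTERM on POSIX: the task may catch it and clean up
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()
```

For example, `graceful_kill(task_process, timeout_seconds=30)` would give a task 30 seconds to drain before it is force-killed. The 30-second value is arbitrary, which is why the timeout is a parameter.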

advanced Slave services


Log Watcher: Forward / Stream Logs

S3 uploader: Archive logs to AWS S3

Executor Cleanup: Clean up failed executor tasks
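A log watcher that forwards or streams logs has to remember how far it has read and ship only complete lines. Below is a minimal sketch of that bookkeeping in Python. The `LogTailer` class is hypothetical, and the slide does not describe the real Log Watcher's destination or rotation handling.

```python
import os
from typing import List


class LogTailer:
    """Tracks a read offset in one log file and returns newly appended complete lines."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0      # bytes already consumed from the file
        self._partial = b""  # trailing bytes of a line that has no newline yet

    def read_new_lines(self) -> List[str]:
        try:
            size = os.path.getsize(self.path)
        except FileNotFoundError:
            return []  # nothing written yet, or the file was moved away
        if size < self.offset:
            # The file shrank, so assume it was truncated or replaced and start over.
            self.offset = 0
            self._partial = b""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        self.offset += len(chunk)
        # Everything before the last newline is complete; keep the rest for next time.
        *complete, self._partial = (self._partial + chunk).split(b"\n")
        return [line.decode("utf-8", errors="replace") for line in complete]
```

A forwarder would call `read_new_lines()` on a timer and ship each returned line to its destination. Holding back the unfinished last line avoids forwarding half-written log records.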

Develop with singularity

Java 7
Guice

Dropwizard
(Jersey, Jackson, Liquibase)

Maven

Backbone
Node.js
Brunch

roadmap / New features

Phased deployment


Auto-scaling / resource usage monitoring and alerting


Enhance job scheduler


Support deploying Docker containers

 

Open-source the HubSpot Deployer and Deploy Registry

useful links


https://github.com/HubSpot/Singularity/blob/master/Docs/Singularity_API_Reference.md
https://github.com/HubSpot/Singularity/blob/master/Docs/Singularity_Local_Setup_For_Testing.md
https://mesosphere.io/resources/mesos-case-study-hubspot/




APPENDIX

DEPLOY CONFIGURATION

name: MDS_All_Item_Types_In_One_Config
buildName: MesosDeployIntegrationTestsProject
type: procfile

owners:
- user@hubspot.com

appRoot: /mesos-deploy-test-srv1/v1
loadBalancers: 
  - test

env:
  all:
    JOB_JAR: TestJob.jar

procfile:
  webService:
    cmd: java $JVM_DEFAULT_OPTS -jar TestService.jar server $CONFIG_YAML
    instances: 2
    cpus: 2
    memory: 1024
    numRetriesOnFailure: 5
  scheduledJob:
    cmd: java $JVM_DEFAULT_OPTS -jar $JOB_JAR -testjob
    schedule: '*/3 * * * *'
    numRetriesOnFailure: 5
    healthcheckIntervalSeconds: 40
    healthcheckTimeoutSeconds: 40
  worker:
    cmd: java $JVM_DEFAULT_OPTS -jar TestDaemon.jar
    daemon: true
  onDemand:
    cmd: java $JVM_DEFAULT_OPTS -jar TestsDaemon.jar
    daemon: false

servers:
  qa:
  - mesos
  prod:
  - mesos
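The `procfile` section has one entry per process, and each entry's fields decide which kind of deployable item it becomes. The sketch below expands the config into per-process summaries. The config is written as a Python dict so no YAML library is needed. The classification rules are an assumption inferred from this example, not HubSpot's deployer logic:
- a `schedule` makes a scheduled job
- `daemon: false` makes an on-demand item
- `daemon: true` makes a worker
- an entry with no `daemon` key becomes a web service when the config lists `loadBalancers`

The `<name>_<process>` id scheme is also hypothetical.

```python
from typing import Any, Dict, List


def classify(entry: Dict[str, Any], has_load_balancers: bool) -> str:
    """Assumed mapping from a procfile entry's fields to a deployable item type."""
    if "schedule" in entry:
        return "SCHEDULED_JOB"
    daemon = entry.get("daemon")
    if daemon is False:
        return "ON_DEMAND"
    if daemon is None and has_load_balancers:
        return "WEB_SERVICE"
    return "WORKER"


def expand_procfile(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn each procfile entry into a summary of the item that would be registered."""
    has_lbs = bool(config.get("loadBalancers"))
    items = []
    for process_name, entry in config["procfile"].items():
        items.append({
            "id": f"{config['name']}_{process_name}",  # hypothetical naming scheme
            "type": classify(entry, has_lbs),
            "cmd": entry["cmd"],
            "instances": entry.get("instances", 1),    # assumed default of 1
        })
    return items


# The relevant parts of the configuration above, as a dict.
CONFIG = {
    "name": "MDS_All_Item_Types_In_One_Config",
    "loadBalancers": ["test"],
    "procfile": {
        "webService": {"cmd": "java $JVM_DEFAULT_OPTS -jar TestService.jar server $CONFIG_YAML",
                       "instances": 2},
        "scheduledJob": {"cmd": "java $JVM_DEFAULT_OPTS -jar $JOB_JAR -testjob",
                         "schedule": "*/3 * * * *"},
        "worker": {"cmd": "java $JVM_DEFAULT_OPTS -jar TestDaemon.jar", "daemon": True},
        "onDemand": {"cmd": "java $JVM_DEFAULT_OPTS -jar TestsDaemon.jar", "daemon": False},
    },
}

for item in expand_procfile(CONFIG):
    print(item["id"], item["type"], item["instances"])
```

With these rules, each of the four entries lands in a different category, which matches the intent of the config's name.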

KEY SINGULARITY ABSTRACTIONS


SINGULARITY REQUEST OBJECT

{
    "id": "TestService",
    "owners": [
        "feature_x_team@mycompany.com",
        "developer@mycompany.com"
    ],
    "daemon": true,
    "instances": 3,
    "rackSensitive": true,
    "loadBalanced": true
} 
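`rackSensitive: true` asks the scheduler to spread an item's instances across racks, so losing one rack does not take out every instance. One simple way to express that constraint is round-robin assignment, which keeps per-rack counts within one of each other. This is an illustrative sketch with hypothetical rack names, not Singularity's offer-matching code, which works from the resource offers Mesos sends it.

```python
from collections import Counter
from typing import List


def spread_across_racks(instances: int, racks: List[str]) -> List[str]:
    """Assign instance i to rack i mod len(racks), so per-rack counts differ by at most one."""
    if not racks:
        raise ValueError("no active racks")
    return [racks[i % len(racks)] for i in range(instances)]


def is_rack_balanced(assignment: List[str], racks: List[str]) -> bool:
    """True if no rack holds more than one instance more than any other rack."""
    counts = Counter(assignment)
    per_rack = [counts.get(rack, 0) for rack in racks]
    return max(per_rack) - min(per_rack) <= 1
```

For the request object above (3 instances, `rackSensitive: true`) on a cluster with three racks, each rack gets exactly one instance.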

KEY SINGULARITY ABSTRACTIONS


singularity deploy object


RESOURCES: memory, CPUs, network ports
HEALTH CHECKS: timeouts and URLs
LOAD BALANCING of web service instances (LB groups, API base path)
EXECUTOR INFORMATION: execution environment, executable artifacts, configuration files, command to execute, executor to use, etc.
{
    "requestId": "MDS_TestService",
    "id": "71_7",
    "customExecutorCmd": ".../singularity-executor",
    "resources": {
        "cpus": 1,
        "memoryMb": 896,
        "numPorts": 3
    },
    "env": {
        "DEPLOY_MEM": "768",
        "JVM_MAX_HEAP": "384m"
    },
    "executorData": {
        "cmd": "java -Xmx$JVM_MAX_HEAP -jar .../TestService.jar server $CONFIG_YAML",
        "embeddedArtifacts": [
            {
                "name": "rawDeployConfig",
                "filename": "TestService.yaml",
                "content": "bmFtZT..."
            }
        ],
        "externalArtifacts": [],
        "s3Artifacts": [
            {
                "name": "executableSlug",
                "filename": "TestService.tar.gz",
                "md5sum": "313be85c5979a1c652ec93e305eb25e9",
                "filesize": 81055833,
                "s3Bucket": "hubspot.com",
                "s3ObjectKey": "build_artifacts/.../TestService.tar.gz"
            }
        ]
    }
}

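The deploy object ships artifacts in two forms. Embedded artifacts carry their content inline, and that content appears to be base64: `bmFtZT` is how `name:`, the first key of the deploy configuration, begins when base64-encoded. S3 artifacts instead carry an `md5sum` and a `filesize` to verify after download. The sketch below shows how an executor might materialize and verify them. The function names are hypothetical, and the S3 download itself is left out.

```python
import base64
import hashlib
import os
from typing import Any, Dict


def write_embedded_artifact(artifact: Dict[str, Any], dest_dir: str) -> str:
    """Decode an embedded artifact's base64 `content` into `filename` under dest_dir."""
    # basename() keeps a crafted filename from escaping dest_dir.
    path = os.path.join(dest_dir, os.path.basename(artifact["filename"]))
    with open(path, "wb") as f:
        f.write(base64.b64decode(artifact["content"]))
    return path


def verify_downloaded_artifact(artifact: Dict[str, Any], path: str) -> None:
    """Raise ValueError unless the file at path matches the artifact's filesize and md5sum."""
    size = os.path.getsize(path)
    if size != artifact["filesize"]:
        raise ValueError(f"{path}: expected {artifact['filesize']} bytes, got {size}")
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):  # 1 MiB at a time
            md5.update(block)
    if md5.hexdigest() != artifact["md5sum"]:
        raise ValueError(f"{path}: md5 mismatch")
```

Checking the size first is cheap and catches truncated downloads before the file is hashed. The md5 check then catches corrupted ones.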
SINGULARITY API


Manage Deployable items  


ENDPOINT: /requests


register / update / unregister an item 

get info about an item

list items in active | paused | cool-down state

run / restart / pause / un-pause an item

SINGULARITY API


Deploy the Deployable Items 


ENDPOINT: /deploys


deploy an already registered item 

cancel a pending deploy




SINGULARITY API


Manage Deployable item Instances (TASKS) 

ENDPOINT: /tasks


get the list of all scheduled tasks (not yet active) 

get scheduled tasks for a specific item

list tasks in active | cleaning | lbcleanup state

info about a specific task

active tasks on a slave

Kill a task

SINGULARITY API


Historical Information about deployable items & their tasks

ENDPOINT: /history


a single task history

tasks that have run in the past

all previous item updates

search for historical items by item id

all item deploys

a specific item deploy

SINGULARITY API


List & Download files in Active Task Sandbox


ENDPOINT: /sandbox


list all task files 

read file chunks

download a file
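Reading a large sandbox file in chunks is an offset loop. Ask for up to `chunk_size` bytes starting at `offset`, append what comes back, and advance until an empty chunk signals the end. In this sketch, `fetch_chunk(offset, length)` is a hypothetical stand-in for a call to the `/sandbox` read endpoint. The endpoint's real parameters and response shape should be taken from the API reference.

```python
from typing import Callable, List


def read_whole_file(fetch_chunk: Callable[[int, int], bytes],
                    chunk_size: int = 64 * 1024) -> bytes:
    """Page through a remote file by offset until the server returns an empty chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    parts: List[bytes] = []
    offset = 0
    while True:
        chunk = fetch_chunk(offset, chunk_size)
        if not chunk:
            break
        parts.append(chunk)
        # Advance by what was actually returned, which may be less than requested.
        offset += len(chunk)
    return b"".join(parts)
```

For a live log that is still growing, the same loop can keep its last offset and resume later rather than starting from zero.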

SINGULARITY API

Cluster STATE Information 

ENDPOINT: /state

{
  "activeTasks": 567,
  "activeRequests": 843,
  "cooldownRequests": 1,
  "scheduledTasks": 142,
  "pendingRequests": 0,
  "lbCleanupTasks": 1,
  "activeSlaves": 21,
  "deadSlaves": 0,
  "decomissioningSlaves": 0,
  "activeRacks": 3,
  "deadRacks": 0,
  "futureTasks": 142,
  "maxTaskLag": 0,
  "overProvisionedRequests": 0,
  "underProvisionedRequests": 0,
  "allRequests": 844
}
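The `/state` payload is convenient for dashboards and alerting. In the snapshot above, the request totals are consistent: 843 active + 1 in cooldown + 0 pending = 844 `allRequests`. The 843 active requests also match the deployable-item count on the earlier facts slide. Below is a small alerting sketch over such a payload. The chosen checks, the `max_task_lag_ms` threshold, and the assumption that `maxTaskLag` is in milliseconds are all hypothetical.

```python
from typing import Any, Dict, List

# (state key, label) pairs where any non-zero count is worth a warning.
NONZERO_CHECKS = [
    ("deadSlaves", "dead slave(s)"),
    ("deadRacks", "dead rack(s)"),
    ("cooldownRequests", "request(s) in cooldown"),
    ("underProvisionedRequests", "under-provisioned request(s)"),
    ("overProvisionedRequests", "over-provisioned request(s)"),
]


def state_alerts(state: Dict[str, Any], max_task_lag_ms: int = 60_000) -> List[str]:
    """Return human-readable warnings for a /state snapshot (hypothetical checks)."""
    alerts = []
    for key, label in NONZERO_CHECKS:
        count = state.get(key, 0)
        if count > 0:
            alerts.append(f"{count} {label}")
    lag = state.get("maxTaskLag", 0)
    if lag > max_task_lag_ms:
        alerts.append(f"task lag {lag} ms exceeds {max_task_lag_ms} ms")
    return alerts


SNAPSHOT = {"activeRequests": 843, "cooldownRequests": 1, "pendingRequests": 0,
            "allRequests": 844, "deadSlaves": 0, "deadRacks": 0, "maxTaskLag": 0,
            "overProvisionedRequests": 0, "underProvisionedRequests": 0}
print(state_alerts(SNAPSHOT))  # ['1 request(s) in cooldown']
```

Keeping the checks in a table makes it easy to add or drop a field without touching the loop.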

SINGULARITY UI - DEPLOYABLE ITEM task

SINGULARITY ui - dashboard



SINGULARITY UI - racks & slaves




Large Scale Mission-Critical Service and Job Deployment with Singularity

By Gregory Chomatas

Use Singularity, a Mesos framework, to automate the deployment of microservices, scheduled jobs, workers and on-demand processes in Mesos clusters.
