Ryan Walls

@ryanwalls

Director of Software Engineering

3DSIM

...in Production at 3DSIM

A case study

Agenda

  • The Decision
  • The Tools
  • Cluster!
    • Provisioning
    • Networking
    • Logging
    • Monitoring/Alerting
    • Rolling updates
    • Service discovery
  • Next steps
  • Q & A

Factors

  • No licensing cost
  • Run anywhere 
  • Job scheduling
  • Dashboard
  • Large community
  • Preferably open source

Big 3

  • ECS
  • Docker Swarm
  • Kubernetes

The Tools

Ansible

PROS

  • List desired end state, it gets you there
  • Easy to read
  • SSH based
  • Decent Docker support
  • Can always fall back to shell/command

CONS

  • Learning curve 
  • Not updated very quickly
  • Bugs are slow to be fixed

Good Alternatives: Terraform
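
The "list desired end state, it gets you there" point can be illustrated with a minimal sketch. This is plain Python and not code from any automation tool; `ensure_line` is a hypothetical helper that only shows the pattern of checking the current state, changing it if needed, and reporting whether anything changed.

```python
from typing import List, Tuple


def ensure_line(lines: List[str], wanted: str) -> Tuple[List[str], bool]:
    """Return (new_lines, changed) so that `wanted` is present in the result.

    Hypothetical helper for illustration only.
    """
    if wanted in lines:
        # Already in the desired state: do nothing and report no change.
        return list(lines), False
    # Not yet in the desired state: add the line and report a change.
    return list(lines) + [wanted], True


if __name__ == "__main__":
    config = ["PermitRootLogin no"]
    config, changed_first = ensure_line(config, "PasswordAuthentication no")
    config, changed_second = ensure_line(config, "PasswordAuthentication no")
    print(changed_first, changed_second)
```

Running the same step twice changes the state only the first time, which is what makes repeated runs of a playbook-style description safe.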

Rancher

PROS

  • Sets up k8s for you
  • Drop-in DNS support
  • Access control
  • Pretty host visualizations
  • Handy kubectl access
  • Catalog
  • Community/support
  • Webhooks

CONS

  • Non-standard k8s install
  • Quickly changing
  • Not commonly used in k8s community yet

Good Alternatives: GCE, kops

Sumologic

PROS

  • No setup

CONS

  • Costs money

Good Alternatives: Loggly, ELK stack

Prometheus

PROS

  • Open source
  • Adopted by CNCF
  • Most components expose Prometheus data (Docker, k8s, Rancher, etc.)
  • Can also be used to monitor your own services
  • The hot new thing
  • See docker details here and here

CONS

  • Still fairly new
  • Less than extensive documentation
  • Config and management

Good Alternatives: New Relic, DataDog, others
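
To make "components expose prometheus data" concrete, here is a minimal sketch of the plain-text format a service returns when it is scraped. It uses only the standard library; `render_counter` and the metric name are illustrative assumptions, and a real service would more likely use an official client library, which also handles label escaping.

```python
from typing import Dict, List, Tuple


def render_counter(name: str, help_text: str,
                   samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one counter in the Prometheus text exposition format.

    Simplified sketch: label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Labels are rendered as key="value" pairs, sorted for stable output.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric for a hypothetical service.
    print(render_counter(
        "http_requests_total",
        "Total HTTP requests.",
        [({"method": "get", "code": "200"}, 3)],
    ), end="")
```

A scraper only needs an HTTP endpoint that returns text like this, which is why so many components can expose metrics with little effort.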

VictorOps

PROS

  • Fairly cheap
  • Call scheduling

CONS

  • Fewer integrations compared to PagerDuty, HipChat
  • UX is just "okay"

Good Alternatives: PagerDuty, Slack, HipChat

Grafana

PROS

  • Simple
  • Deployed with Prometheus

CONS

  • Too early to tell

Good Alternatives: Kibana, Cloudwatch

Quay.io

PROS

  • Solid hosting
  • Vulnerability detection
  • Robot accounts

CONS

  • Have to use "quay.io" prefix

Good Alternatives: Docker Hub

Amazon Application Load Balancer

PROS

  • Cheaper than ELB
  • Nicely integrated with AWS Auto Scaling Groups

CONS

  • Vendor tie-in

Good Alternatives: HAProxy, nginx

Rancher Ingress Controller

PROS

  • Integrated with AWS ALBs
  • Nice API for creating routes

CONS

  • ?

Good Alternatives: ?

Jenkins

PROS

  • Large community
  • Tons of plugins
  • Lots of existing expertise in many organizations

CONS

  • Long-in-the-tooth UX

Good Alternatives: Travis CI, Drone.io, lots

Summary

  • Clustering: Kubernetes
  • Automation: Ansible
  • Cluster management: Rancher
  • Logging: Sumologic
  • Monitoring: Prometheus, New Relic, Sumologic
  • Alerting: Alertmanager, VictorOps
  • Dashboards: Grafana, AWS Cloudwatch, Sumologic
  • Registry: Quay.io
  • Load balancing/SSL termination: Amazon ALB, Rancher Ingress Controller
  • External DNS: Route53
  • CI/CD: Jenkins

Other case studies

CLUSTER!

Provisioning

  • Automate as much as possible
  • Create the Rancher server
  • Set up an environment in Rancher
  • Create an Auto Scaling Group whose hosts join the Rancher environment
  • Wait for k8s to be created
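
The final "wait for k8s to be created" step is a polling loop in most automation setups. The sketch below shows that loop in plain Python under stated assumptions: `wait_until` is a hypothetical helper, and the `check` callable stands in for whatever readiness probe is actually used (the slides do not say which).

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it returns True or `timeout_s` has elapsed.

    Returns True if the check succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        # Back off before the next probe so the API is not hammered.
        sleep(interval_s)


if __name__ == "__main__":
    attempts = {"count": 0}

    def cluster_ready() -> bool:
        # Stand-in probe: pretends the cluster is ready on the third attempt.
        attempts["count"] += 1
        return attempts["count"] >= 3

    print(wait_until(cluster_ready, timeout_s=10, interval_s=0.1))
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real delays.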

Networking

  • All requests hit ALB
  • ALB -> Rancher ingress controller
  • Rancher ingress controller -> Services
  • Services -> Pods
  • Link to diagram
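
The request path above (ALB, then ingress controller, then Service, then Pod) can be modeled as two lookups. This is a toy model, not how the ALB, the ingress controller, or kube-proxy are implemented: the `Router` class, the path-prefix rules, and the round-robin choice of Pod are all simplifying assumptions for illustration.

```python
from itertools import cycle
from typing import Dict, List, Optional


class Router:
    """Toy model: ingress rule picks a Service, the Service picks a Pod."""

    def __init__(self, rules: Dict[str, str],
                 service_pods: Dict[str, List[str]]) -> None:
        # rules: URL path prefix -> Service name (hypothetical ingress rules).
        self._rules = rules
        # One round-robin iterator per Service over its Pods.
        self._pods = {svc: cycle(pods) for svc, pods in service_pods.items()}

    def route(self, path: str) -> Optional[str]:
        # Ingress step: the longest matching path prefix wins.
        matches = [prefix for prefix in self._rules if path.startswith(prefix)]
        if not matches:
            return None
        service = self._rules[max(matches, key=len)]
        # Service step: hand the request to the next Pod in turn.
        return next(self._pods[service])


if __name__ == "__main__":
    router = Router(
        {"/api": "api-service", "/": "web-service"},
        {"api-service": ["api-pod-1", "api-pod-2"], "web-service": ["web-pod-1"]},
    )
    for path in ["/api/jobs", "/api/jobs", "/index.html"]:
        print(path, "->", router.route(path))
```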

Logging

  • Create Sumologic hosted HTTP collector
  • Kubelet automatically creates symlinks to the log files, with extra metadata in the file names
  • Mount symlink location into containers running https://github.com/3DSIM/fluentd-kubernetes-sumologic
  • Run logging containers as DaemonSet
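
The metadata in the symlink names is what lets a log shipper tag each line with its Pod and namespace. The sketch below parses such a name, assuming the commonly seen layout `<pod>_<namespace>_<container>-<64-hex container id>.log`. The regular expression and `parse_log_name` are illustrative and are not taken from the fluentd-kubernetes-sumologic repository.

```python
import re
from typing import Dict, Optional

# Assumed layout: <pod>_<namespace>_<container>-<container id>.log
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)"
    r"-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_log_name(filename: str) -> Optional[Dict[str, str]]:
    """Extract pod, namespace, container, and container id from a file name.

    Returns None when the name does not follow the assumed layout.
    """
    match = LOG_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up example name; the container id is 64 hex characters.
    example = "api-7d9f_default_api-server-" + "ab" * 32 + ".log"
    print(parse_log_name(example))
```

Because every node has the same directory layout, running one such collector per node as a DaemonSet covers all containers in the cluster.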

Monitoring/Alerting

Rolling Updates

  • Kubernetes takes care of everything
  • Just use Ansible to create the updated deployment
  • See jenkins.3dsim.com
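
An "updated deployment" is mostly the same manifest with a new image tag; Kubernetes then replaces Pods gradually according to the rolling update strategy. The sketch below builds such a manifest as a Python dictionary. The `deployment_manifest` helper, the image name, and the `maxSurge`/`maxUnavailable` values are assumptions for illustration, and the `apiVersion` shown is the current one, which may differ from what a cluster of that era accepted.

```python
import json
from typing import Any, Dict


def deployment_manifest(name: str, image: str, replicas: int) -> Dict[str, Any]:
    """Build a Deployment manifest that uses the RollingUpdate strategy."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Assumed values: add one new Pod at a time, never drop capacity.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; changing only the tag triggers a rollout.
    manifest = deployment_manifest("api", "quay.io/example/api:1.2.3", 3)
    print(json.dumps(manifest, indent=2))
```

The JSON output can be applied like any other manifest, so the automation step only has to render it with the new tag and submit it.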

Service Discovery

Next steps

Questions?

A Case Study of Docker in Production: Clustering, updates, discovery, logging, and more

By Ryan Walls

This is a case study of how 3DSIM uses Docker in production. Your needs may vary.