Ryan Walls

@ryanwalls

Director of Software Engineering

3DSIM

Docker in Production at 3DSIM

A case study

Repo: https://github.com/ryanwalls/meetup-docker-in-production

Agenda

The Decision
The Tools
Cluster!
- Provisioning
- Networking
- Logging
- Monitoring/Alerting
- Rolling updates
- Service discovery
Next steps
Q & A

Factors

No licensing cost
Run anywhere
Job scheduling
Dashboard
Large community
Preferably open source

Big 3

ECS
Docker Swarm
Kubernetes

The Tools

Ansible

PROS

List desired end state, it gets you there
Easy to read
SSH based
Decent Docker support
Can always fall back to shell/command

CONS

Learning curve
Not updated very quickly
Bugs are slow to be fixed

Good Alternatives: Terraform

Rancher

PROS

Sets up k8s for you
Drop in DNS support
Access control
Pretty host visualizations
Handy kubectl access
Catalog
Community/support
Webhooks

CONS

Non-standard k8s install
Quickly changing
Not commonly used in k8s community yet

Good Alternatives: GCE, kops

Sumologic

PROS

No setup

CONS

Costs money

Good Alternatives: Loggly, ELK stack

Prometheus

PROS

Open source
Adopted by CNCF
Most components expose Prometheus metrics (Docker, k8s, Rancher, etc.)
Can be used to monitor services also
The hot new thing
See docker details here and here

CONS

Still fairly new
Less than extensive documentation
Config and management

Good Alternatives: New Relic, DataDog, others

VictorOps

PROS

Fairly cheap
Call scheduling

CONS

Fewer integrations than PagerDuty or HipChat
UX is just "okay"

Good Alternatives: PagerDuty, Slack, HipChat

Grafana

PROS

Simple
Deployed with Prometheus

CONS

Too early to tell

Good Alternatives: Kibana, Cloudwatch

Quay.io

PROS

Solid hosting
Vulnerability detection
Robot accounts

CONS

Have to use "quay.io" prefix

Good Alternatives: Docker Hub

Amazon Application Load Balancer

PROS

Cheaper than ELB
Nicely integrated with AWS Auto Scaling Groups

CONS

Vendor tie-in

Good Alternatives: HAProxy, nginx

Rancher Ingress Controller

PROS

Integrated with AWS ALBs
Nice API for creating routes

CONS

Good Alternatives: ?

Jenkins

PROS

Large community
Tons of plugins
Lots of existing expertise in many organizations

CONS

Long-in-the-tooth UX

Good Alternatives: Travis CI, Drone.io, lots

Summary

Clustering: Kubernetes
Automation: Ansible
Cluster management: Rancher
Logging: Sumologic
Monitoring: Prometheus/New Relic/Sumologic
Alerting: Alertmanager/VictorOps
Dashboards: Grafana, AWS Cloudwatch, Sumologic
Registry: Quay.io
Load balancing/SSL termination: Amazon ALB, Rancher Ingress Controller
External DNS: Route53
CI/CD: Jenkins

Other case studies

JD.com

CLUSTER!

Provisioning

Automate as much as possible
Create the Rancher server
Set up an environment in Rancher
Create an Auto Scaling Group whose instances join the Rancher environment
Wait for k8s to be created
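The last step, waiting for k8s to come up, boils down to a polling loop. A minimal sketch, assuming `kubectl` is already configured against the new Rancher environment; `wait_for` is a hypothetical helper, not part of Rancher or kubectl:

```shell
#!/usr/bin/env bash
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success, 1 if every attempt fails.
# (Hypothetical helper; not part of Rancher or kubectl.)
wait_for() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Don't sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for 60 10 kubectl get nodes` keeps retrying for roughly ten minutes before giving up, which lets a provisioning script fail loudly instead of hanging.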

Networking

All requests hit the ALB
ALB -> Rancher ingress controller
Rancher ingress controller -> Services
Services -> Pods
Link to diagram
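The ALB -> Rancher ingress controller -> Services hop is driven by Kubernetes Ingress rules, which match on the request's host name. A sketch that prints one such rule, assuming the current `networking.k8s.io/v1` Ingress API (clusters of this era used `extensions/v1beta1`, which has a different backend layout); `ingress_manifest` and the host name are hypothetical:

```shell
#!/usr/bin/env bash
# ingress_manifest HOST SERVICE PORT
# Prints a minimal Ingress routing every path on HOST to SERVICE:PORT.
# Assumes the networking.k8s.io/v1 API; older clusters differ.
ingress_manifest() {
  local host=$1 service=$2 port=$3
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${service}
spec:
  rules:
  - host: ${host}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ${service}
            port:
              number: ${port}
EOF
}
```

Piping the output to `kubectl apply -f -` would create the rule; the ingress controller then forwards matching requests to the Service, which spreads them across its pods.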

Logging

Create a Sumologic hosted HTTP collector
The kubelet automatically creates symlinks to container log files, with extra metadata (pod, namespace, container) in the file names
Mount the symlink location into containers running https://github.com/3DSIM/fluentd-kubernetes-sumologic
Run the logging containers as a DaemonSet (one per node)
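The symlinks live under `/var/log/containers` and are named `<pod>_<namespace>_<container>-<container-id>.log`, which is what lets a log shipper tag each line with its source. A small parser illustrating that naming scheme; the function is hypothetical and not taken from the fluentd-kubernetes-sumologic repo:

```shell
#!/usr/bin/env bash
# parse_log_name PATH
# Splits a kubelet log symlink name of the form
#   /var/log/containers/<pod>_<namespace>_<container>-<container-id>.log
# into its parts. Pod, namespace and container names cannot contain "_",
# and the container ID is the text after the last "-", so plain parameter
# expansion is enough. (Hypothetical helper, for illustration only.)
parse_log_name() {
  local base rest tail
  base=$(basename "$1" .log)
  rest=${base#*_}   # <namespace>_<container>-<id>
  tail=${rest#*_}   # <container>-<id>
  printf 'pod=%s namespace=%s container=%s id=%s\n' \
    "${base%%_*}" "${rest%%_*}" "${tail%-*}" "${tail##*-}"
}
```

Real container IDs are 64 hex characters; the short ID in the test below is only for readability.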

Monitoring/Alerting

Forked CoreOS kube-prometheus
Deploy using the bash script in the repo
Modify configuration on the fly using Ansible
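Once the stack is deployed, a quick smoke test is to ask Prometheus which scrape targets are down, using its HTTP query API (`/api/v1/query`). A sketch, assuming Prometheus is reachable at the `PROM_URL` below (the default is an assumption about the kube-prometheus service name; adjust it for your fork); `DRY_RUN=1` prints the command instead of running it:

```shell
#!/usr/bin/env bash
# Assumed address of the Prometheus service; override via PROM_URL.
PROM_URL=${PROM_URL:-http://prometheus-k8s.monitoring.svc:9090}

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Lists targets whose "up" metric is 0. curl's -G moves the
# --data-urlencode parameter into the URL query string, so the spaces
# and "==" in the PromQL expression are encoded correctly.
targets_down() {
  run curl -sG "$PROM_URL/api/v1/query" --data-urlencode 'query=up == 0'
}
```

An empty `result` array in the JSON response means every target is being scraped.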

Rolling Updates

Kubernetes Deployments handle the rollout, replacing pods incrementally
Just use Ansible to create the updated Deployment
See jenkins.3dsim.com
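The deck does this step with Ansible; the same rolling update with plain `kubectl` looks roughly like the sketch below. The deployment, container, and image names are hypothetical (the image uses the `quay.io` prefix), and `DRY_RUN=1` prints the commands instead of running them:

```shell
#!/usr/bin/env bash
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# rolling_update DEPLOYMENT CONTAINER IMAGE (names are hypothetical)
rolling_update() {
  local deployment=$1 container=$2 image=$3
  # Changing the pod template's image makes the Deployment controller start
  # new pods and scale down old ones step by step (the default
  # RollingUpdate strategy).
  run kubectl set image "deployment/$deployment" "$container=$image" || return 1
  # Wait for the rollout to finish before reporting success.
  run kubectl rollout status "deployment/$deployment"
}
```

Using `kubectl rollout status` as the last step makes a CI job such as Jenkins wait for the new pods instead of returning as soon as the change is accepted.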

Service Discovery

Kubernetes sets up discovery for every Service (DNS names and environment variables). See https://kubernetes.io/docs/user-guide/services/#discovering-services
We don't use it in our architecture; everything talks through Tyk

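But you can verify it using dig from inside a pod:

```
apt-get update && apt-get install -y dnsutils
dig +search organization-api
```

Or with environment variables (only injected for Services that existed when the pod started):

```
echo $ORGANIZATION_API_PORT
echo $ORGANIZATION_API_SERVICE_HOST $ORGANIZATION_API_SERVICE_PORT
```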
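Both names are derived mechanically from the Service name: upper-case it and turn dashes into underscores for the environment variables, or append `<namespace>.svc.<cluster domain>` for the fully qualified DNS name. A sketch, assuming the `default` namespace and the `cluster.local` domain unless told otherwise; both helper names are hypothetical:

```shell
#!/usr/bin/env bash
# svc_env_prefix SERVICE
# Prefix of the env vars Kubernetes injects for SERVICE,
# e.g. <PREFIX>_SERVICE_HOST and <PREFIX>_SERVICE_PORT.
svc_env_prefix() {
  # Upper-case the name and turn dashes into underscores.
  printf '%s\n' "$1" | tr 'abcdefghijklmnopqrstuvwxyz-' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_'
}

# svc_fqdn SERVICE [NAMESPACE] [CLUSTER_DOMAIN]
# Defaults are assumptions: namespace "default", domain "cluster.local".
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "${2:-default}" "${3:-cluster.local}"
}
```

So `organization-api` yields `ORGANIZATION_API_SERVICE_HOST`, and from a pod in the `default` namespace `dig +search organization-api` resolves the same record as `organization-api.default.svc.cluster.local`.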

Next steps

Job scheduling
Prometheus metrics and Grafana dashboards for individual services
Update the ingress role to return 404 if no host matches (it currently returns the nginx welcome page)
Horizontal Pod Autoscaling
True zero-downtime deploys
Federation?

Questions?

A Case Study of Docker in Production: Clustering, updates, discovery, logging, and more

By Ryan Walls

This is a case study of how 3DSIM uses Docker in production. Your needs may vary.
