
Ryan Walls
@ryanwalls
Director of Software Engineering
3DSIM
...in Production at 3DSIM
A case study
Agenda
- The Decision
- The Tools
- Cluster!
- Provisioning
- Networking
- Logging
- Monitoring/Alerting
- Rolling updates
- Service discovery
- Next steps
- Q & A

Factors
- No licensing cost
- Run anywhere
- Job scheduling
- Dashboard
- Large community
- Preferably open source
Big 3
- ECS
- Docker Swarm
- Kubernetes

The Tools

Ansible
PROS
- Declare the desired end state and it gets you there
- Easy to read
- SSH based
- Decent Docker support
- Can always fall back to shell/command
CONS
- Learning curve
- Not updated very quickly
- Bugs are slow to be fixed

Good Alternatives: Terraform

Rancher
PROS
- Sets up k8s for you
- Drop in DNS support
- Access control
- Pretty host visualizations
- Handy kubectl access
- Catalog
- Community/support
- Webhooks
CONS
- Non-standard k8s install
- Quickly changing
- Not commonly used in k8s community yet

Sumologic
PROS
- No setup
CONS
- Costs money

Prometheus
PROS
- Open source
- Adopted by CNCF
- Most components expose Prometheus data (Docker, k8s, Rancher, etc.)
- Can be used to monitor services also
- The hot new thing
- See docker details here and here
CONS
- Still fairly new
- Less than extensive documentation
- Config and management
Good Alternatives: New Relic, DataDog, others

VictorOps
PROS
- Fairly cheap
- Call scheduling
CONS
- Fewer integrations compared to PagerDuty, HipChat
- UX is just "okay"
Good Alternatives: PagerDuty, Slack, HipChat

Grafana
PROS
- Simple
- Deployed with Prometheus
CONS
- Too early to tell
Good Alternatives: Kibana, Cloudwatch

Quay.io
PROS
- Solid hosting
- Vulnerability detection
- Robot accounts
CONS
- Have to use "quay.io" prefix
Good Alternatives: Docker Hub

Amazon Application Load Balancer
PROS
- Cheaper than ELB
- Nicely integrated with AWS Auto Scaling Groups
CONS
- Vendor tie-in
Good Alternatives: HAProxy, nginx

Rancher Ingress Controller
PROS
- Integrated with AWS ALBs
- Nice API for creating routes
CONS
- ?
Good Alternatives: ?

Jenkins
PROS
- Large community
- Tons of plugins
- Lots of existing expertise in many organizations
CONS
- Long in the tooth UX
Good Alternatives: Travis CI, Drone.io, lots

Summary
- Clustering: Kubernetes
- Automation: Ansible
- Cluster management: Rancher
- Logging: Sumologic
- Monitoring: Prometheus/New Relic/Sumologic
- Alerting: Alertmanager/VictorOps
- Dashboards: Grafana, AWS Cloudwatch, Sumologic
- Registry: Quay.io
- Load balancing/SSL termination: Amazon ALB, Rancher Ingress Controller
- External DNS: Route53
- CI/CD: Jenkins
Other case studies
CLUSTER!

Provisioning
- Automate as much as possible
- Create Rancher
- Set up environment in Rancher
- Create Auto Scaling Group that joins Rancher environment
- Wait for k8s to be created
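The final "wait" step usually means polling in an automated run. A minimal sketch of a retry helper, assuming kubectl is already configured for the new cluster; `wait_for` is a hypothetical name used for illustration, not something Rancher or Ansible provides:

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Success on any attempt ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (not run here): poll the API server for up to ten minutes.
# wait_for 60 10 kubectl get nodes
```

The same pattern works for any readiness check that exits non-zero until the cluster is up.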
Networking
- All requests hit ALB
- ALB -> Rancher ingress controller
- Rancher ingress controller -> Services
- Services -> Pods
- Link to diagram
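The ingress hop in this chain is a host-based lookup: the request's Host header picks the backing service. A toy sketch of that decision, with made-up hostnames and a hypothetical second service, and with unmatched hosts falling through to a 404:

```shell
# route_host HOST
# Print the service a request for HOST would be sent to, or "404".
# Hostnames and the job-api service are illustrative assumptions.
route_host() {
  case "$1" in
    api.example.com)  echo "organization-api" ;;
    jobs.example.com) echo "job-api" ;;
    *)                echo "404" ;;
  esac
}

route_host api.example.com      # prints: organization-api
route_host unknown.example.com  # prints: 404
```

This only models the routing decision; the real controller generates equivalent rules from the cluster's ingress resources.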
Logging
- Create Sumologic hosted HTTP collector
- Kubelet automatically creates symlinks to log files, with extra metadata in the file names
- Mount symlink location into containers running https://github.com/3DSIM/fluentd-kubernetes-sumologic
- Run logging containers as DaemonSet
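The metadata in those symlink names is what lets a log shipper tag each line. As an illustration, assuming the kubelet's usual /var/log/containers/<pod>_<namespace>_<container>-<container id>.log naming (the example path and names below are made up):

```shell
# parse_log_name PATH
# Split a kubelet log symlink name into pod, namespace, and container.
parse_log_name() {
  name=${1##*/}          # drop the directory
  name=${name%.log}      # drop the extension
  pod=${name%%_*}        # text before the first underscore
  rest=${name#*_}
  namespace=${rest%%_*}  # text between the first and second underscore
  rest=${rest#*_}
  container=${rest%-*}   # everything before the trailing container ID
  printf 'pod=%s namespace=%s container=%s\n' "$pod" "$namespace" "$container"
}

parse_log_name /var/log/containers/organization-api-x7k2p_default_organization-api-0123456789abcdef.log
# prints: pod=organization-api-x7k2p namespace=default container=organization-api
```

Pod and namespace names cannot contain underscores, which is why splitting on the first two underscores is safe here.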
Monitoring/Alerting
- Forked CoreOS Kube Prometheus
- Deploy using bash script in repo
- Modify configuration on the fly using Ansible
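Once the stack is deployed, a quick sanity check is to read a metrics endpoint by hand, since Prometheus scrapes plain text. A rough sketch of a filter for that text format; `metric_value` is a hypothetical helper, and it assumes label values contain no spaces and samples carry no timestamps:

```shell
# metric_value NAME
# Read Prometheus text-format metrics on stdin and print the value of
# every sample whose metric name is exactly NAME.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    {
      split($1, parts, "{")        # strip any {label="..."} suffix
      if (parts[1] == name) print $NF
    }
  '
}

# Example (not run here), assuming a Prometheus server on localhost:9090:
# curl -s http://localhost:9090/metrics | metric_value up
```

For anything beyond spot checks, querying Prometheus itself is the better route; this is only for eyeballing a single endpoint.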
Rolling Updates
- Kubernetes takes care of everything
- Just use Ansible to create the updated deployment
- See jenkins.3dsim.com
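For comparison, the same rollout can be driven by hand with kubectl instead of Ansible. A sketch under a few assumptions: the container inside the deployment has the same name as the deployment, the registry organization `example-org` is a placeholder, and `KUBECTL` is a hypothetical override that makes dry runs possible:

```shell
# image_ref ORG REPO TAG
# Build a fully qualified Quay.io image reference.
image_ref() {
  printf 'quay.io/%s/%s:%s\n' "$1" "$2" "$3"
}

# rolling_update DEPLOYMENT IMAGE
# Point the deployment at a new image, then block until the rollout ends.
# Assumes the container is named after the deployment.
rolling_update() {
  deployment=$1
  image=$2
  ${KUBECTL:-kubectl} set image "deployment/$deployment" "$deployment=$image" &&
    ${KUBECTL:-kubectl} rollout status "deployment/$deployment"
}

# Example (not run here):
# rolling_update organization-api "$(image_ref example-org organization-api 1.2.3)"
```

Setting `KUBECTL=echo` prints the two commands instead of running them, which is handy when checking what a CI job would do.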
Service Discovery
- Kubernetes sets up everything. See https://kubernetes.io/docs/user-guide/services/#discovering-services
- Not using in our architecture - everything talks through Tyk
- But can verify using dig:
    apt-get install dnsutils
    dig +search organization-api
- Or environment variables:
    echo $ORGANIZATION_API_PORT
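The environment variable names follow a fixed pattern: the service name is upper-cased and dashes become underscores, and Kubernetes then adds suffixes such as `_SERVICE_HOST` and `_SERVICE_PORT`. A small sketch of that mapping; the helper names are hypothetical, and the variables only exist inside pods started after the service was created:

```shell
# svc_env_prefix SERVICE
# Convert a service name into its environment variable prefix.
svc_env_prefix() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

# svc_port SERVICE
# Print the port Kubernetes injected for SERVICE; fails if it is unset.
svc_port() {
  printenv "$(svc_env_prefix "$1")_SERVICE_PORT"
}

svc_env_prefix organization-api   # prints: ORGANIZATION_API
```

Inside a pod, `svc_port organization-api` reads `ORGANIZATION_API_SERVICE_PORT`; DNS lookups avoid the start-order caveat entirely.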

Next steps
- Job scheduling
- Prometheus metrics and Grafana dashboards for individual services
- Update ingress role to return 404 if no host matches. Currently returns nginx welcome page.
- Horizontal Pod Autoscaling
- True zero downtime deploys
- Federation?
Questions?
A Case Study of Docker in Production: Clustering, updates, discovery, logging, and more
By Ryan Walls
This is a case study of how 3DSIM uses Docker in production. Your needs may vary.