The Highs and Lows of Early Adoption:
CoreOS in Production
@lukeb0nd
http://yld.io
Luke Bond
CoreOS London November 2014
The Plan
- Project background
- Our current stack
- What's been great
- What's been challenging
- Some tips and recommendations
"Connected Boilers" project
British Gas: Connected Homes (makers of Hive)
- Lots of data emitted by boilers in the home
- We receive it all via a cloud intermediary
- Currently focused on detecting errors
- Extensible for other functionality in backend
- Large projected data volume and scale
- JSON all the way
- Data consumed by API and also data science/analytics
It has been an interesting project with interesting challenges, and more or less greenfield.
Project Aims
- Scalable
- More-or-less self-managing:
- Strong monitoring/alerting
- Zero-downtime deployments
- Service discovery
Project Aims
- Small team of contractors, so:
  - Minimal human intervention
  - Easy for newcomers to pick up
- In short: want to leave behind something easy to manage
Therefore we opted from the beginning for a rigorously tested continuous deployment approach.
Technologies Used
- Node.js back-end & API (+ a bit of Java)
- AWS: EC2, ELB, EBS
- Couchbase
- Angular web front-end
- Mobile app
- CoreOS, Fleet, Etcd, HAProxy, Confd
- Continuous deployment pipeline:
- Jenkins
- Node.js + LevelDB deployment bot
What's been great: CoreOS
- From a developer's POV it just works™
- Minimal, largely read-only, no package manager: fewer things to go wrong
- On the stable channel for the past few weeks
- We're using "cfndsl", which lets you write CloudFormation templates in Ruby
- A lean OS is perfect for Docker
CoreOS: updates & restarts
- We began with one-machine-at-a-time restarts
- Fleet got into a bad state once after a big update to it (alpha channel)
- We never figured out what happened
- Now we disable restarts and do planned updates
CoreOS: how we're using units
Or "units are more than just runners of Docker containers"
- Mount units
- Timer units
- We use these for scheduled backups
- To attach/detach EBS volumes
- One-shot units for administrative/maintenance tasks
- Global one-shot units particularly useful
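A scheduled backup of the kind mentioned above might be sketched as a timer unit paired with a one-shot service. This is only an illustration: the unit names, the schedule and the `/opt/bin/backup.sh` script are assumptions, not the units used on the project.

```ini
# backup.service (hypothetical name and script)
[Unit]
Description=Database backup

[Service]
Type=oneshot
ExecStart=/opt/bin/backup.sh

# backup.timer (a separate file; triggers backup.service by default
# because the two units share a name)
[Unit]
Description=Run the database backup every night

[Timer]
# Assumed schedule: every day at 02:00
OnCalendar=*-*-* 02:00:00

[X-Fleet]
# Keep the timer on the same machine as the service it triggers
MachineOf=backup.service
```

With Fleet, the service would typically be loaded without being started, and the timer started, so that the timer alone decides when the backup runs.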
CoreOS: how we're using units
Example
$ fleetctl cat jq.service
[Unit]
Description=Install JQ
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/mkdir -p /opt/bin
ExecStart=/usr/bin/curl http://stedolan.github.io/jq/download/linux64/jq -o /opt/bin/jq
ExecStart=/usr/bin/chmod +x /opt/bin/jq
[X-Fleet]
Global=true
CoreOS: how we're using units
- Other examples:
- A one-shot global unit to set the overcommit kernel flag (we wanted it for Redis). We're able to run that only on the high-RAM hosts designated for Redis.
- One-time execution of database maintenance, i.e. adding views, fixtures, import/export, backup
- We use the `core` user and add our team's SSH keys
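The overcommit example above could look roughly like the following. It is a sketch under assumptions: the unit name, the `role=redis` metadata and the `sysctl` path are illustrative, and the metadata would have to be set on the high-RAM hosts in their Fleet configuration.

```ini
# redis-overcommit.service (hypothetical name)
[Unit]
Description=Enable memory overcommit on Redis hosts

[Service]
Type=oneshot
RemainAfterExit=yes
# vm.overcommit_memory=1 is the setting the Redis documentation recommends
ExecStart=/usr/sbin/sysctl -w vm.overcommit_memory=1

[X-Fleet]
# Run on every machine in the cluster...
Global=true
# ...but only on those carrying this (assumed) metadata
MachineMetadata=role=redis
```

Because the unit is global, Fleet also runs it on matching machines that join the cluster later.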
What's been great: Fleet
- Cluster presented as a systemd abstraction
- Think: systemd of multiple hosts "taped together" to appear as one, using Etcd
- Docker makes "abstracting the host" possible, but Fleet delivers it
- You now think about the cluster, not hosts
- I can't imagine going back from this thinking now :)
- Very powerful when combined with 12-factor principles
Fleet as CM solution?
- Configuration-management type actions encapsulated as systemd unit files, run with Fleet
- Can have tag-like functionality with [X-Fleet] conditionals
- It's declarative: "cluster, make this available" rather than "run command X on box Y".
- The systemd users here will be better able to imagine the potential for cluster-wide abstraction of it than I.
- Personally, I prefer this to using a traditional* CM solution
- Include only bare core in cloud-config
- Write unit files for configuration management and administration
- Check them into Git
- Deploy them on your CoreOS cluster in the same way you do your services
- Fleet will handle running them on new machines
What's been great: Docker
- Great for development
  - Fig instead of Vagrant
- Great for testing
  - Fig for functional/acceptance tests
  - `docker build/tag/push` on green
- Great for deployment
  - `docker pull`, `docker run`
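On CoreOS the `docker pull` / `docker run` deployment step usually ends up wrapped in a unit file. A minimal sketch, assuming a hypothetical image `example/api`, container name `api` and port 8080 (none of which come from the project):

```ini
# api.service (hypothetical name, image and port)
[Unit]
Description=API container
Requires=docker.service
After=docker.service

[Service]
# Pulling an image can be slow, so disable the start timeout
TimeoutStartSec=0
# The leading "-" tells systemd to ignore failures, e.g. when no old container exists
ExecStartPre=-/usr/bin/docker kill api
ExecStartPre=-/usr/bin/docker rm api
ExecStartPre=/usr/bin/docker pull example/api:latest
# Run in the foreground so systemd supervises the container process
ExecStart=/usr/bin/docker run --name api -p 8080:8080 example/api:latest
ExecStop=/usr/bin/docker stop api
```

Running the container in the foreground, with no `-d`, is what lets systemd rather than a separate supervisor track the one process per container.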
What's been great: Docker
- I now naturally tend towards smaller, simpler, composable services
- Docker and Node.js work really well together
- Single-threaded* event-driven model maps well to the "one process per container" Docker model
  - (I don't like the supervisor model**, especially when you have systemd)
Challenges: CoreOS
- CoreOS terminal is a PITA
- Etcd cluster losing quorum
  - CoreOS don't seem to have a recommended way of replacing an Etcd cluster, or dealing with this issue
  - Separate Etcd cluster?
- Complex ordering of systemd units for asynchronous tasks
  - e.g. not starting a unit until an EBS volume is attached
  - Likewise detaching cleanly
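One way to express the EBS ordering is with `Requires=` and `After=` between an attach unit and a mount unit. The sketch below is an assumption-laden illustration: the unit names, device, mount point and the two `/opt/bin` scripts are hypothetical, and the attach script is assumed to block until the device actually exists, which is the hard part in practice.

```ini
# ebs-attach.service (hypothetical name and scripts)
[Unit]
Description=Attach the EBS data volume

[Service]
Type=oneshot
RemainAfterExit=yes
# Assumed to return only once the block device is present
ExecStart=/opt/bin/ebs-attach.sh
# Runs when the unit is stopped, i.e. after the mount below has been stopped
ExecStop=/opt/bin/ebs-detach.sh

# mnt-data.mount (a separate file; the name must match the Where= path)
[Unit]
Description=Mount the EBS data volume
Requires=ebs-attach.service
After=ebs-attach.service

[Mount]
What=/dev/xvdf
Where=/mnt/data
Type=ext4

[X-Fleet]
# Schedule the mount on the machine that holds the volume
MachineOf=ebs-attach.service
```

systemd stops units in the reverse of their `After=` order, so the volume is unmounted before the detach script runs.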
Challenges: new technology
It's a dangerous business, Frodo, going out your door. You step onto the road, and if you don't keep your feet, there's no knowing where you might be swept off to.
– J.R.R. Tolkien, The Lord of the Rings
- We're using a lot of new technology at once
- That wasn't the plan, but it was like tugging at a thread!
- Or "swallow the spider to catch the fly", at times
Challenges: Docker networking
May you live in interesting times
– Chinese Curse (apocryphal)
- I look forward to when Docker networking is solved
- @weavenetwork from Zettio looks very promising
- But these early days of Docker are exciting times
- Beware complex and gnarly service discovery solutions
Challenges: Docker networking
- Tricky to balance keeping it simple with robustness
- Complexity and number of moving parts can skyrocket if you're not careful
- For us, startup-time service discovery was not enough
- Needed dynamically configured internal load-balancers
- Beware potential DNS implications
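Dynamic service discovery on this stack is commonly built on a "sidekick" unit that keeps an Etcd key alive for as long as the service runs. The following is a sketch of that general pattern, not the project's actual setup: the template name, the `/services/api` key path, the port and the TTL values are all assumptions.

```ini
# api-discovery@.service (hypothetical template unit)
[Unit]
Description=Announce api@%i in Etcd
# Start and stop together with the service being announced
BindsTo=api@%i.service
After=api@%i.service

[Service]
# Refresh the key before its TTL expires; if the host dies, the key simply times out
ExecStart=/bin/sh -c "while true; do etcdctl set /services/api/%i '%H:8080' --ttl 60; sleep 45; done"
ExecStop=/usr/bin/etcdctl rm /services/api/%i

[X-Fleet]
# Always run next to the instance it announces
MachineOf=api@%i.service
```

A tool such as Confd could then watch the key prefix and regenerate the HAProxy configuration whenever instances appear or disappear, which is what makes the internal load balancers dynamic rather than fixed at startup.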
Challenges: databases
- Mount volumes (of course)
- Databases don't always like moving hosts
  - e.g. Couchbase, depending on how it's configured
- When choosing a DB, consider how running in Docker affects that choice
  - i.e. how does it cluster/replicate?
  - Couchbase, Riak, Cassandra compared to MongoDB, CouchDB, MySQL?
DOs
- Keep your wits about you
- Keep it simple
- Keep services small*
- Expect services to move hosts
- Accept limitations in order to simplify
- Define per-host services via global units rather than in user-data
DON'Ts
- Get carried away
- Neglect to consider persistent storage
- Treat hosts like pets
- Stovepipe
- Force apps to do anything special in order to work
Questions?