Managing Microservices Effectively

Daniel Hall (@smarthall)

About Me

  • Systems Engineer at LIFX
  • Making the 'Internet' in the Internet of Things
  • Wrote Ansible Configuration Management, published by Packt
  • Currently updating that book for a second edition
  • Wrote RatticDB, a team-based open source password manager

About This Talk

  • This is how we do things at LIFX
  • Feel free to ask questions as we go
  • It works for us, it might not work for you
  • Think about how each bit fits into your situation

Step One: Write your apps

  • You may not get input into this part
  • Microservices are popular at the moment
  • Design pattern that works with continuous delivery

Microservices

  • Try to keep as much state as possible outside your apps
  • Don't make them too small, they're not nanoservices
  • Don't make them too big, they're not milliservices
  • Each service should be
    • Replaceable
    • Independently deployable
    • Focused on a single capability (billing, authentication)
  • Think about information flow and circular dependencies
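
One way to act on the circular dependency point is to write down which service calls which and check that graph for loops. The following is a minimal sketch, not something from the talk: the service names reuse the billing and authentication examples above, "web" is a made-up caller, and the call graph itself is hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of service names, or None."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNSEEN for name in deps}
    path: List[str] = []  # services on the current depth-first walk

    def visit(name: str) -> Optional[List[str]]:
        state[name] = IN_PROGRESS
        path.append(name)
        for callee in deps.get(name, []):
            seen = state.get(callee, UNSEEN)
            if seen == IN_PROGRESS:
                # We walked back into a service that is still on the path.
                return path[path.index(callee):] + [callee]
            if seen == UNSEEN:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        state[name] = DONE
        return None

    for name in list(deps):
        if state[name] == UNSEEN:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical call graphs: service -> services it calls.
acyclic = {
    "web": ["billing", "authentication"],
    "billing": ["authentication"],
    "authentication": [],
}
cyclic = dict(acyclic, authentication=["billing"])

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

A check like this could run in CI against a hand-maintained dependency map, so a new loop is noticed before it is deployed.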

The Hype Curve

Jeremy Kemp CC-BY-SA

(http://commons.wikimedia.org/wiki/File:Gartner_Hype_Cycle.svg)

Microservices

Step Two: Packaging

  • All dependencies need to be available
  • Needs to be small or cacheable
    • Faster install means faster deployments
  • You might want multiple versions on the same machine
  • Preferably it works in several environments

Docker

  • Filesystem layers stacked on top of each other
  • Uses Linux containers to isolate applications
  • You can run a local Docker registry
    • Security
    • Speed
  • You can run it locally in dev and on your servers
  • Less of 'it works on my laptop'
  • Minuscule performance hit compared to VMs

The Hype Curve

Docker

Step Three: Deployment

  • As fast as possible
  • Preferably minimal interaction
  • Recovery from failures

Mesos/Marathon

  • Mesos manages tasks running on a cluster
  • Marathon coordinates long running jobs
  • You submit a JSON job description to Marathon
  • Marathon handles switching from the old app to new
  • Marathon will also handle task failure and recovery
  • Health checks ensure broken tasks get replaced
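
To make the "JSON job description" concrete, here is a rough sketch of what one might look like for a Dockerised service with an HTTP health check. The field names follow the Marathon v2 apps API as it stood for Docker containers around the time of this talk, so check them against your Marathon version. The app id, image name, port and /health path are made-up examples, not LIFX's real configuration.

```python
import json


def marathon_app(app_id, image, instances=2, cpus=0.25, mem=128):
    """Build a Marathon app definition for a Docker image (illustrative values)."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free port on the slave.
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
        # A task that keeps failing this check is killed and replaced.
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/health",
                "portIndex": 0,
                "gracePeriodSeconds": 30,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


# Hypothetical service and registry names.
app = marathon_app("/billing-api", "registry.example.com/billing-api:1.0.0")
print(json.dumps(app, indent=2))
```

The printed JSON is what you would submit to Marathon's /v2/apps endpoint. Submitting a changed definition for the same id is what triggers the switch from the old app to the new one.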

The Hype Curve

Mesos/Marathon

Extra Credit: Scheduling

  • The tool for this, Chronos, was made by Airbnb
  • Some things need to run repeatedly
  • Cron works, but it's not really HA
  • HA Crons exist but can be complex
  • Your cluster probably has spare capacity

Chronos

  • Chronos runs your scheduled tasks in Mesos
  • Uses ISO 8601 repeating intervals to specify schedules
  • Use your spare capacity for repeating tasks
  • Can rerun failing jobs
  • Can handle job dependencies
  • Records stats on run times for jobs
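
An ISO 8601 repeating interval has the shape R[n]/start/period, for example R/2015-06-01T02:00:00Z/P1D for "every day, forever, starting at 02:00 UTC on 1 June 2015". The sketch below builds such a string and two job descriptions. The job fields (name, command, schedule, retries, parents) follow the Chronos job JSON as I understand it and should be checked against your Chronos version; the job names and commands are invented for illustration.

```python
from datetime import datetime, timezone


def iso8601_schedule(start, period, repeats=None):
    """Build 'R[n]/<start>/<period>'; repeats=None means repeat forever."""
    count = "" if repeats is None else str(repeats)
    stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "R{}/{}/{}".format(count, stamp, period)


def scheduled_job(name, command, schedule, retries=2):
    """A job that runs on a schedule and is retried if it fails."""
    return {"name": name, "command": command,
            "schedule": schedule, "retries": retries}


def dependent_job(name, command, parents):
    """A job that runs after its parent jobs have completed."""
    return {"name": name, "command": command, "parents": list(parents)}


start = datetime(2015, 6, 1, 2, 0, tzinfo=timezone.utc)
nightly = iso8601_schedule(start, "P1D")

# Hypothetical jobs: a nightly export followed by a report built from it.
export = scheduled_job("nightly-export", "/usr/local/bin/export.sh", nightly)
report = dependent_job("nightly-report", "/usr/local/bin/report.sh",
                       [export["name"]])
print(nightly)
```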

The Hype Curve

Chronos

How we did it at LIFX

  • All our services are stateless
  • This made them all easy to Dockerise
  • Mesos manages the resources
  • ZooKeeper helps Mesos choose a master
  • Marathon makes sure daemons are running
  • Chronos runs scheduled and repeating jobs
  • Databases and other things storing state run outside

What does this look like?

Finding Things

  • You have lots of microservices
  • Marathon keeps moving them
  • Whole machines are going up and down
  • Where is this API running?
  • Which copy of the API do I connect to?

Service Discovery

  • etcd, Consul, Synapse
  • Marathon comes with an example
    • Marathon knows where things are running
    • Uses HAProxy as load balancer to services
    • You run HAProxy on every slave and configure everything to use localhost
    • Not always perfect
  • We use a custom script
    • HTTP routing by putting hostnames in Marathon metadata
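
The HAProxy-on-localhost approach boils down to: ask Marathon where each task is running, then write one HAProxy section per app that listens on a fixed local port. The following is a simplified sketch of that idea, not the bundled example or the LIFX script. The task data imitates the general shape of Marathon's task listing (appId, host, ports) but is hard-coded here, and the service port mapping, app names and addresses are all assumptions.

```python
from collections import defaultdict

# Hard-coded stand-in for what Marathon reports about running tasks.
SAMPLE_TASKS = {
    "tasks": [
        {"appId": "/billing-api", "host": "10.0.0.11", "ports": [31002]},
        {"appId": "/billing-api", "host": "10.0.0.12", "ports": [31417]},
        {"appId": "/auth-api", "host": "10.0.0.11", "ports": [31733]},
    ]
}

# Hypothetical fixed ports that clients use on localhost.
SERVICE_PORTS = {"/billing-api": 10001, "/auth-api": 10002}


def render_haproxy(tasks, service_ports):
    """Render one HAProxy 'listen' section per app from a task listing."""
    backends = defaultdict(list)
    for task in tasks["tasks"]:
        if task["ports"]:
            backends[task["appId"]].append((task["host"], task["ports"][0]))

    lines = []
    for app_id in sorted(backends):
        name = app_id.strip("/").replace("/", "_")
        lines.append("listen {}".format(name))
        lines.append("  bind 127.0.0.1:{}".format(service_ports[app_id]))
        lines.append("  mode tcp")
        lines.append("  balance roundrobin")
        for i, (host, port) in enumerate(sorted(backends[app_id])):
            lines.append("  server {}-{} {}:{} check".format(name, i, host, port))
    return "\n".join(lines) + "\n"


config = render_haproxy(SAMPLE_TASKS, SERVICE_PORTS)
print(config)
```

Every service then connects to 127.0.0.1 on the app's fixed port and HAProxy forwards to whichever slaves currently run it. The "not always perfect" part is the gap between Marathon moving a task and the config being regenerated and reloaded.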

Collecting Logs

  • Docker currently has no great logging solution
  • You can mount /dev/log, but don't restart rsyslog (it recreates the socket and running containers lose it)
  • Mesos collects stdout, stderr
    • No easy way to access it
    • No timestamps
  • Correlating logs is great for debugging

Centralised Logs

  • Make rsyslog log to 127.0.0.1
  • Configure a queue to store messages, but drop if full
  • Mount /dev/log into the container
    • You'll need systemd
  • Run several Logstash tasks in Marathon
  • Run Elasticsearch on Mesos (or separately)
  • Set up a few small nginx tasks running Kibana
  • TADA! Centralised, fault-tolerant logs
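
From the application's side, "mount /dev/log into the container" means the service just logs to the local syslog socket as it would on a normal host. Here is a minimal sketch using Python's standard logging module. The fallback to UDP on 127.0.0.1:514 and the service name are assumptions for illustration, not part of the setup described above.

```python
import logging
import logging.handlers
import os
import stat

# Assumed fallback when no syslog socket is mounted: UDP syslog on localhost.
FALLBACK = ("127.0.0.1", 514)


def choose_syslog_address(socket_path="/dev/log", fallback=FALLBACK):
    """Use the mounted syslog socket if it exists, else the UDP fallback."""
    try:
        if stat.S_ISSOCK(os.stat(socket_path).st_mode):
            return socket_path
    except OSError:
        pass
    return fallback


def build_logger(service, address):
    """Return a logger that sends records for this service to syslog."""
    handler = logging.handlers.SysLogHandler(address=address)
    # Prefix each message with the service name so logs can be correlated.
    handler.setFormatter(
        logging.Formatter(service + ": %(levelname)s %(message)s"))
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Inside a container with the socket mounted, `build_logger("billing-api", choose_syslog_address())` would give the service a logger whose messages reach rsyslog on the host, which timestamps them and forwards them on to Logstash.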

What that looks like

Troubleshooting

  • Similar to the service discovery problem
  • Breaking in is easier than breaking out
  • Logs inside the image can be hard to get to

Troubleshooting Techniques

  • Find a container in Marathon
  • Use docker exec to run a shell in the container
    • Old versions of Docker can use nsenter
    • This won't work for a single-executable container, since there is no shell to run
    • You also need debugging tools inside the image
  • Some debugging tools work from outside
    • pprof for Go
    • jconsole for Java
    • gdb, strace for almost anything
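
The first two steps above can be sketched as a small helper that builds the command to run once Marathon has told you which host a task is on. This is only an illustration: the host address and container ID are placeholders, and it assumes you can ssh to the slave and that the image contains /bin/sh.

```python
import shlex


def exec_shell_command(host, container_id, shell="/bin/sh"):
    """Build argv that opens a shell inside a container on a remote slave."""
    remote = ["docker", "exec", "-i", "-t", container_id, shell]
    # ssh -t allocates a terminal so the interactive shell behaves normally.
    return ["ssh", "-t", host,
            " ".join(shlex.quote(part) for part in remote)]


# Placeholder values: take the host from Marathon's task list and the
# container ID from 'docker ps' on that host.
cmd = exec_shell_command("10.0.0.11", "3f4e8a1b2c9d")
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the resulting list to `subprocess.call` would open the shell. Printing it first is a cheap way to see exactly what will run.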

Demo

https://github.com/smarthall/ansible-mesos

Demo Time!

  • All the code is on GitHub
    • https://github.com/smarthall/ansible-mesos
  • 'vagrant up' will give you a development cluster
  • './init-cluster.sh' will add some sample apps

Thank you

Any Questions?

Managing Microservices Effectively - Microservices Meetup

By Daniel Hall
