# Automating ourselves out of a job with Kubernetes Operators
# ContainerSched London, September 2017
## Luke Bond
@lukeb0nd
ContainerSched
control-plane.io
# WHO AM I?
- Co-Founder of *controlplane*, a London-based consultancy focusing on *Kubernetes*,
*security* and *continuous delivery*
- Come and talk to me or Andrew Martin today about *security* and *continuous delivery*
for Kubernetes and containers
- Currently helping the UK Home Office with Kubernetes and security
- Developer turned DevOps engineer
- In recent years:
- Consulting, helping teams release more often with higher quality
- Mostly Node.js and Docker
- Moving further down the stack over the years, now mostly Ops
- Have been working with containers since 2014, when, like so many others,
I built a Docker PaaS
- Hobbies include home-brewing and making headings with figlet
# WHO IS THIS TALK FOR?
- Those wondering what operators are; what they're for and what they're not for
- Those who get the concept but unsure what building an operator entails
- Those interested in automation of operations on top of Kubernetes
- Those running stateful services in Kubernetes
- Those who want to know where to start working on operators
- This is an introductory talk, so I won't go into much detail
on the coding side of things
# WHAT ARE OPERATORS?
- Maybe you read the CoreOS Etcd Operator announcement blog post
- Maybe you watched some talks by Brandon Philips
- Maybe you listened to Brandon on the Cloudcast episode "Understanding
Kubernetes Operators"
--> But maybe, like me, you were still left scratching your head a bit! <--
# WHAT ARE OPERATORS?
There are some obvious things:
- Operators encapsulate operational knowledge in code
- The kind of stuff a sysadmin knows about a service, but automated
- Operators leverage the Kubernetes API and primitives in order to do this
# WHAT ARE OPERATORS?
But I was left with a few questions:
- Doesn't Kubernetes already magically look after my services, restarting and
migrating them as necessary?
- Doesn't Kubernetes already have primitives such as StatefulSets and
ReplicaSets to help with this stuff?
- How are these things actually built?
If I was confused about these things then maybe you are too. Hope this helps!
# IN THIS TALK
- I aim to answer the questions of the previous slide
- I'll explain the relationship between Operators and Kubernetes primitives
such as StatefulSets, ReplicaSets and Services
- I'll explain the scenarios where those primitives aren't enough; that's where
Operators come in
- I'll give a tour of the tools and repos that will give you a starting point
with operators
- I'll give some example use cases of Operators
# THE NICHE FOR OPERATORS - WHAT KUBERNETES DOES AND DOESN'T DO FOR YOU
- Let's say you have a 12-factor web app. Kubernetes will:
- Keep it running; surviving crashes and node failures (ReplicaSets)
- Scale it up and down when you want it to (ReplicaSets)
- Internally load balance traffic to instances (Services)
- Stateless apps can be destroyed, moved and upgraded easily at any time
- Existing Kubernetes primitives are perfect for this
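To make "Kubernetes will keep it running" concrete, here is a toy model of the reconciliation a ReplicaSet performs, sketched in Go (the language Kubernetes, client-go and the Etcd operator are written in). It only compares counts; the pod names are made up, and the real controller ranks pods before choosing which to delete, which this sketch does not attempt. Because stateless pods are interchangeable, this is all the "operational knowledge" required:

```go
package main

import "fmt"

// reconcileReplicas is a toy model of a ReplicaSet controller for a stateless
// app: compare the desired replica count with the pods actually running and
// work out what to change. Pod names are made up for the example.
func reconcileReplicas(desired int, running []string) (toCreate int, toDelete []string) {
	if desired < 0 {
		desired = 0
	}
	switch {
	case len(running) < desired:
		// Too few (a crash or a node failure): start replacements anywhere.
		return desired - len(running), nil
	case len(running) > desired:
		// Too many (scaled down): pods are interchangeable, so drop the surplus.
		return 0, running[desired:]
	default:
		return 0, nil
	}
}

func main() {
	create, del := reconcileReplicas(3, []string{"web-a"})
	fmt.Println(create, del) // 2 []
	create, del = reconcileReplicas(1, []string{"web-a", "web-b", "web-c"})
	fmt.Println(create, del) // 0 [web-b web-c]
}
```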
# THE NICHE FOR OPERATORS - WHAT KUBERNETES DOES AND DOESN'T DO FOR YOU
- Now let's say you have a clustered database. That's a different story:
- It can't be rescheduled onto any host like a stateless service
- Instances need to stay with their data
- Scaling may not be as simple as adding more nodes
- Specialist knowledge is required to effectively manage and operate each
database
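One reason scaling a clustered database isn't just "add more nodes": Raft-based stores such as Etcd need a strict majority (quorum) of members to agree before committing a write. This simplified sketch, not code from any operator, computes quorum and fault tolerance for small cluster sizes:

```go
package main

import "fmt"

// quorum is the number of members a Raft-based cluster such as Etcd needs
// to agree before it can commit a write: a strict majority.
func quorum(members int) int {
	return members/2 + 1
}

// faultTolerance is how many members can fail while a quorum still remains.
func faultTolerance(members int) int {
	return members - quorum(members)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerates=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

Going from 3 to 4 members raises the quorum from 2 to 3 without tolerating any extra failure, which is why odd cluster sizes are recommended and why a person (or an Operator) has to know this before resizing.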
# COREOS OPERATOR ANNOUNCEMENTS
> A Site Reliability Engineer (SRE) is a person that operates an application
> by writing software. They are an engineer, a developer, who knows how to
> develop software specifically for a particular application domain. The
> resulting piece of software has an application's operational domain
> knowledge programmed into it.
> We call this new class of software Operators. An Operator is an
> application-specific controller that extends the Kubernetes API to create,
> configure, and manage instances of complex stateful applications on behalf
> of a Kubernetes user. It builds upon the basic Kubernetes resource and
> controller concepts but includes domain or application-specific knowledge to
> automate common tasks.
-> -- Brandon Philips, "Introducing Operators", CoreOS blog, November 3, 2016
# OPERATORS IN THE WILD
- The Etcd operator was the first
- Released when the operator pattern was introduced/announced
- Prometheus Operator, from CoreOS
- In beta
- Automated deployment and management of Prometheus instances
- Rook - an orchestrator for cloud-native distributed storage systems
- Installs as an operator, registering custom resources in Kubernetes
- Create clusters via the operator
- Tectonic Operator, also from CoreOS
- Everything in Tectonic is automated, from Container Linux to Etcd to Kubernetes
# THE ETCD OPERATOR
The Etcd operator is a good place to start to see how operators work
It has the following features:
- Create/Destroy
- Resize
- Backup
- Upgrade
It operates using the model: Observe, Analyse and Act
> https://coreos.com/blog/introducing-the-etcd-operator.html#how-it-works
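A minimal sketch of the Observe, Analyse, Act loop applied to Resize, assuming a hypothetical `ClusterSpec` with a single `Size` field. It moves towards the desired size one member per pass, the conservative approach Etcd's runtime reconfiguration guidance recommends; the membership changes and pod creation are simulated, not real Etcd or Kubernetes API calls:

```go
package main

import "fmt"

// ClusterSpec is the desired state a user declares. It is a hypothetical,
// cut-down stand-in for the Etcd operator's real spec.
type ClusterSpec struct {
	Size int
}

// Action is the single step the operator decides to take on one pass.
type Action string

const (
	None   Action = "none"
	Add    Action = "add-member"
	Remove Action = "remove-member"
)

// analyse compares the observed members with the spec and picks one action.
func analyse(spec ClusterSpec, members []string) Action {
	switch {
	case spec.Size < 1:
		return None // refuse to act on an invalid spec
	case len(members) < spec.Size:
		return Add
	case len(members) > spec.Size:
		return Remove
	}
	return None
}

// act simulates the action. A real operator would change Etcd membership
// through Etcd's own API and create or delete the matching pod.
func act(a Action, members []string) []string {
	switch a {
	case Add:
		return append(members, fmt.Sprintf("etcd-%d", len(members)))
	case Remove:
		return members[:len(members)-1]
	}
	return members
}

// reconcile repeats observe -> analyse -> act until nothing is left to do,
// changing membership by exactly one member per pass.
func reconcile(spec ClusterSpec, members []string) ([]string, []Action) {
	var taken []Action
	for {
		a := analyse(spec, members) // members is what we observed
		if a == None {
			return members, taken
		}
		taken = append(taken, a)
		members = act(a, members)
	}
}

func main() {
	final, steps := reconcile(ClusterSpec{Size: 3}, []string{"etcd-0"})
	fmt.Println(final, steps) // [etcd-0 etcd-1 etcd-2] [add-member add-member]
}
```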
# THE ETCD OPERATOR
What is it doing under the hood?
- Registering a custom resource on startup: `EtcdCluster`
- Formerly a ThirdPartyResource (TPR), now a CustomResourceDefinition (CRD)
- Watches the Kubernetes API (which stores its state in Etcd) for CRUD events on that resource
- Acts on those events to affect the cluster
- Can be asked to perform certain operations, e.g. backup
# CREATING OPERATORS
CoreOS have published some guidelines for creating operators:
> https://coreos.com/blog/introducing-operators.html#how-can-you-create-an-operator
The Etcd operator codebase can be seen as a reference implementation of these guidelines.
> https://github.com/coreos/etcd-operator
Let's have a look at how you can register a custom resource and create instances of it with `kubectl`
# BUILDING OPERATORS
- This year, TPR became CRD. This blog post explains the changes:
> https://coreos.com/blog/custom-resource-kubernetes-v17
- Custom resources allow you to create your own resource types that you can
manage and interact with in the same way that you can services, pods, secrets,
etc. (i.e. with `kubectl`)
- Let's pretend we're creating a chaos-monkey operator. You've heard of
`StatefulSets`, now we have `HatefulSets`! **groan**
- This is quite a contrived example: it isn't operating a stateful service,
but it shows how custom resources are created
# BUILDING OPERATORS
- This is what the custom resource definition looks like:
```
$ cat hatefulset-crd.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: hatefulsets.control-plane.io
spec:
  group: control-plane.io
  version: v1
  names:
    kind: HatefulSet
    plural: hatefulsets
  scope: Namespaced
```
# BUILDING OPERATORS
- And here's what a HatefulSet resource might look like:
```
$ cat chaos-monkey.yaml
apiVersion: control-plane.io/v1
kind: HatefulSet
metadata:
  name: chaos-monkey
  namespace: default
spec:
  chaosLevel: 10
  interval: 300
```
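The operator needs a Go type for this resource so it can decode what the API server sends it. A hand-written sketch follows; it assumes `chaosLevel` and `interval` are plain integers (what they mean is up to our hypothetical operator), whereas in real code a generator would produce the type with `metav1.TypeMeta` and `metav1.ObjectMeta` embedded. kubectl converts the YAML to JSON before sending it, so the sketch decodes the JSON form:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HatefulSetSpec mirrors the spec fields in chaos-monkey.yaml. Their meaning
// is hypothetical: chaosLevel as an intensity, interval in seconds.
type HatefulSetSpec struct {
	ChaosLevel int `json:"chaosLevel"`
	Interval   int `json:"interval"`
}

// HatefulSet is a hand-written stand-in for a generated type; a real one
// would embed metav1.TypeMeta and metav1.ObjectMeta instead.
type HatefulSet struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"metadata"`
	Spec HatefulSetSpec `json:"spec"`
}

// chaosMonkeyJSON is chaos-monkey.yaml in the JSON form the API server serves.
const chaosMonkeyJSON = `{
  "apiVersion": "control-plane.io/v1",
  "kind": "HatefulSet",
  "metadata": {"name": "chaos-monkey", "namespace": "default"},
  "spec": {"chaosLevel": 10, "interval": 300}
}`

func decode(data []byte) (HatefulSet, error) {
	var hs HatefulSet
	err := json.Unmarshal(data, &hs)
	return hs, err
}

func main() {
	hs, err := decode([]byte(chaosMonkeyJSON))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s/%s chaosLevel=%d interval=%ds\n",
		hs.Metadata.Namespace, hs.Metadata.Name, hs.Spec.ChaosLevel, hs.Spec.Interval)
	// default/chaos-monkey chaosLevel=10 interval=300s
}
```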
# BUILDING OPERATORS
- Here is how we register the custom resource:
```
$ kubectl create -f hatefulset-crd.yaml
customresourcedefinition "hatefulsets.control-plane.io" created
$ kubectl get customresourcedefinitions
NAME                           KIND
hatefulsets.control-plane.io   CustomResourceDefinition.v1beta1.apiextensions.k8s.io
```
- ...and create an initial instance of it:
```
$ kubectl create -f chaos-monkey.yaml
hatefulset "chaos-monkey" created
$ kubectl get hatefulsets
NAME           KIND
chaos-monkey   HatefulSet.v1.control-plane.io
```
- You can read more about custom resources and controllers here:
> https://kubernetes.io/docs/concepts/api-extension/custom-resources
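Once the CRD is registered, the API server serves the new type under `/apis/<group>/<version>/...`, just like built-in API groups. This small helper (hypothetical, for illustration) builds those paths from the values in hatefulset-crd.yaml:

```go
package main

import (
	"fmt"
	"net/url"
)

// resourcePath builds the REST path the API server exposes for a custom
// resource. Group, version and plural come from the CRD; an empty namespace
// gives a cluster-wide path and an empty name gives the collection path.
func resourcePath(group, version, namespace, plural, name string) string {
	p := "/apis/" + group + "/" + version
	if namespace != "" { // resources with scope: Namespaced
		p += "/namespaces/" + url.PathEscape(namespace)
	}
	p += "/" + plural
	if name != "" {
		p += "/" + url.PathEscape(name)
	}
	return p
}

func main() {
	fmt.Println(resourcePath("control-plane.io", "v1", "default", "hatefulsets", "chaos-monkey"))
	// /apis/control-plane.io/v1/namespaces/default/hatefulsets/chaos-monkey
	fmt.Println(resourcePath("control-plane.io", "v1", "default", "hatefulsets", "") + "?watch=true")
	// /apis/control-plane.io/v1/namespaces/default/hatefulsets?watch=true
}
```

Paths like these are what client libraries request under the hood, and `kubectl get --raw` accepts them directly; appending `?watch=true` to a collection path opens a watch stream of events.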
# BUILDING OPERATORS
- We've just registered a new resource type and created an instance of it
- That second step generated an `ADDED` watch event for the HatefulSet resource type
- Next we need to write code that watches the Kubernetes API to hear of these events
- And also of `MODIFIED` and `DELETED` events
- At this point we have the basis of an Operator's _Observe, Analyse, Act_ cycle
--> But there is a lot of boilerplate to define the data model and watch the API <--
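The Kubernetes watch API reports each change to a resource as an event of type `ADDED`, `MODIFIED` or `DELETED` (plus `ERROR`). Stripped of the boilerplate, the heart of an operator is a switch over those types. In this sketch the events are fed through a channel by hand and the "actions" are just strings; a real operator would receive them from a watch on the API server:

```go
package main

import "fmt"

// EventType uses the same strings the Kubernetes watch API sends.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Event is a simplified watch event; the real one carries the whole object,
// not just its key.
type Event struct {
	Type EventType
	Key  string // "namespace/name" of the HatefulSet
}

// handle is the Analyse/Act step: decide what to do for one observed event.
// The returned strings stand in for real actions such as deleting pods.
func handle(ev Event) string {
	switch ev.Type {
	case Added:
		return "start chaos for " + ev.Key
	case Modified:
		return "reconfigure chaos for " + ev.Key
	case Deleted:
		return "stop chaos for " + ev.Key
	default:
		// e.g. ERROR: a real operator would log it and restart the watch;
		// this sketch only reports that the event was ignored.
		return "ignore " + string(ev.Type)
	}
}

func main() {
	// Observe: simulated here; a real operator reads these from a watch stream.
	events := make(chan Event, 3)
	events <- Event{Added, "default/chaos-monkey"}
	events <- Event{Modified, "default/chaos-monkey"}
	events <- Event{Deleted, "default/chaos-monkey"}
	close(events)
	for ev := range events {
		fmt.Println(handle(ev))
	}
}
```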
# BUILDING OPERATORS
- Starting from this example code will give you a head start:
> https://github.com/kubernetes/apiextensions-apiserver/tree/master/examples/client-go
This example shows:
- How to register a new custom resource (custom resource type) using a CustomResourceDefinition
- How to create/get/list instances of your new resource type (update/delete/etc work as well
but are not demonstrated)
- How to set up a controller on the resource that handles create/update/delete events
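Controllers built on client-go typically don't do the work inside their event handlers: they push a `namespace/name` key onto a work queue, and a worker reconciles that key against the latest state. Below is a single-threaded, stdlib-only sketch of the idea; `dedupQueue` is hypothetical and much simpler than client-go's workqueue package, which also adds rate limiting and stops the same key being processed concurrently:

```go
package main

import "fmt"

// keyFunc mimics the "namespace/name" keys client-go controllers use.
func keyFunc(namespace, name string) string {
	if namespace == "" {
		return name
	}
	return namespace + "/" + name
}

// dedupQueue is a minimal FIFO that holds each key at most once, so a burst
// of events for the same object collapses into a single reconcile.
type dedupQueue struct {
	order   []string
	pending map[string]bool
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return // already queued; the worker will read the latest state anyway
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key, q.order = q.order[0], q.order[1:]
	delete(q.pending, key)
	return key, true
}

func main() {
	q := newDedupQueue()
	// Three events for chaos-monkey and one for a (hypothetical) other object.
	q.Add(keyFunc("default", "chaos-monkey"))
	q.Add(keyFunc("default", "chaos-monkey"))
	q.Add(keyFunc("staging", "gremlin"))
	q.Add(keyFunc("default", "chaos-monkey"))
	for key, ok := q.Get(); ok; key, ok = q.Get() {
		fmt.Println("reconcile", key)
	}
	// reconcile default/chaos-monkey
	// reconcile staging/gremlin
}
```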
# BUILDING OPERATORS
There are code generators to help you here.
> https://github.com/kubernetes/gengo
Documentation is scant. Until that improves, you're on your own figuring it all out.
See James Munnelly's excellent talk on the subject here for more details:
> https://skillsmatter.com/skillscasts/10599-wrangling-kubernetes-api-internals
# EXAMPLE USE CASES OF OPERATORS
- Anything with application-specific operational/maintenance tasks
- Databases are the obvious choice
- Postgres
- Redis
- Mongo
- etc.
- Also "legacy" or non-cloud-native applications
- That old stateful Java enterprise monolith on which your business still
depends
- Apps that don't like to be moved without some manual intervention
# KUBERNETES-NATIVE APPLICATIONS
There is another class of applications for which we don't yet have a name: they
extend the Kubernetes API (and therefore declare custom resources that can be
managed with `kubectl`), yet don't specifically operate stateful services.
I'm calling these "Kubernetes-native applications". They have a few advantages
over running on Kubernetes like any other application:
- Can be discovered via the Kubernetes API
- Can declare custom resources that can be managed via `kubectl`
- Can leverage Kubernetes' RBAC for access to their API
This last item alone is probably enough to make them worthwhile as an ops tool
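Kubernetes RBAC rules name what they grant by API group, resource (plural) and verb, so custom resources get the same treatment as built-in ones. The sketch below is a deliberately simplified model of how a rule matches a request (it ignores `resourceNames`, subresources and role bindings), using a hypothetical read-only role for HatefulSets:

```go
package main

import "fmt"

// PolicyRule mirrors the three list fields of an RBAC rule used here.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// matches reports whether list contains want or the "*" wildcard.
func matches(list []string, want string) bool {
	for _, v := range list {
		if v == want || v == "*" {
			return true
		}
	}
	return false
}

// allowed is a simplified RBAC check: rules are purely additive, so a request
// is permitted if any rule covers its API group, resource and verb.
func allowed(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if matches(r.APIGroups, group) && matches(r.Resources, resource) && matches(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// A hypothetical role: read-only access to HatefulSets.
	viewer := []PolicyRule{{
		APIGroups: []string{"control-plane.io"},
		Resources: []string{"hatefulsets"},
		Verbs:     []string{"get", "list", "watch"},
	}}
	fmt.Println(allowed(viewer, "control-plane.io", "hatefulsets", "list"))   // true
	fmt.Println(allowed(viewer, "control-plane.io", "hatefulsets", "delete")) // false
}
```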
# KUBERNETES-NATIVE APPLICATIONS
What kind of Kubernetes-native applications could be useful?
- A chaos-monkey operator with an API to trigger and configure it
- System acceptance tests that run inside the cluster, spin up different
pods at different versions and test contracts between them
- The k8s-native equivalent of the trusty Bash/Fig combo!
- Security compliance pod that will try to do things it shouldn't be able to
do inside a cluster, for use in CI
- An operator to steward releases with gradual roll-out
- Something like this, but as an operator:
> https://github.com/controlplaneio/theseus
- Many other things, including a few things that we're working on at *controlplane*
# AUTOMATING OURSELVES OUT OF A JOB
- Anything you can do with kubectl you can do with Operators
- With auth, from inside the cluster
- This is what I'm advocating: we now have the best place to run our tools
- In code, rather than with Bash calling kubectl
- Not that there is anything wrong with that
- Everything centralised, a single pane of glass to view it all
- If you dump and move your cluster state elsewhere, these tools go with it
# A FUTURE WITH OPERATORS
- Operators are a step towards fully automated infrastructure
- Self-operating and self-healing systems and infrastructure already exist
- IaaS, Docker and Kubernetes enable a revolution in this space
- The building blocks are now in place to make this possible
- The emergence of the Operator pattern is an early attempt at a standard way
to build self-managing systems on top of Kubernetes
- People are still keeping databases out of Kubernetes
- I think we're running out of excuses to do this
- Automate all the things and run them as operators or k8s-native applications in the cluster!
-> # Thanks!!
-> ## Any questions?
-> [Control Plane](https://control-plane.io)
-> [@controlplaneio](https://twitter.com/controlplaneio)
Come and talk to us about *security* and *continuous delivery* for Kubernetes and containers!
-> Slides can be found here:
> https://github.com/lukebond/containersched-london-operators-20170928