Prometheus

Lessons learned @ ShuttleCloud

  • Let Prometheus pull
    • Pushing is a real anti-pattern

Service discovery

  • Use service discovery
  • e.g. DNS service discovery
  • Using Ansible inventory works as well
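
One possible way to feed an Ansible inventory into Prometheus is file-based service discovery: export the hosts into a JSON target file that Prometheus re-reads on change. A minimal sketch follows; the hostnames, the `env` label, and port 9100 are assumptions for illustration, not taken from the talk.

```ruby
require "json"

# Build the structure Prometheus expects in a file_sd target file:
# a list of groups, each with "targets" (host:port) and optional "labels".
# The default port 9100 is an assumption (commonly used by node_exporter).
def file_sd_targets(hosts, port: 9100, labels: {})
  [{ "targets" => hosts.map { |h| "#{h}:#{port}" }, "labels" => labels }]
end

# Hypothetical hosts, e.g. taken from one Ansible inventory group.
hosts = ["web1.example.com", "web2.example.com"]
targets = file_sd_targets(hosts, labels: { "env" => "production" })

puts JSON.pretty_generate(targets)
```

Writing the result to a file referenced by a `file_sd_configs` entry would let Prometheus pick up inventory changes without a restart.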

Counters vs Gauges

  • Use counters when possible
  • Let Prometheus do the aggregating
  • Gauges might be problematic with regard to alerts and decay
  • Gauges still seem necessary for metrics from instrumented code, e.g. profiling
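
Why counters work well with server-side aggregation: the application only ever increments a number, and the rate is derived later from the raw samples. The sketch below is a simplified model of that idea, not Prometheus' actual `rate()` implementation (which additionally extrapolates to the window boundaries); the sample data is made up.

```ruby
# Per-second increase of a counter over a series of [timestamp, value] samples.
# A drop in value is treated as a counter reset (e.g. a process restart), so
# the value after the reset counts as the increase since the reset.
def counter_rate(samples)
  return 0.0 if samples.size < 2

  increase = 0.0
  samples.each_cons(2) do |(_, prev), (_, curr)|
    increase += curr >= prev ? curr - prev : curr
  end

  elapsed = samples.last[0] - samples.first[0]
  elapsed.zero? ? 0.0 : increase / elapsed
end

# Hypothetical scrapes every 15 seconds; the counter resets before t=30.
samples = [[0, 100], [15, 130], [30, 10], [45, 40]]
puts counter_rate(samples)
```

A gauge exported as a precomputed rate would lose the information needed to handle the restart correctly, which is one reason to prefer counters where possible.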

Alerting

  • Prediction-based threshold alerting
  • Alert tree
  • PagerDuty integration
  • Maybe multiple Prometheus servers for different systems?
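
Prediction-based alerting means alerting on where a metric is heading rather than on its current value. A minimal sketch of the idea, assuming a simple least-squares fit similar in spirit to Prometheus' `predict_linear()`; the disk-space samples and the function name are illustrative only.

```ruby
# Fit a least-squares line through [timestamp, value] samples and evaluate it
# `seconds_ahead` seconds after the last sample.
def predict_linear(samples, seconds_ahead)
  n = samples.size.to_f
  mean_t = samples.sum { |t, _| t } / n
  mean_v = samples.sum { |_, v| v } / n
  cov = samples.sum { |t, v| (t - mean_t) * (v - mean_v) }
  var = samples.sum { |t, _| (t - mean_t)**2 }
  slope = var.zero? ? 0.0 : cov / var
  intercept = mean_v - slope * mean_t
  slope * (samples.last[0] + seconds_ahead) + intercept
end

# Hypothetical free disk space in GB, sampled once per hour.
free_gb = [[0, 100.0], [3600, 90.0], [7200, 80.0], [10_800, 70.0]]

# Alert if the disk is predicted to be full within the next 8 hours.
predicted = predict_linear(free_gb, 8 * 3600)
puts "ALERT: disk predicted to fill up within 8h" if predicted <= 0
```

Compared with a static threshold such as "less than 10 GB free", this fires earlier on fast-filling disks and stays quiet on disks that are nearly full but stable.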

Monitored Infrastructure

  • 200 systems monitored
  • Uses 30 GB of data (15-day retention)
  • 1 meta monitoring instance
  • Monitoring implemented by team of 7 engineers

Playing Port Authority
TDD for Containers

# spec/Dockerfile_spec.rb

require "serverspec"
require "docker"

describe "Dockerfile" do
  before(:all) do
    image = Docker::Image.build_from_dir('.')

    set :os, family: :debian
    set :backend, :docker
    set :docker_image, image.id
  end

  it "installs the right version of Ubuntu" do
    expect(os_version).to include("Ubuntu 14")
  end

  def os_version
    command("lsb_release -a").stdout
  end
end

Docker Security HandsOn

  • Docker Security == Linux Security
  • Secure Docker containers with standard Linux tools for securing Linux processes

seccomp

{
	"defaultAction": "SCMP_ACT_ERRNO",
	"architectures": [
		"SCMP_ARCH_X86_64",
		"SCMP_ARCH_X86",
		"SCMP_ARCH_X32"
	],
	"syscalls": [
		{
			"name": "accept",
			"action": "SCMP_ACT_ALLOW",
			"args": []
		},
		{
			"name": "accept4",
			"action": "SCMP_ACT_ALLOW",
			"args": []
		},
		...
	]
}

$ docker run --rm -it --security-opt seccomp=/path/to/seccomp/profile.json hello-world
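
Since a whitelist profile repeats the same entry for every allowed syscall, it can be generated from a plain list. A minimal sketch that follows the schema of the profile shown above; the helper name is hypothetical, and the two syscalls are only the ones visible in the truncated example, not a complete whitelist for any real container.

```ruby
require "json"

# Build a deny-by-default seccomp profile that allows only the given syscalls.
def seccomp_profile(allowed_syscalls)
  {
    "defaultAction" => "SCMP_ACT_ERRNO",
    "architectures" => %w[SCMP_ARCH_X86_64 SCMP_ARCH_X86 SCMP_ARCH_X32],
    "syscalls" => allowed_syscalls.map { |name|
      { "name" => name, "action" => "SCMP_ACT_ALLOW", "args" => [] }
    }
  }
end

profile = seccomp_profile(%w[accept accept4])
puts JSON.pretty_generate(profile)
```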

Capabilities

$ docker run --cap-drop ALL --cap-add SYS_TIME ntpd /bin/sh

Default capabilities:

CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD,
NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP,
NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE
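
How `--cap-drop` and `--cap-add` combine with the defaults can be modelled as simple set arithmetic. This is a simplified sketch for reasoning about the flags, not Docker's actual implementation; the function name is made up.

```ruby
# The default capability set listed above.
DEFAULT_CAPS = %w[
  CHOWN DAC_OVERRIDE FSETID FOWNER MKNOD
  NET_RAW SETGID SETUID SETFCAP SETPCAP
  NET_BIND_SERVICE SYS_CHROOT KILL AUDIT_WRITE
].freeze

# Start from the defaults, remove what is dropped ("ALL" empties the set),
# then add what is explicitly requested.
def effective_caps(drop: [], add: [])
  base = drop.include?("ALL") ? [] : DEFAULT_CAPS - drop
  (base + add).uniq
end

# Equivalent of: --cap-drop ALL --cap-add SYS_TIME
puts effective_caps(drop: ["ALL"], add: ["SYS_TIME"]).inspect

# Equivalent of: --cap-drop NET_RAW
puts effective_caps(drop: ["NET_RAW"]).inspect
```

Dropping everything and adding back only what the process needs, as in the `ntpd` example, leaves the container with a single capability.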

User Namespaces

# Create a user called "dockremap"
$ sudo adduser dockremap

# Set up subuid and subgid
$ sudo sh -c 'echo dockremap:500000:65536 > /etc/subuid'
$ sudo sh -c 'echo dockremap:500000:65536 > /etc/subgid'
$ sudo vim /etc/init.d/docker
 :
 :
/usr/local/bin/docker daemon --userns-remap=default
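
The `dockremap:500000:65536` entries mean that container IDs 0 to 65535 are shifted by 500000 on the host, so root inside the container is an unprivileged user outside it. A small sketch of that arithmetic (the helper name is hypothetical):

```ruby
# Map a UID/GID inside the container to the host ID, given a subuid/subgid
# entry of the form "user:start:count".
def host_id(container_id, start: 500_000, count: 65_536)
  unless (0...count).cover?(container_id)
    raise ArgumentError, "ID #{container_id} is outside the mapped range"
  end

  start + container_id
end

puts host_id(0)     # root inside the container
puts host_id(1000)  # a regular user inside the container
```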

Cgroups

$ docker run -d \
    --name='low_prio' \
    --cpuset-cpus=0 \
    --cpu-shares=20 \
    busybox md5sum /dev/urandom

$ docker run -d \
    --name='high_prio' \
    --cpuset-cpus=0 \
    --cpu-shares=80 \
    busybox md5sum /dev/urandom
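
`--cpu-shares` values are relative weights, not absolute limits: they only matter when containers compete for the same CPU, which is why both containers are pinned to CPU 0 here. A sketch of the expected split under full contention (the helper name is made up):

```ruby
# Expected CPU percentage per container when all of them are busy on the
# same CPU, based only on their relative --cpu-shares weights.
def cpu_share_split(shares)
  total = shares.values.sum.to_f
  shares.transform_values { |s| (100 * s / total).round(1) }
end

puts cpu_share_split("low_prio" => 20, "high_prio" => 80).inspect
```

If `high_prio` were idle, `low_prio` could still use the whole CPU, since shares do not cap usage.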

AppArmor

$ docker run --rm -it --security-opt apparmor=docker-default hello-world

DevOps

Lean Software Manufacturing

DevSecQaBizOps

1 Culture
2 Processes
3 Tools

Problems arise from misaligned incentives of different parties

Visualize your work

Scientific Experimentation

  • Lean

  • Safety Culture

  • Learning Organization

Dev(Sec)Ops

Top 5 ways

  1. Culture starts at the top
  2. Security awareness among devs begins on day 1
  3. Keep policies clear, minimal and forceful
  4. Minimize: KISS
  5. Test everything

Top 5 fails

  1. Do nothing
  2. Wait until everything is figured out
  3. Don't include security people in planning
  4. Fail at basics
  5. Ineffective or no policies

Solution

  • Establish cultural awareness
  • Use existing tools and build new ones once we understand our needs as developers

1 Culture
2 Processes
3 Tools

organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations

— M. Conway

DevOpsCon 2016

By Kevin Wittek
