Prometheus
Lessons learned @ ShuttleCloud
- Let Prometheus pull
- Pushing is a real anti-pattern
Service discovery
- Use service discovery
- e.g. DNS service discovery
- Using Ansible inventory works as well
Counters vs Gauges
- Use counters when possible
- Let Prometheus do the aggregating
- Gauges might be problematic considering alerts and decay
- Gauges still seem necessary for metrics from instrumented code, e.g. profiling
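Why counters aggregate well can be shown with a small sketch: a rate is derived from raw counter samples on the server side, and a restart (the counter dropping back towards zero) can be detected and corrected. This is an illustration only, not Prometheus's actual implementation; the method names `counter_increase` and `per_second_rate` and the sample data are assumptions made up for this example.

```ruby
# Samples are [timestamp_seconds, counter_value] pairs, oldest first.
# Assumes at least one sample is present.

# Total increase across the samples, tolerating counter resets.
def counter_increase(samples)
  samples.each_cons(2).sum do |older, newer|
    prev = older[1]
    curr = newer[1]
    # A drop means the process restarted and the counter began again at zero,
    # so the whole current value is counted as new increase.
    curr >= prev ? curr - prev : curr
  end
end

# Average per-second rate over the sampled window.
def per_second_rate(samples)
  elapsed = samples.last[0] - samples.first[0]
  return 0.0 if elapsed <= 0

  counter_increase(samples).to_f / elapsed
end

# Hypothetical scrape results: the process restarts between t=15 and t=30.
samples = [[0, 100], [15, 130], [30, 5], [45, 35]]
puts counter_increase(samples)
puts per_second_rate(samples)
```

A gauge carries no such history: if it drops, the monitoring side cannot tell a restart from a genuine decrease.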
Alerting
- Prediction based threshold alerting
- Alert tree
- PagerDuty integration
- Maybe multiple Prometheus servers for different systems?
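Prediction based alerting can be sketched as a linear extrapolation: fit a line through recent samples and alert when the projected value crosses a threshold, similar in spirit to PromQL's `predict_linear`. The helper names, the disk space data, and the four hour horizon below are assumptions for illustration, not details from the talk.

```ruby
# Samples are [timestamp_seconds, value] pairs, oldest first.

# Ordinary least-squares fit; returns [slope, intercept].
def linear_fit(samples)
  n = samples.size.to_f
  mean_t = samples.sum { |t, _| t } / n
  mean_v = samples.sum { |_, v| v } / n
  cov = samples.sum { |t, v| (t - mean_t) * (v - mean_v) }
  var = samples.sum { |t, _| (t - mean_t)**2 }
  slope = var.zero? ? 0.0 : cov / var
  [slope, mean_v - slope * mean_t]
end

# Projected value `horizon_seconds` after the newest sample.
def predict(samples, horizon_seconds)
  slope, intercept = linear_fit(samples)
  slope * (samples.last[0] + horizon_seconds) + intercept
end

# Hypothetical data: free disk space in GB, shrinking by 2 GB per hour.
free_gb = [[0, 20.0], [3600, 18.0], [7200, 16.0], [10800, 14.0]]
in_four_hours = predict(free_gb, 4 * 3600)
puts in_four_hours
puts "ALERT: disk predicted to fill within 4 hours" if in_four_hours <= 0
```

Alerting on the projection rather than on a fixed threshold gives a warning that scales with how fast the resource is actually being consumed.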
Monitored Infrastructure
- 200 systems monitored
- Uses 30GB of data (15d retention)
- 1 meta monitoring instance
- Monitoring implemented by team of 7 engineers
Playing Port Authority
TDD for Containers
# spec/Dockerfile_spec.rb
require "serverspec"
require "docker"

describe "Dockerfile" do
  before(:all) do
    image = Docker::Image.build_from_dir('.')

    set :os, family: :debian
    set :backend, :docker
    set :docker_image, image.id
  end

  it "installs the right version of Ubuntu" do
    expect(os_version).to include("Ubuntu 14")
  end

  def os_version
    command("lsb_release -a").stdout
  end
end
Docker Security HandsOn
- Docker Security == Linux Security
- Secure Docker containers with standard Linux tools for securing Linux processes
seccomp
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
      "name": "accept",
      "action": "SCMP_ACT_ALLOW",
      "args": []
    },
    {
      "name": "accept4",
      "action": "SCMP_ACT_ALLOW",
      "args": []
    },
    ...
  ]
}
$ docker run --rm -it --security-opt seccomp=/path/to/seccomp/profile.json hello-world
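The profile above is a whitelist: every syscall not listed falls through to `defaultAction`, which here rejects the call with an error. A minimal sketch of that lookup logic follows, assuming a two-entry profile in the same shape as the slide. The `action_for` helper is hypothetical, and it deliberately ignores the `args` filters and architecture checks that the kernel applies in a real seccomp filter.

```ruby
require "json"

# A cut-down profile in the same shape as the one shown on the slide.
PROFILE = JSON.parse(<<~PROFILE_JSON)
  {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
      { "name": "accept",  "action": "SCMP_ACT_ALLOW", "args": [] },
      { "name": "accept4", "action": "SCMP_ACT_ALLOW", "args": [] }
    ]
  }
PROFILE_JSON

# Returns the action for a syscall name: the matching rule if one exists,
# otherwise the profile's default. Argument filters are not evaluated here.
def action_for(profile, syscall)
  rule = profile["syscalls"].find { |r| r["name"] == syscall }
  rule ? rule["action"] : profile["defaultAction"]
end

puts action_for(PROFILE, "accept")  # listed, so allowed
puts action_for(PROFILE, "mount")   # not listed, so the default applies
```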
Capabilities
$ docker run --cap-drop ALL --cap-add SYS_TIME ntpd /bin/sh
Defaults:
CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD,
NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP,
NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE
User Namespaces
# Create a user called "dockremap"
$ sudo adduser dockremap
# Set up subuid and subgid
$ sudo sh -c 'echo dockremap:500000:65536 > /etc/subuid'
$ sudo sh -c 'echo dockremap:500000:65536 > /etc/subgid'
$ sudo vim /etc/init.d/docker
:
:
/usr/local/bin/docker daemon --userns-remap=default
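The effect of the `dockremap:500000:65536` entry can be sketched as simple arithmetic: IDs 0 to 65535 inside the container are shifted by 500000 on the host, so root in the container is an unprivileged user outside it. The helpers `parse_subid` and `host_id` are hypothetical names used only for this illustration.

```ruby
# Parses one /etc/subuid or /etc/subgid line of the form "user:start:count".
def parse_subid(line)
  user, start, count = line.strip.split(":")
  { user: user, start: Integer(start), count: Integer(count) }
end

# Maps an ID inside the container to the corresponding ID on the host.
def host_id(entry, container_id)
  unless (0...entry[:count]).cover?(container_id)
    raise ArgumentError, "id #{container_id} is outside the mapped range"
  end

  entry[:start] + container_id
end

entry = parse_subid("dockremap:500000:65536")
puts host_id(entry, 0)     # container root
puts host_id(entry, 1000)  # a regular user inside the container
```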
Cgroups
$ docker run -d \
--name='low_prio' \
--cpuset-cpus=0 \
--cpu-shares=20 \
busybox md5sum /dev/urandom
$ docker run -d \
--name='high_prio' \
--cpuset-cpus=0 \
--cpu-shares=80 \
busybox md5sum /dev/urandom
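CPU shares are relative weights, not hard limits: they only matter when containers compete for the same CPU, as the two containers pinned to CPU 0 above do. A rough sketch of the expected split under full contention follows; `cpu_fractions` is a hypothetical helper, and actual scheduling will only approximate these numbers.

```ruby
# Given a hash of container name => cpu-shares, returns each container's
# expected fraction of CPU time when all of them are busy on the same CPU.
def cpu_fractions(shares)
  total = shares.values.sum.to_f
  shares.transform_values { |s| s / total }
end

fractions = cpu_fractions({ "low_prio" => 20, "high_prio" => 80 })
fractions.each do |name, fraction|
  puts format("%-9s %.0f%%", name, fraction * 100)
end
```

If `high_prio` goes idle, `low_prio` is free to use the whole CPU despite its lower share.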
AppArmor
$ docker run --rm -it --security-opt apparmor=docker-default hello-world
DevOps
Lean Software Manufacturing
DevSecQaBizOps
1 Culture
2 Processes
3 Tools
Problems arise from misaligned incentives of different parties
Visualize your work
- Scientific Experimentation
- Lean
- Safety Culture
- Learning Organization
Dev(Sec)Ops
Top 5 ways
- Culture starts at the Top
- Security awareness among devs begins on day 1
- Keep policies clear, minimal and forceful
- Minimize: KISS
- Test everything
Top 5 fails
- Do nothing
- Wait until everything is figured out
- Don't include security people in planning
- Fail at basics
- Ineffective or no policies
Solution
- Establish cultural awareness
- Use existing tools and build new ones once we understand our needs as developers
1 Culture
2 Processes
3 Tools
organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations
— M. Conway
DevOpsCon 2016
By Kevin Wittek