Multi-staged Docker Builds

About Me

DevOps @ MindTickle

yashmehrotra.com

github.com/yashmehrotra

@yashm95

I like tinkering with systems

DevOps @ MindTickle

Infrastructure management & everything around it
Everything it takes for a developer's code to reach production (CI/CD)
Advocate cloud services to architect better solutions

How we use Docker

More than 90% of our platform is powered by Kubernetes
We also leverage ECS (Elastic Container Service) and Fargate in a few places
Apart from that, many teams use Docker internally for local development because of its portability, platform independence & ease of use

Why bigger images are a problem

Disk may be cheap, but it ain't free
Local development with Docker will take up more disk space
The smaller the image, the faster it is to pull

Why bigger images are a problem

Faster pulls will lead to faster deployments
In Kubernetes, if a sudden surge of traffic comes, autoscaling will be faster, since new nodes that don't have the image cached can pull it more quickly

Is there any solution to all these problems?

As per Docker's official documentation:

One of the most challenging things about building images is keeping the image size down.
Each instruction in the Dockerfile adds a layer to the image
You have to clean up any artifacts you don’t need before moving on to the next layer.

Is there any solution to all these problems?

As per Docker's official documentation:

To write a really efficient Dockerfile, you have to employ shell tricks and other logic to keep the layers as small as possible
And always ensure that each layer has the artifacts it needs from the previous layer and nothing else.
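As a rough illustration of the "shell tricks" the documentation refers to, here is a minimal single-stage sketch. It assumes a hypothetical Python service with a requirements.txt and an app.py exposing app; the .build-deps name is just a label chosen for this example. The compiler toolchain is installed, used, and removed inside one RUN, so it never ends up in a committed layer.

```dockerfile
# Single-stage sketch (hypothetical service layout: requirements.txt + app.py)
FROM python:3.6-alpine

COPY requirements.txt /code/requirements.txt

# Install the toolchain, build the packages, and remove the toolchain
# in the SAME instruction, so the layer only keeps the installed packages.
RUN apk add --no-cache --virtual .build-deps build-base \
 && pip install --no-cache-dir -r /code/requirements.txt \
 && apk del .build-deps

COPY . /code/
WORKDIR /code

ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]
```

This works, but the long chained RUN is the kind of Dockerfile that gets hard to read and maintain as it grows.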

Is there any solution to all these problems?

Multi-Staged Builds!

Multi-stage builds are a new feature introduced in Docker 17.05

They are useful to anyone who has struggled to optimize Dockerfiles while keeping them easy to read and maintain.

How Docker works in a nutshell

Docker images are built on a union file system (UnionFS)
Union file systems operate by creating layers, making them very lightweight and fast.
Docker Engine uses UnionFS to provide the building blocks for containers

How Docker works in a nutshell

Basically, each change (diff) to the file system is a new layer
Instructions like ENV & EXPOSE only add metadata, so they don't take up space
Each file system layer takes up space
To make our image as lightweight as possible:
- Keep only the required files in the image
- Don't create unnecessary layers
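The points above can be sketched with a small annotated Dockerfile. The base image tag, the APP_ENV variable, and the /tmp/blob file are placeholders picked for illustration; the sizes in the comments are approximate.

```dockerfile
FROM alpine:3.19

# Metadata only: stored in the image config, no file system content added.
ENV APP_ENV=production
EXPOSE 8080

# File system change: this layer stores a 50 MB file.
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=50

# A new layer that only hides the file. The previous layer still ships it,
# so the image stays roughly 50 MB larger.
RUN rm /tmp/blob

# Creating and deleting within one RUN leaves nothing behind in the layer.
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=50 && rm /tmp/blob
```

Running docker history on the resulting image should show which instructions contributed size and which did not.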

Intro to Multi-staged builds

// main.go
package main

import "github.com/gin-gonic/gin"

func main() {
	r := gin.Default()
	r.GET("/ping", func(c *gin.Context) {
		c.JSON(200, gin.H{
			"message": "pong",
		})
	})
	r.Run() // listen and serve on 0.0.0.0:8080
}

Intro to Multi-staged builds

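FROM golang:latest AS base
COPY . /go/src/github.com/org/helloworld
WORKDIR /go/src/github.com/org/helloworld
# GOPATH-style dependency fetch; a project using Go modules would rely on go.mod instead
RUN go get -u github.com/gin-gonic/gin
# Disable cgo so the binary is statically linked and can run on scratch
ENV CGO_ENABLED=0
RUN go build -o HelloWorld main.go

# Final stage: an empty base image that contains only the compiled binary
FROM scratch
COPY --from=base /go/src/github.com/org/helloworld/HelloWorld \
                 /usr/bin/HelloWorld
EXPOSE 8080
ENTRYPOINT ["/usr/bin/HelloWorld"]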

LIVE DEMO!

Let's build a hello world image for Go using both multi-staged and normal builds

Multi-staged builds are tricky for dynamic languages

Unlike compiled languages, which just need a binary, dynamic languages require the runtime as well
Apart from the language runtime, there may be runtime dependencies as well
We need to copy those too ...

Multi-staged builds for Python

# app.py
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

# requirements.txt
Flask==1.0.2
gunicorn==19.9.0

Multi-staged builds for Python

Setting up a multi-stage build is not as direct in Python as it is in Go.
In Python, your dependencies need to be available at runtime.
Since gunicorn is an executable here, we need to copy it too, along with our installed Python packages

Multi-staged builds for Python

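# Build stage: has the compiler toolchain needed to build Python packages
FROM python:3.6-alpine AS base
RUN apk update && apk add build-base

COPY . /code/
WORKDIR /code
RUN pip install -r requirements.txt

# Final stage: same base image, without the build toolchain
FROM python:3.6-alpine

COPY --from=base /code/ /code
# Installed packages and the gunicorn launcher script come from the build stage
COPY --from=base /usr/local/lib/python3.6 /usr/local/lib/python3.6
COPY --from=base /usr/local/bin/gunicorn /usr/local/bin/gunicorn
WORKDIR /code

# In exec form, each argument is its own array element
ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]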
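One possible variation, shown here only as a sketch and not as the approach used in this talk: install everything into a virtual environment in the build stage, then copy that single directory. The /opt/venv path is an arbitrary choice for this example; it has to be the same path in both stages, because the scripts inside the environment reference it.

```dockerfile
# Build stage: create a virtual environment and install dependencies into it
FROM python:3.6-alpine AS base
RUN apk add --no-cache build-base
RUN python -m venv /opt/venv
# Put the environment first on PATH so pip installs into it
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Final stage: copy the whole environment (packages + gunicorn script) at once
FROM python:3.6-alpine
COPY --from=base /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY . /code
WORKDIR /code

ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]
```

The trade-off is one extra directory convention to remember, in exchange for not having to list every executable and library path by hand.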

Comparison of image sizes for our microservices

Name	Old Size	New Size
Jafa (Node)	966 MB	199 MB
Bailey (Node)	943 MB	176 MB
MT Login Provider (Golang)	705 MB	16 MB

Other ways to reduce image size

Use slim images from Docker Hub
If you can compromise on performance, Alpine base images can also be used
Separate out compile-time and run-time dependencies
Install compile-time dependencies only in the builder image
If you have assets (images, binaries) in your code-base that aren't needed at runtime, you can leave them out of the image as well
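A hedged sketch of how a few of these ideas might combine for a Node service. Everything specific here is an assumption made for illustration: the image tags, a build script in package.json, and a dist/index.js entry point. It is not the Dockerfile behind the numbers in the comparison table.

```dockerfile
# Builder: full image with compilers, dev dependencies, and sources
FROM node:20 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Hypothetical "build" script that writes its output to dist/
RUN npm run build
# Drop devDependencies so only run-time packages remain in node_modules
RUN npm prune --omit=dev

# Runtime: slim base image with run-time dependencies and build output only
FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
```

Only node_modules and dist are copied into the final stage, so sources, tests, and other assets in the repository stay behind in the builder.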

Key Takeaways

Keep in mind that using multi-stage builds will not impact the build time of the image
Multi-stage builds are useful where space is a constraint, and whilst it is always better to build small concise containers, it is easy to get carried away trying to shave off a few megabytes.
Even though they are great to use, they shouldn't be abused. The effort should always be spent on improving the workflow.

If you want pre-built boilerplates

Golang: https://github.com/MindTickle/devops-grpc-go-boilerplate

Scala (with Play Framework): https://github.com/MindTickle/devops-scala-play-boilerplate

Python: https://github.com/MindTickle/devops-python-boilerplate

Thank You!

yashmehrotra.com

github.com/yashmehrotra

@yashm95

Multi-staged docker builds

By Yash Mehrotra
