Multi-staged Docker Builds

About Me

Devops @ MindTickle

 

yashmehrotra.com

github.com/yashmehrotra

@yashm95

I like tinkering with systems

Devops @ MindTickle

  • Infrastructure management & all periphery involved
  • Everything it takes for a developer's code to reach production (CI/CD)
  • Advocate cloud services to architect better solutions

How we use Docker

  • More than 90% of our platform is powered by Kubernetes
  • We also leverage ECS (Elastic Container Service) and Fargate in a few places
  • Apart from that, many teams use Docker internally for local development for its portability, platform independence & ease of use

Why bigger images are a problem

  • Disk may be cheap, but it ain't free
  • Local development with Docker will take more space
  • The smaller the image, the faster it is to pull

  • Faster pulls lead to faster deployments
  • In Kubernetes, if a sudden surge of traffic comes, autoscaling will be faster, because nodes that don't have the image cached can pull it more quickly

Is there any solution to all these problems?

As per Docker's official documentation:

 

  • One of the most challenging things about building images is keeping the image size down.
  • Each instruction in the Dockerfile adds a layer to the image
  • You have to clean up any artifacts you don’t need before moving on to the next layer.

  • To write a really efficient Dockerfile, you have to employ shell tricks and other logic to keep the layers as small as possible
  • And always ensure that each layer has the artifacts it needs from the previous layer and nothing else.

Is there any solution to all these problems?

Multi-Staged Builds!

 

Multi-stage builds are a new feature introduced in Docker 17.05

 

They are useful to anyone who has struggled to optimize Dockerfiles while keeping them easy to read and maintain.

How Docker works in a nutshell

  • Docker is built on union file systems (UnionFS)
  • Union file systems operate by creating layers, making them very lightweight and fast.
  • Docker Engine uses UnionFS to provide the building blocks for containers

  • Basically, each diff in the file system is a new layer
  • Instructions like ENV & EXPOSE only change metadata, so they don't take up space
  • Each layer that adds or changes files takes up space
  • To make our image as lightweight as possible:
    • Keep only the required files in the image
    • Don't create unnecessary layers
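
The two rules above can be sketched in a single Dockerfile instruction. This is a minimal illustration, not taken from the talk: the base image (debian:bookworm-slim) and the installed packages are assumptions chosen only to show the pattern.

```dockerfile
# Assumed base image, chosen for illustration only
FROM debian:bookworm-slim

# One RUN instruction means one layer: the package index is downloaded,
# used, and deleted before the layer is committed, so it never takes up
# space in the image. Splitting this into three RUN instructions would
# keep the index in an earlier layer even after the final rm.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```

Cleaning up in a later instruction does not shrink the image, because the files still exist in the earlier layer.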

Intro to Multi-staged builds

// main.go
package main

import "github.com/gin-gonic/gin"

func main() {
	r := gin.Default()
	r.GET("/ping", func(c *gin.Context) {
		c.JSON(200, gin.H{
			"message": "pong",
		})
	})
	r.Run() // listen and serve on 0.0.0.0:8080
}

Intro to Multi-staged builds

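# Build stage: compile the binary with the full Go toolchain
FROM golang:latest AS base
COPY . /go/src/github.com/org/helloworld
WORKDIR /go/src/github.com/org/helloworld
# Note: this `go get` assumes a GOPATH-mode toolchain; with Go modules,
# use a go.mod file and `go mod download` instead
RUN go get -u github.com/gin-gonic/gin
# Disable cgo so the binary is statically linked and can run on scratch
ENV CGO_ENABLED=0
RUN go build -o HelloWorld main.go

# Final stage: an empty image that contains only the binary
FROM scratch
COPY --from=base /go/src/github.com/org/helloworld/HelloWorld \
                 /usr/bin/HelloWorld
EXPOSE 8080
ENTRYPOINT ["/usr/bin/HelloWorld"]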

LIVE DEMO!

Let's build a hello-world image for Go using both multi-staged and normal builds
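
For comparison, a normal (single-stage) build might look like the sketch below. It is an assumed reconstruction, not the Dockerfile used in the demo; it reuses the paths from the multi-staged example and, like that example, assumes a GOPATH-mode Go toolchain.

```dockerfile
# Single-stage build: the final image keeps the whole Go toolchain,
# the source tree, and the downloaded dependencies.
FROM golang:latest
COPY . /go/src/github.com/org/helloworld
WORKDIR /go/src/github.com/org/helloworld
# Assumes GOPATH mode; with Go modules, use go.mod and `go mod download`
RUN go get -u github.com/gin-gonic/gin
RUN go build -o /usr/bin/HelloWorld main.go
EXPOSE 8080
ENTRYPOINT ["/usr/bin/HelloWorld"]
```

The binary is the same in both cases. The difference in size comes from everything else that a single-stage image carries along.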

Multi-staged builds are tricky for dynamic languages

  • Unlike compiled languages, which just need a binary, dynamic languages require the runtime as well
  • Beyond the language runtime, there may be runtime dependencies as well
  • We need to copy those too...

Multi-staged builds for Python

# app.py
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

# requirements.txt
Flask==1.0.2
gunicorn==19.9.0

Multi-staged builds for Python

  • Writing a multi-stage build is not as direct in Python as it is in Go.
  • In Python, your dependencies need to be available at runtime.
  • Since gunicorn is installed as an executable here, we need to copy it too, along with the installed Python packages

Multi-staged builds for Python

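# Build stage: has the compiler toolchain needed to build packages
FROM python:3.6-alpine AS base
RUN apk add --no-cache build-base

COPY . /code/
WORKDIR /code
RUN pip install -r requirements.txt

# Final stage: same base image, without build-base
FROM python:3.6-alpine

COPY --from=base /code/ /code
# Installed packages (Flask, gunicorn and their dependencies)
COPY --from=base /usr/local/lib/python3.6 /usr/local/lib/python3.6
# The gunicorn launcher script
COPY --from=base /usr/local/bin/gunicorn /usr/local/bin/gunicorn
WORKDIR /code

# In exec form, each argument must be its own array element
ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]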

Comparison of image sizes for our microservices

Name                         Old Size   New Size
Jafa (Node)                  966 MB     199 MB
Bailey (Node)                943 MB     176 MB
MT Login Provider (Golang)   705 MB     16 MB

Other ways to reduce image size

  • Use slim images from Docker Hub

  • If you can compromise on performance, Alpine base images can also be used

  • Separate out compile-time and run-time dependencies

  • Add compile-time dependencies only in the builder image

  • If you have assets (images, binaries) in your code-base that are not needed at runtime, you can remove them as well
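
A rough sketch of how the first few points might combine for the Flask app shown earlier. The stage name `builder`, the `/install` prefix, the `python:3.6-slim` tag, and the `build-essential` package are illustrative assumptions, not the setup used at MindTickle.

```dockerfile
# Builder stage: compile-time dependencies live only here
FROM python:3.6-slim AS builder
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
# Install under /install (hypothetical path) so the whole tree,
# libraries and launcher scripts alike, can be copied in one step
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Final stage: slim base plus run-time dependencies only
FROM python:3.6-slim
COPY --from=builder /install /usr/local
COPY . /code
WORKDIR /code
ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]
```

Installing into a separate prefix avoids copying individual paths such as the site-packages directory and each launcher script by hand. A `.dockerignore` file can keep unneeded assets out of the `COPY . /code` step.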

Key Takeaways

  • Keep in mind that using multi-stage builds will not impact the build time of the image

  • Multi-stage builds are useful where space is a constraint, and whilst it is always better to build small, concise containers, it is easy to get carried away trying to shave off a few megabytes.

  • Even though they are great to use, they shouldn’t be abused. The effort should always be spent on improving the workflow.

If you want pre-built boilerplates

Golang: https://github.com/MindTickle/devops-grpc-go-boilerplate

 

Scala (with Play Framework): https://github.com/MindTickle/devops-scala-play-boilerplate

Python: https://github.com/MindTickle/devops-python-boilerplate

 

Thank You!

yashmehrotra.com

github.com/yashmehrotra

@yashm95

Multi-staged docker builds

By Yash Mehrotra
