Multi-staged
Docker Builds
About Me
Devops @ MindTickle



yashmehrotra.com
github.com/yashmehrotra
@yashm95
I like tinkering with systems
Devops @ MindTickle
- Infrastructure management & all the periphery involved
- Everything it takes for a developer's code to reach production (CI/CD)
- Advocating cloud services to architect better solutions
How we use Docker
- More than 90% of our platform is powered by Kubernetes
- We also leverage ECS (Elastic Container Service) and Fargate in a few places
- Apart from that, many teams use Docker internally for local development for its portability, platform independence & ease of use
Why bigger images are a problem
- Disk may be cheap, but it ain't free
- Local development with Docker takes up more space
- The smaller the image, the faster it is to pull
Why bigger images are a problem
- Faster pulls lead to faster deployments
- In Kubernetes, when a sudden surge of traffic comes, autoscaling is faster because nodes that don't have the image cached can pull it faster
Is there any solution to all these problems?
As per docker's official documentation:
- One of the most challenging things about building images is keeping the image size down.
- Each instruction in the Dockerfile adds a layer to the image
- You have to clean up any artifacts you don’t need before moving on to the next layer.
Is there any solution to all these problems?
As per docker's official documentation:
- To write a really efficient Dockerfile, you have to employ shell tricks and other logic to keep the layers as small as possible
- And always ensure that each layer has the artifacts it needs from the previous layer and nothing else (see the sketch below)
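To make the "shell tricks" above concrete, here is a minimal sketch (not from the talk) of the classic single-RUN clean-up pattern: install the build tools, compile, and remove them in one instruction, so the temporary artifacts are never committed to a layer. The hello.c file and the Debian base image are just placeholders.

# Hypothetical single-stage example: everything happens in one RUN so the
# build tools and the apt cache never end up in a committed layer
FROM debian:stretch-slim
COPY hello.c /src/hello.c
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libc6-dev && \
    gcc -o /usr/local/bin/hello /src/hello.c && \
    apt-get purge -y gcc libc6-dev && apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["hello"]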
Is there any solution to all these problems?
Multi-Staged Builds!
Multi-stage builds are a new feature introduced in Docker 17.05
They are useful to anyone who has struggled to optimize Dockerfiles while keeping them easy to read and maintain.
How Docker works in a nutshell
- Docker works on UnionFS
- Union file systems operate by creating layers, making them very lightweight and fast.
- Docker Engine uses UnionFS to provide the building blocks for containers
How Docker works in a nutshell
- Basically, each diff in the file system is a new layer
- Commands like ENV & EXPOSE don't take up space
- Each layer takes up space
- For making our image as light-weight as possible (see the layer sketch below):
- Keep only the required files in the image
- Don't create unnecessary layers
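As a rough illustration (annotation mine, not from the slides), here is how the instructions of a small, hypothetical Dockerfile map to layers; docker history shows the size each one contributes.

# Rough illustration of which instructions create layers
FROM alpine:3.8
# ENV and EXPOSE only record metadata: no filesystem diff, no extra size
ENV APP_ENV=production
EXPOSE 8080
# Each of the following commits a new layer containing its filesystem diff
RUN apk add --no-cache curl
COPY app.sh /app.sh
# Even a permission change copies the whole file into a fresh layer
RUN chmod +x /app.sh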
Intro to Multi-staged builds
// main.go
package main

import "github.com/gin-gonic/gin"

func main() {
    r := gin.Default()
    r.GET("/ping", func(c *gin.Context) {
        c.JSON(200, gin.H{
            "message": "pong",
        })
    })
    r.Run() // listen and serve on 0.0.0.0:8080
}
Intro to Multi-staged builds
FROM golang:latest as base
COPY . /go/src/github.com/org/helloworld
WORKDIR /go/src/github.com/org/helloworld
RUN go get -u github.com/gin-gonic/gin
ENV CGO_ENABLED 0
RUN go build -o HelloWorld main.go
FROM scratch
COPY --from=base /go/src/github.com/org/helloworld/HelloWorld \
/usr/bin/HelloWorld
EXPOSE 8080
ENTRYPOINT ["HelloWorld"]
LIVE DEMO!
Let's build a hello-world image for Golang using both multi-staged and normal builds (single-stage sketch below for reference)
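For reference, a plain single-stage version of the same Go service might look like the sketch below (assumed for the demo, collapsed from the multi-stage Dockerfile above). Built this way the image carries the whole Go toolchain and the sources, so it weighs roughly as much as golang:latest itself, while the scratch-based multi-stage image contains little more than the static binary.

# Single-stage sketch for comparison: toolchain + sources ship with the image
FROM golang:latest
COPY . /go/src/github.com/org/helloworld
WORKDIR /go/src/github.com/org/helloworld
RUN go get -u github.com/gin-gonic/gin
ENV CGO_ENABLED 0
RUN go build -o /usr/bin/HelloWorld main.go
EXPOSE 8080
ENTRYPOINT ["HelloWorld"]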
Multi-staged builds are tricky for dynamic languages
- Unlike compiled languages, which just need a binary, dynamic languages require the runtime as well
- Apart from the language runtime itself, there can be runtime dependencies as well
- We need to copy those too ...
Multi-staged builds for Python
# app.py
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

# requirements.txt
Flask==1.0.2
gunicorn==19.9.0
Multi-staged builds for Python
- Building a multi-stage image is not as straightforward in Python as it is in Golang.
- In Python, your dependencies need to be available at runtime.
- Since gunicorn is an executable here, we need to copy it along with our installed Python packages
Multi-staged builds for Python
FROM python:3.6-alpine as base
RUN apk update && apk add build-base
COPY . /code/
WORKDIR /code
RUN pip install -r requirements.txt
FROM python:3.6-alpine
COPY --from=base /code/ /code
COPY --from=base /usr/local/lib/python3.6 /usr/local/lib/python3.6
COPY --from=base /usr/local/bin/gunicorn /usr/local/bin/gunicorn
WORKDIR /code
ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]
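A variant worth knowing (not the approach used above): install everything into a virtualenv in the builder stage and copy the whole venv, so console scripts like gunicorn and the site-packages move together in a single COPY. The /venv path is just a convention, not something from the talk.

# Alternative sketch: a virtualenv keeps executables and packages together
FROM python:3.6-alpine as base
RUN apk update && apk add build-base
RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"
COPY requirements.txt /code/
RUN pip install -r /code/requirements.txt

FROM python:3.6-alpine
COPY --from=base /venv /venv
COPY . /code
ENV PATH="/venv/bin:$PATH"
WORKDIR /code
ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]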
Comparison of image sizes for our microservices
Name | Old Size | New Size
---|---|---
Jafa (Node) | 966 MB | 199 MB
Bailey (Node) | 943 MB | 176 MB
MT Login Provider (Golang) | 705 MB | 16 MB
Other ways to reduce image size
- Use slim images from Docker Hub
- If you can compromise on performance, Alpine base images can also be used
- Separate out compile-time and run-time dependencies (see the Node sketch below)
- Add compile-time dependencies in the builder image
- If you have assets (images, binaries) in your code-base, you can remove them as well
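Since two of the services in the table above are Node services, here is a hedged sketch of how these ideas might apply to a Node app; the build script, the dist/ output directory, and the port are assumptions, not details from the talk.

# Hypothetical Node example: build deps stay in the builder stage,
# the runtime stage uses a slimmer base and production dependencies only
FROM node:10 as builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build  # assumed build step emitting dist/

FROM node:10-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install --production
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/server.js"]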
Key Takeaways
- Keep in mind that using multi-stage builds will not impact the build time of the image
- Multi-stage builds are useful where space is a constraint, and whilst it is always better to build small, concise containers, it is easy to get carried away trying to shave off a few megabytes
- Even though they are great to use, they shouldn't be abused; the effort should always be spent on improving the workflow
If you want pre-built boilerplates
Golang: https://github.com/MindTickle/devops-grpc-go-boilerplate
Scala (with Play Framework): https://github.com/MindTickle/devops-scala-play-boilerplate
Python: https://github.com/MindTickle/devops-python-boilerplate
Thank You!



yashmehrotra.com
github.com/yashmehrotra
@yashm95
By Yash Mehrotra