Optimizing Docker builds for Python applications
Dmitry Figol
Systems Engineer, Cisco Systems
@dmfigol
Docker terminology
Container
- A lightweight way to package an application with its dependencies
- Different containers have separate user-space but share the kernel of the host
Docker image
- Template to create Docker containers
- Created from Dockerfile
- Consists of read-only layers
- Can be uploaded to registry and shared with others
Dockerfile
- A set of instructions to build an image
- Starts with a base image
- Every* instruction creates a new layer, which is cached for future builds (*only RUN, COPY and ADD add filesystem layers; other instructions add metadata only)
FROM debian:stretch
COPY test.txt test.txt
RUN touch file.txt
CMD ["date"]
Docker container
- Created from a Docker image, with a writable layer added on top
- Resources are allocated
- Entrypoint/CMD are executed at the start of a container
Registry
A place to store and share tagged images
How the pieces fit together
- Dockerfile → Image: build, tag
- Image ↔ Registry: push/pull
- Image → Container: run (CMD/Entrypoint is executed; resources such as storage and networking are allocated)
Focus of this session
Python + Docker
Simplest Dockerfile for Python app
FROM python:3.7
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
Image size: 976 MB
-> % tree .
.
├── Dockerfile
├── main.py
├── my_project
│ ├── __init__.py
│ └── greet.py
├── poetry.lock
├── pyproject.toml
└── requirements.txt
-> % cat requirements.txt
requests
cryptography
Optimization objectives
- Reducing image size
- Reducing initial and subsequent build time
Priorities
- Fast builds during development
- Small image size for production releases
Selecting base image
Image | Size | Notes |
---|---|---|
python:3.7 / python:3.7-stretch | 929 MB | Uses glibc and supports manylinux wheels |
python:3.7-slim-stretch | 147 MB | |
python:3.7-alpine | 87 MB | Uses musl and does not support manylinux wheels, so Python extensions must be compiled. Dependencies take less space |
Use slim-stretch as base when you care about build time
Use alpine as base when you care about image size
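The glibc/musl distinction in the table can be checked from inside a container. The following is a minimal sketch, assuming it is run with the image's own interpreter on Linux; the function name describe_libc is hypothetical, and platform.libc_ver() only recognises glibc, so an empty result is treated here as "not glibc".

```python
import platform


def describe_libc():
    """Return a short description of the C library reported for this interpreter."""
    lib, version = platform.libc_ver()
    if lib == "glibc":
        return f"glibc {version} detected: manylinux wheels can be considered by pip"
    # platform.libc_ver() returns an empty name when it cannot identify glibc,
    # which is what typically happens on musl-based images such as Alpine.
    return "glibc not detected: manylinux wheels do not apply"


if __name__ == "__main__":
    print(describe_libc())
```

Running it in python:3.7-slim-stretch and in python:3.7-alpine should show the two different branches.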
slim-stretch
FROM python:3.7-slim-stretch
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
976→193 MB
alpine
FROM python:3.7-alpine
WORKDIR /app
COPY . .
RUN apk add --no-cache \
build-base \
gcc \
libffi-dev \
openssl-dev
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
976→317 MB
???
Problem
Build dependencies are needed for compilation but not at runtime, yet they still add to the image size
Include only necessary files
Copying the source code
- More specific COPY statements instead of broad "COPY . ."
- Use .dockerignore to exclude some files when doing COPY
.dockerignore example
**/*.pyc
**/*.pyo
**/*.log
**/__pycache__
docs/_build
**/.ipynb_checkpoints
.venv/
.mypy_cache/
.pytest_cache/
.tox/
**/*.egg-info
pip-wheel-metadata/
slim-stretch
FROM python:3.7-slim-stretch
WORKDIR /app
COPY my_project my_project
COPY main.py .
COPY requirements.txt .
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
193→170 MB
alpine
FROM python:3.7-alpine
WORKDIR /app
COPY my_project my_project
COPY main.py .
COPY requirements.txt .
RUN apk add --no-cache \
build-base \
gcc \
libffi-dev \
openssl-dev
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
317→294 MB
Remove unnecessary files
alpine
FROM python:3.7-alpine
WORKDIR /app
COPY my_project my_project
COPY main.py .
COPY requirements.txt .
RUN apk add --no-cache \
build-base \
gcc \
libffi-dev \
openssl-dev
RUN pip install -r requirements.txt
RUN apk del build-base \
gcc \
libffi-dev \
openssl-dev
CMD ["python", "main.py"]
294→294 MB
???
Docker Layers
- Instructions create read-only layers
- A layer can only add to the image size: deleting files in a later layer hides them but does not shrink the layers below
- Layers are cached and can be re-used for subsequent builds
- Layers introduce some overhead
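Why "apk del" in a separate RUN did not shrink the image can be shown with a toy model. This is a simplified sketch, not how Docker stores layers internally: the function build, the dictionary format and all sizes are hypothetical and chosen only for illustration.

```python
def build(layers):
    """Toy model of an image as a stack of read-only layers.

    Each layer is a dict {"add": {path: size_mb}, "delete": [paths]}.
    Returns (image_size_mb, files_visible_in_the_final_filesystem).
    """
    total = 0
    visible = {}
    for layer in layers:
        staged = dict(layer.get("add", {}))
        for path in layer.get("delete", []):
            if path in staged:
                # Added and deleted within the same instruction: never stored.
                del staged[path]
            else:
                # Hidden from the final filesystem, but the lower layer keeps its bytes.
                visible.pop(path, None)
        total += sum(staged.values())
        visible.update(staged)
    return total, visible


# Hypothetical sizes in MB.
separate_runs = [
    {"add": {"base": 87}},
    {"add": {"build-deps": 200}},
    {"add": {"site-packages": 20}},
    {"delete": ["build-deps"]},
]
single_run = [
    {"add": {"base": 87}},
    {"add": {"build-deps": 200, "site-packages": 20}, "delete": ["build-deps"]},
]

if __name__ == "__main__":
    print(build(separate_runs)[0])  # build deps still counted
    print(build(single_run)[0])     # build deps never stored
```

Both variants end up with the same visible files, but only the single-instruction variant avoids paying for the build dependencies.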
Tips
- Combine multiple RUN statements into a single one
- If you need to delete files, make sure to delete them in the same layer (instruction) where they were added
- To benefit from caching, arrange statements in the order from the least changing to the most changing (usually, system-level dependencies and tools, Python dependencies, source code)
- Don't save dependencies to cache (pip --no-cache-dir option, apk --no-cache option)
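The ordering tip can be illustrated with a small model of the build cache. This is a sketch under stated assumptions: real Docker derives cache hits from the instruction text and, for COPY/ADD, from file checksums; here each step's key simply chains the parent key, the instruction and a made-up input digest. The names cache_keys and rebuilt and the digests such as "src-v1" are hypothetical.

```python
import hashlib


def cache_keys(steps):
    """steps is a list of (instruction, input_digest) pairs.

    Each key includes the parent key, so a change in one step
    changes the key of every step after it.
    """
    keys = []
    parent = ""
    for instruction, inputs in steps:
        parent = hashlib.sha256(f"{parent}|{instruction}|{inputs}".encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt(before, after):
    """Instructions in `after` whose cache key differs from the previous build."""
    old, new = cache_keys(before), cache_keys(after)
    return [step[0] for step, a, b in zip(after, old, new) if a != b]


def naive(src):
    return [
        ("FROM python:3.7-slim-stretch", ""),
        ("COPY . .", f"{src}+req-v1"),
        ("RUN pip install -r requirements.txt", ""),
    ]


def ordered(src):
    return [
        ("FROM python:3.7-slim-stretch", ""),
        ("COPY requirements.txt .", "req-v1"),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY my_project my_project", src),
    ]


if __name__ == "__main__":
    # Only the source code changes between the two builds.
    print(rebuilt(naive("src-v1"), naive("src-v2")))
    print(rebuilt(ordered("src-v1"), ordered("src-v2")))
```

In the naive ordering a source change reruns pip install; with requirements.txt copied first, only the final COPY is rebuilt.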
slim-stretch
FROM python:3.7-slim-stretch
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY my_project my_project
COPY main.py .
CMD ["python", "main.py"]
170→166 MB
alpine
FROM python:3.7-alpine
WORKDIR /app
ARG BUILD_DEPS="build-base gcc libffi-dev openssl-dev"
ARG RUNTIME_DEPS="libcrypto1.1 libssl1.1"
COPY requirements.txt .
RUN apk add --no-cache --virtual .build-deps ${BUILD_DEPS} \
&& pip install --no-cache-dir -r requirements.txt \
&& apk del .build-deps \
&& apk add --no-cache ${RUNTIME_DEPS}
COPY my_project my_project
COPY main.py .
CMD ["python", "main.py"]
294→106 MB
(Optional) Delete *.pyc files / tests from dependencies
FROM python:3.7-alpine
WORKDIR /app
ARG BUILD_DEPS="build-base gcc libffi-dev openssl-dev"
ARG RUNTIME_DEPS="libcrypto1.1 libssl1.1"
COPY requirements.txt .
RUN apk add --no-cache --virtual .build-deps ${BUILD_DEPS} \
&& pip install --no-cache-dir -r requirements.txt \
&& apk del .build-deps \
&& apk add --no-cache ${RUNTIME_DEPS} \
&& find /usr/local -depth \
\( \( -type d -a \( -name test -o -name tests \) \) \
-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) \
-exec rm -rf '{}' +
COPY my_project my_project
COPY main.py .
CMD ["python", "main.py"]
106→97 MB
Disadvantages
- Complex Dockerfile
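- No benefit from layer caching: everything happens in a single RUN instruction, so any change to the dependencies reruns all of it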
Docker multi-stage
- Build an intermediate image with all build dependencies and install your application
- Copy the result (e.g. binary) to a fresh image and label it as a final image
Why?
- Resulting image is smaller (no build dependencies)
- Could be faster if the layers with build dependencies are cached
Are multi-stage builds relevant to Python apps?
Somewhat: Python is an interpreted language, so there is no single compiled binary to copy into the final image
Idea: use virtual environments to simplify copy between stages
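A virtual environment is a plain directory, which is what makes it easy to copy between stages. The sketch below creates one and reads its pyvenv.cfg; the function name inspect_venv is hypothetical, and with_pip=False is used only to keep the example fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def inspect_venv(env_dir):
    """Create a virtual environment and return the settings stored in pyvenv.cfg."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=False)
    settings = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        settings = inspect_venv(Path(tmp) / ".venv")
        # "home" is the directory of the base interpreter the environment relies on.
        print(settings["home"])
```

Because the environment records the location of its base interpreter, copying it works most smoothly when both stages use the same base image and the same path, as with /app/.venv here.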
# Stage 1 - Install build dependencies
FROM python:3.7-alpine AS builder
WORKDIR /app
ARG BUILD_DEPS="build-base gcc libffi-dev openssl-dev"
RUN apk add --no-cache ${BUILD_DEPS} \
&& python -m venv .venv \
&& .venv/bin/pip install --no-cache-dir -U pip setuptools
COPY requirements.txt .
RUN .venv/bin/pip install --no-cache-dir -r requirements.txt \
&& find /app/.venv -depth \
\( \( -type d -a \( -name test -o -name tests \) \) \
-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) \
-exec rm -rf '{}' +
# Stage 2 - Copy only necessary files to the runner stage
FROM python:3.7-alpine
WORKDIR /app
ARG RUNTIME_DEPS="libcrypto1.1 libssl1.1"
RUN apk add --no-cache ${RUNTIME_DEPS}
COPY --from=builder /app /app
COPY my_project my_project
COPY main.py .
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "main.py"]
Python + Docker multi-stage
97→101 MB
Idea: Build a custom image with your common build dependencies and tools and store it in the registry
FROM registry.gitlab.com/dmfigol/base-docker-images/python:3.7-alpine AS builder
WORKDIR /app
COPY requirements.txt .
RUN .venv/bin/pip install --no-cache-dir -r requirements.txt \
&& find /app/.venv -depth \
\( \( -type d -a \( -name test -o -name tests \) \) \
-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) \
-exec rm -rf '{}' +
FROM python:3.7-alpine
WORKDIR /app
ARG RUNTIME_DEPS="libcrypto1.1 libssl1.1"
RUN apk add --no-cache ${RUNTIME_DEPS}
COPY --from=builder /app /app
COPY my_project my_project
COPY main.py .
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "main.py"]
Miscellaneous
Bind mount source code instead of COPY in local dev environment
Add the following environment variables:
- PYTHONUNBUFFERED=1 # print to stdout without buffering
- PYTHONDONTWRITEBYTECODE=1 # don't generate *.pyc files
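The effect of PYTHONDONTWRITEBYTECODE can be checked without Docker. This is a small sketch, assuming Python 3.7+; the helper name child_flag is hypothetical. It starts a child interpreter with extra environment variables and reports sys.dont_write_bytecode.

```python
import os
import subprocess
import sys


def child_flag(env_overrides):
    """Run a child interpreter with extra environment variables
    and return the value of sys.dont_write_bytecode as text."""
    env = {**os.environ, **env_overrides}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_flag({"PYTHONDONTWRITEBYTECODE": "1"}))
```

Setting the variable with ENV in the Dockerfile has the same effect on every Python process started in the container.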
# Stage 1 - Install build dependencies
FROM python:3.7-alpine AS builder
WORKDIR /app
ENV PATH="/root/.poetry/bin:$PATH"
ARG BUILD_DEPS="build-base gcc libffi-dev openssl-dev git curl"
RUN apk add --no-cache ${BUILD_DEPS} \
&& curl -sSL https://raw.githubusercontent.com/sdispater/poetry/master/get-poetry.py | python \
&& python -m venv .venv \
&& poetry config settings.virtualenvs.in-project true \
&& .venv/bin/pip install --no-cache-dir -U pip setuptools
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --no-dev --no-interaction \
&& find /app/.venv -depth \
\( \( -type d -a \( -name test -o -name tests \) \) \
-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) \
-exec rm -rf '{}' +
COPY my_project my_project
# Install the project as a package
RUN poetry install --no-dev --no-interaction
# Stage 2 - Copy only necessary files to the runner stage
FROM python:3.7-alpine
WORKDIR /app
ARG RUNTIME_DEPS="libcrypto1.1 libssl1.1"
RUN apk add --no-cache ${RUNTIME_DEPS}
COPY --from=builder /app /app
COPY main.py .
ENV PATH="/app/.venv/bin:$PATH" \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
CMD ["python", "main.py"]
Example of a Dockerfile using Poetry
Summary
- Select base image carefully:
  - alpine for smaller image size
  - slim-stretch for faster builds
- Take into account layer caching
  - Combine different statements into one
  - Delete files in the same statement where they were added
  - Order statements from the least to the most changing
- Docker multi-stage can help you avoid complex deletions and benefit from caching
  - Usage of Python virtual environment is recommended in this case
Thank you!
@dmfigol