Building container images for data analysis
Johannes Köster
HPCW 2020
[Figure: workflow diagram with nodes labeled "dataset" and "results"]
Define software stacks.
Build container images.
Use images for execution.
Issue:
Overhead, explosion of image variants.
Workarounds:
- not using containers (🗲 reproducibility)
- no fine-grained containers (🗲 transparency)
Conda package manager
- language agnostic
- thousands of available packages from all fields
- de facto standard in data science
channels:
- conda-forge
dependencies:
- matplotlib =3.1.2
- seaborn =0.10.1
- scikit-learn =0.23.1
- python =3.8.1
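
As a rough illustration of how small such a definition is, the sketch below reads the channels and version pins from the environment shown above using only the Python standard library. The helper names parse_environment and pinned_versions are hypothetical, and the parser assumes exactly the flat "key:" / "- item" layout used here; a real tool would use a proper YAML parser.

```python
# Environment definition copied from the slide above.
ENV_TEXT = """\
channels:
- conda-forge
dependencies:
- matplotlib =3.1.2
- seaborn =0.10.1
- scikit-learn =0.23.1
- python =3.8.1
"""


def parse_environment(text):
    """Parse the flat 'key:' / '- item' layout into a dict of lists.

    Hypothetical helper: handles only the simple layout shown above,
    not general YAML.
    """
    env = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":"):
            # A new top-level key such as 'channels' or 'dependencies'.
            current = line[:-1]
            env[current] = []
        elif line.startswith("- ") and current is not None:
            env[current].append(line[2:].strip())
    return env


def pinned_versions(env):
    """Map package name to pinned version for 'name =version' specs."""
    pins = {}
    for spec in env.get("dependencies", []):
        name, _, version = spec.partition("=")
        pins[name.strip()] = version.strip()
    return pins


env = parse_environment(ENV_TEXT)
pins = pinned_versions(env)
```

Here env["channels"] holds the single channel and pins maps each of the four packages to its exact version, which is the information a build step needs to reproduce the software stack.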
Using the Conda package manager for building blocks
Conda environment definitions:
build
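
One way to picture the build step from an environment definition to an image is to generate a Dockerfile from it. The sketch below is an assumption for illustration, not the tooling presented in the talk: render_dockerfile is a hypothetical helper, the base image continuumio/miniconda3 and its /opt/conda install prefix are assumed, and the environment name "analysis" is made up.

```python
def render_dockerfile(env_file="environment.yaml",
                      env_name="analysis",
                      base_image="continuumio/miniconda3"):
    """Return Dockerfile text that installs one Conda environment.

    Hypothetical helper. Assumes the base image ships conda under
    /opt/conda, as continuumio/miniconda3 does.
    """
    lines = [
        f"FROM {base_image}",
        # Copy the environment definition into the image.
        f"COPY {env_file} /tmp/{env_file}",
        # Create the environment, then drop package caches to keep the image small.
        f"RUN conda env create --name {env_name} --file /tmp/{env_file} "
        "&& conda clean --all --yes",
        # Put the environment's executables first on PATH.
        f"ENV PATH=/opt/conda/envs/{env_name}/bin:$PATH",
    ]
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile()
```

Writing the returned text to a file named Dockerfile next to environment.yaml and running a container build on that directory would then yield one image per environment definition.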