The MLOps tooling landscape
TensorFlow Belgium
The MLOps tooling landscape
is an absolute mess
ljvmiranda921.github.io
and a marketing battle
MLOps providers dominate information
Every tool does everything perfectly
Constantly changing: no list can keep up
What can we do?
Exploration phase
Filter phase
Selection phase
Most articles are written by vendors
mlops.toys: nice website, but no longer updated
mlops.community Slack channel
GitHub awesome lists
EthicalML/awesome-production-machine-learning
visenger/awesome-mlops
kelvins/awesome-mlops
Filter phase
ml-ops.org Stack Canvas: https://ml-ops.org/content/mlops-stack-canvas
AI Infrastructure Alliance Blueprints: https://github.com/ai-infrastructure-alliance/blueprints
Selection phase
Try the tools yourself, do not rely on the descriptions
Head to head battle
Why is it all so messy?
In essence, all these tools are trying to solve the challenges you'll encounter when doing ML
"Premature optimization is the root of all evil"
~ Gandhi
Just kidding, it was Donald Ervin Knuth, in "Structured Programming with go to Statements" (1974)
What challenges exactly are we solving for anyway?
Challenge 0: Buy or Build
No data scientists?
AutoML Platforms
Low-code
Usually include every step
https://twimlai.com/solutions/introducing-twiml-ml-ai-solutions-guide/
Challenge 1: Data Management
Issues
Dataset size outgrows personal machines (+backup)
No overview, no metadata, no insights
Data accessibility
Versioning and lineage
Level 1
Git LFS
Cloud Bucket
(Cloud) Database
BigQuery
Level 2
DVC
ClearML Data
LakeFS
Dolt
Pachyderm
Level 3
Feature Store
FAIS
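To make the versioning and lineage issue concrete, here is a minimal sketch of the idea behind data versioning: hash every file, and derive a version id from the resulting manifest. It uses only the Python standard library. The function names (`file_digest`, `snapshot`, `diff`) are made up for this example, and this is not how DVC, LakeFS or any other tool listed here is actually implemented.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir: Path) -> dict:
    """Build a manifest mapping each relative file path to its content hash."""
    files = {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    # The version id is a hash of the manifest, so identical data gives an identical id.
    manifest = json.dumps(files, sort_keys=True).encode()
    return {"version": hashlib.sha256(manifest).hexdigest()[:12], "files": files}


def diff(old: dict, new: dict) -> dict:
    """List the files added, removed or changed between two snapshots."""
    a, b = old["files"], new["files"]
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Storing the small manifest next to the code (for example in Git) while the data itself lives in a bucket is the general pattern; the dedicated tools add storage, caching and a UI on top.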
Challenge 2: Prototyping Phase
Issues
Chaotic by nature, not triggered by commits
Track output files as well
Experiment comparison
Reproducibility
Experiment Manager
Weights & Biases
ClearML
MLflow
Sacred
Guild AI
Self Labeling
Label Studio
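What an experiment manager does at its core can be sketched in a few lines: record parameters, metrics and output files per run, then compare runs afterwards. The `Run` class and `compare` function below are hypothetical names for illustration only, not the API of ClearML, MLflow or any other tool on this slide.

```python
import json
import time
import uuid
from pathlib import Path


class Run:
    """One experiment run: parameters, metrics and output files, stored as JSON."""

    def __init__(self, root: Path, params: dict):
        self.id = uuid.uuid4().hex[:8]
        self.dir = root / self.id
        self.dir.mkdir(parents=True)
        self.record = {"id": self.id, "started": time.time(),
                       "params": params, "metrics": {}, "artifacts": []}

    def log_metric(self, name: str, value: float) -> None:
        # Keep the full history so runs can be compared over time, not just at the end.
        self.record["metrics"].setdefault(name, []).append(value)

    def log_artifact(self, path: Path) -> None:
        # Only the path is recorded here; a real tool would also upload the file.
        self.record["artifacts"].append(str(path))

    def finish(self) -> None:
        (self.dir / "run.json").write_text(json.dumps(self.record, indent=2))


def compare(root: Path, metric: str) -> list:
    """Return (run id, params, last value of metric) tuples, lowest value first."""
    rows = []
    for f in root.glob("*/run.json"):
        rec = json.loads(f.read_text())
        if metric in rec["metrics"]:
            rows.append((rec["id"], rec["params"], rec["metrics"][metric][-1]))
    return sorted(rows, key=lambda r: r[2])
```

Because every run gets its own id and record regardless of Git state, the tracking keeps up with the chaotic prototyping phase instead of depending on commits.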
Challenge 3: Remote Compute
Issues
Local PC doesn't cut it
Privacy / management concerns
Better hardware utilization
Unstable usage / demand
Remote Machine
Every cloud ever
Jupyter / remote VSCode
Google Colab
Task-Based
More overhead
Cloud training jobs
Requires orchestration!
Challenge 4: Orchestration
Issues
Managing multiple users is hard
Multi-task scheduling is hard
GPU sharing and utilization is hard (thanks Nvidia)
Chain becomes complex, need pipelining
Cloud Gang
Not only native tools!
All VMs in the end
On-prem Gang
ClearML Orchestrate: Queues and workers
Slurm: Queues and workers
Apache Airflow: Queues and workers
Metaflow: Kubernetes
Kubeflow Pipelines: Kubernetes
Kedro: backend agnostic!
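The "queues and workers" model combined with pipelining can be sketched with the Python standard library (3.9+ for `graphlib`): steps go onto a queue once their dependencies are done, and a pool of workers picks them up. The `run_pipeline` function and its arguments are assumptions made up for this example. Real orchestrators add multiple machines, users, retries and GPU scheduling, none of which is modelled here.

```python
import queue
import threading
from graphlib import TopologicalSorter


def run_pipeline(steps: dict, deps: dict, n_workers: int = 2) -> list:
    """Run steps on worker threads, starting each only once its dependencies are done.

    steps maps a step name to a zero-argument callable;
    deps maps a step name to the set of step names it depends on.
    Returns the step names in the order they finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()              # raises CycleError if the pipeline has a cycle
    tasks = queue.Queue()         # steps that are ready to run
    finished = queue.Queue()      # (step name, exception or None) from the workers
    order = []

    def worker():
        while True:
            name = tasks.get()
            if name is None:      # sentinel: no more work
                return
            try:
                steps[name]()
                finished.put((name, None))
            except Exception as exc:
                finished.put((name, exc))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    try:
        while sorter.is_active():
            for name in sorter.get_ready():
                tasks.put(name)
            name, exc = finished.get()   # block until some step completes
            if exc is not None:
                raise exc
            sorter.done(name)
            order.append(name)
    finally:
        # Always shut the workers down, also when a step failed.
        for _ in threads:
            tasks.put(None)
        for t in threads:
            t.join()
    return order
```

For example, with `deps = {"train": {"load"}, "evaluate": {"train"}}` the steps run strictly in sequence, while independent steps would be picked up by different workers in parallel.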
Challenge 5: Deployment
Issues
Production: don't half-ass this
How to make the model accessible?
Seamless model updates
Maximise hardware utilization
Optimize Model
ONNX
TensorRT
TensorFlow Lite
ML Kit
Model Serving
Nvidia Triton
ClearML Serving*
BentoML
Seldon Core
Edge AI
~ Hardware
TensorFlow Lite
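"Seamless model updates" boils down to swapping the model behind a stable interface while requests keep coming in. Below is a minimal in-process sketch of that idea. The `ModelServer` class is hypothetical and a model is assumed to be any Python callable; Triton, BentoML, Seldon Core and the like do this across processes and machines, with batching and hardware scheduling on top.

```python
import threading
from typing import Callable, Sequence


class ModelServer:
    """Serve predictions from the current model and swap in new versions without downtime."""

    def __init__(self, model: Callable, version: str):
        self._lock = threading.Lock()
        self._model, self._version = model, version
        self._previous = None

    def predict(self, x):
        # Grab a consistent (model, version) pair; the model call runs outside the lock.
        with self._lock:
            model, version = self._model, self._version
        return {"version": version, "prediction": model(x)}

    def update(self, model: Callable, version: str, smoke_inputs: Sequence) -> None:
        # Run the candidate on a few inputs first, so a crashing model never goes live.
        for x in smoke_inputs:
            model(x)
        with self._lock:
            self._previous = (self._model, self._version)
            self._model, self._version = model, version

    def rollback(self) -> None:
        """Switch back to the model that was live before the last update."""
        with self._lock:
            if self._previous is None:
                raise RuntimeError("no previous model to roll back to")
            self._model, self._version = self._previous
            self._previous = None
```

Returning the version with every prediction is a small design choice that pays off later: it makes each response traceable to the exact model that produced it.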
Challenge 6: Monitoring
Issues
Go to production and be ready to go back. Things will break.
Serving visibility (drift, latency etc.)
Traceability throughout the whole system
Alerting to be proactive on issues
DIY
Prometheus
Grafana
All-in-ones
Vertex/SageMaker
ClearML
Comet
DataRobot
...
Batteries Included
Data versioning system
Experiment manager
Orchestrator
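As a rough illustration of serving visibility and alerting, the sketch below computes a Population Stability Index (PSI) between training-time and live values of one feature, plus a p95 latency check. The function names and the thresholds (PSI above 0.2, p95 above 200 ms) are assumptions for this example: 0.2 is a commonly quoted rule of thumb, not a universal limit. In a DIY setup, numbers like these would typically be exported to Prometheus and alerted on from there.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0   # avoid zero width when all reference values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range fall into the first or last bin.
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def check(reference, live, latencies_ms, psi_limit=0.2, p95_limit_ms=200.0) -> list:
    """Return a list of alert messages; an empty list means all clear."""
    alerts = []
    score = psi(reference, live)
    if score > psi_limit:
        alerts.append(f"input drift: PSI {score:.2f} > {psi_limit}")
    # Simple nearest-rank style percentile on the sorted latencies.
    p95 = sorted(latencies_ms)[int(0.95 * (len(latencies_ms) - 1))]
    if p95 > p95_limit_ms:
        alerts.append(f"latency: p95 {p95:.0f} ms > {p95_limit_ms:.0f} ms")
    return alerts
```

Running `check` on a schedule and sending any non-empty result to a chat channel is already enough to hear about a problem before the users do.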
A Final Argument for End-To-End
The click-through effect
Key Takeaways
Watch out for the marketing
GitHub awesome lists are awesome
Don't optimize prematurely: solve problems as they present themselves
There is more out there than cloud-native tools
Thank you!
https://app.clear.ml
GitHub: https://github.com/allegroai/clearml
Slack: clearml.slack.com
Twitter (goodest memes): @clearMLapp
Me: @VictorSonck
Try it yourself for free! It’s open-source!