Shipping ML to Production
Intro
Business PoV
ĦØƁΔЯ°
ĂĮ
ƘႸԒᕊƬჄ₽Δ
Intro
Dictionary
For customers | For real |
---|---|
Deep Learning | Logistic Regression |
Machine Learning | Logistic Regression |
NLP | Regular expressions |
Domain adaptation | Handcrafted hacks |
Magic | Matrix multiplication |
AI | Any random sh*t |
Intro
For real
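import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a plain logistic regression on the iris dataset
X, y = load_iris(return_X_y=True)
clf = LogisticRegression()
clf.fit(X, y)
print(clf.score(X, y))  # accuracy on the training data itself

# Serialize the fitted model to disk
joblib.dump(clf, "my_deeplearning_model")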
Okay, what's next?
Requirements
- Data size (KB vs TB)?
- Model size?
- Batch/Online?
Okay, what's next?
Requirements
- Speed (RPS, Latency, SLA)
- Hardware (RAM, CPU, GPU)
- Performance (algorithm)
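A rough way to put numbers on the speed requirement before choosing a serving option. This is only a sketch: `measure_latency` and `fake_predict` are hypothetical names, and the stand-in predictor should be swapped for a call to the real model.

```python
import statistics
import time


def measure_latency(predict, sample, n_calls=200):
    """Call predict(sample) n_calls times; return latency stats in milliseconds."""
    timings_ms = []
    for _ in range(n_calls):
        start = time.perf_counter()
        predict(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    # Position of the 95th percentile in the sorted timings
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    mean_ms = statistics.mean(timings_ms)
    return {
        "mean_ms": mean_ms,
        "p95_ms": timings_ms[p95_index],
        # Single-worker estimate only; real RPS depends on concurrency
        "approx_rps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def fake_predict(row):
    # Hypothetical stand-in for something like clf.predict([row])
    return sum(row) > 10
```

Calling `measure_latency(fake_predict, [5.1, 3.5, 1.4, 0.2])` returns a dict that can be compared against the latency target in the SLA.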
Okay, what's next?
Problem
PRODUCTION
So what do we do?
Solution #1: CLI
$ python3 awesome_predict.py \
--model "my_deeplearning_model" \
--input "1.csv" \
--output "1.predictions.csv"
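The slides do not show `awesome_predict.py` itself. A minimal sketch of what such a script might look like, assuming a headerless, all-numeric CSV and a model saved with `joblib.dump`; the helper name `predict_file` is hypothetical.

```python
import argparse
import csv


def predict_file(model, input_path, output_path):
    """Read feature rows from a CSV and write one prediction per row."""
    # Assumption: no header line, every cell is a number
    with open(input_path, newline="") as src:
        rows = [[float(value) for value in row] for row in csv.reader(src) if row]
    predictions = model.predict(rows) if rows else []
    with open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for prediction in predictions:
            writer.writerow([prediction])
    return len(rows)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Batch prediction from a CSV file")
    parser.add_argument("--model", required=True, help="path written by joblib.dump")
    parser.add_argument("--input", required=True, help="CSV with one feature row per line")
    parser.add_argument("--output", required=True, help="where to write predictions")
    args = parser.parse_args(argv)

    import joblib  # imported lazily so predict_file works without it

    model = joblib.load(args.model)
    count = predict_file(model, args.input, args.output)
    print(f"wrote {count} predictions to {args.output}")
```

To run it from the shell as in the command above, call `main()` under an `if __name__ == "__main__":` guard at the bottom of the script.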
Solution #1: CLI
Pros:
- Simple
- Batch
Cons:
- Hard to use
- Not scalable
- Hard to integrate
- No "real" online
Hm, we need a service!
Services
- Docker ♥
- Cloud ♥
- Redis/RMQ/* as a service
Solution #2: REST
GET /info
POST /predict
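The two endpoints can be sketched with nothing but the standard library. This is an illustration, not the author's service: the JSON shape (`{"rows": [[...]]}`), the names `make_handler` and `make_server`, and the integer class labels are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(model, model_name):
    """Build a request handler bound to one loaded model."""

    class PredictHandler(BaseHTTPRequestHandler):
        def _send_json(self, status, payload):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            if self.path == "/info":
                self._send_json(200, {"model": model_name})
            else:
                self._send_json(404, {"error": "not found"})

        def do_POST(self):
            if self.path != "/predict":
                self._send_json(404, {"error": "not found"})
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                rows = json.loads(self.rfile.read(length))["rows"]
            except (ValueError, KeyError, TypeError):
                self._send_json(400, {"error": "expected a JSON object with a 'rows' list"})
                return
            # Assumption: integer class labels, as with the iris model
            predictions = [int(p) for p in model.predict(rows)]
            self._send_json(200, {"predictions": predictions})

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return PredictHandler


def make_server(model, model_name, host="127.0.0.1", port=8000):
    return ThreadingHTTPServer((host, port), make_handler(model, model_name))
```

With a model loaded via `joblib.load`, `make_server(model, "my_deeplearning_model").serve_forever()` starts the service; a production setup would more likely sit behind a proper web framework and a WSGI/ASGI server.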
Solution #2: REST
Pros:
- Simple usage
- Scalable
- Simple enough to implement
- Useful for demo
- Easy to integrate
- Many tutorials
- Online & Batch
Cons:
- Performance?
What about big data?
Solution #3: PySpark
Pros:
- Batch & ~Online
- Handles genuinely TB-scale data
Cons:
- JVM
- Overhead
- Cannot apply a large model (>2GB)
- Random crashes
- High maintenance cost
- Complicated
Maybe
ВĘԒѺԸNΠΣΔ?
Solution #4: ВĘԒѺԸNПΣΔ ("bicycle", i.e. a homegrown solution)
[Diagram: a queue, shared storage, and worker machines; input files /a/1.csv and /a/2.csv are processed into /a/1.out.csv and /a/2.out.csv]
Pros:
- No overhead
- Simple to control
- Batch
- Scalable
- Simple enough to implement
Cons:
- No online
- Machine management is entirely on your side
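The queue / storage / machines idea can be sketched in a single process. Here an in-memory `queue.Queue` stands in for Redis/RMQ, threads stand in for machines, and a local directory stands in for shared storage; all function names are hypothetical and the `.out.csv` naming follows the file names on the slide.

```python
import queue
import threading


def output_path_for(input_path):
    # /a/1.csv -> /a/1.out.csv
    return input_path[: -len(".csv")] + ".out.csv"


def process_file(input_path, predict):
    """Apply predict to every row of input_path and write the results beside it."""
    with open(input_path) as src, open(output_path_for(input_path), "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            row = [float(value) for value in line.strip().split(",")]
            dst.write(f"{predict(row)}\n")


def worker(tasks, predict):
    """One 'machine': pull file paths off the queue until it is empty."""
    while True:
        try:
            input_path = tasks.get_nowait()
        except queue.Empty:
            return
        process_file(input_path, predict)


def run_batch(input_paths, predict, n_workers=2):
    tasks = queue.Queue()
    for path in input_paths:
        tasks.put(path)
    threads = [
        threading.Thread(target=worker, args=(tasks, predict))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [output_path_for(path) for path in input_paths]
```

Scaling out means starting more workers against the same queue, which is where the "machine management is on your side" cost comes from.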
Solution #5: SageMaker
Pros:
- Batch & Online
- Different APIs for free (REST + SDK)
- Simple usage (if you set everything up properly)
Cons:
- Price
- Complicated
- Too much marketing
Conclusion
Data size | Model size | Batch/Online | Pick |
---|---|---|---|
Small | Small | Batch | * (even CLI) |
Small | Small | Online | REST |
Small | Large | Batch | REST |
Small | Large | Online | REST |
Large | Small | Batch | |
Large | Small | Online | REST / SageMaker / |
Large | Large | Batch | ВĘԒѺԸNПΣΔ / REST / SageMaker |
Large | Large | Online | REST / SageMaker |
Data size: 0 < Small < 0.5TB, Large >= 0.5TB
Model size: 0 < Small < 500MB, Large >= 500MB
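The table and thresholds can be written down as a small lookup. This is a sketch only: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are an assumption, `pick` is a hypothetical name, and rows that are blank or cut off in the source are left as they appear there.

```python
DATA_LIMIT = 5 * 10**11   # 0.5 TB, decimal units assumed
MODEL_LIMIT = 5 * 10**8   # 500 MB, decimal units assumed

# (large_data, large_model, online) -> options as listed in the table
PICKS = {
    (False, False, False): ["any (even CLI)"],
    (False, False, True): ["REST"],
    (False, True, False): ["REST"],
    (False, True, True): ["REST"],
    (True, False, False): [],  # this cell is blank in the source table
    (True, False, True): ["REST", "SageMaker"],  # source row ends with a dangling "/"
    (True, True, False): ["Solution #4", "REST", "SageMaker"],
    (True, True, True): ["REST", "SageMaker"],
}


def pick(data_bytes, model_bytes, online):
    """Return the serving options the table lists for this workload."""
    key = (data_bytes >= DATA_LIMIT, model_bytes >= MODEL_LIMIT, bool(online))
    return PICKS[key]
```

For example, `pick(10**9, 10**6, online=True)` looks up the Small / Small / Online row.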
Conclusion
- REST is the most universal choice
- В Ę Ԓ Ѻ Ը N П Σ Д isn't always a bad idea
- DataSatanist should be able to solve such problems (or at least know about them)
Thanks!
By Ivan Menshikh