Machine Learning
Dreams
- Servers (+GPU)
- Team
- Data Engineer(s)
- DevOps
- Assessors (annotators)
- Senior DS
- Data Infrastructure
- Enough time
- $$$
ML from sh*t & sticks
- No hardware
- No money
- No data (almost)
- You are alone
- Open Source
- You know what you need to solve
ML from sh*t & sticks
f : X → Y
Data
Model
Calculator
DATA
What kind of data?
- text
- image
- audio
- video
- graph
- numeric/categorical
DATA
Open-Source
DATA
Labeling
- "Market" platform
- Self-hosted platform
DATA
Labeling: "Market" platform
- Fast
- Simple
- Cheap
- Low quality
- NDA issues
- Requires "gold questions", "exams", etc.
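"Gold questions" are items whose correct label you already know. They are mixed into the task so you can score each annotator. The sketch below is a minimal version of that check. It assumes answers arrive as hypothetical `(annotator_id, item_id, label)` tuples and that 0.8 accuracy is an acceptable cut-off. Both are illustrative choices, and market platforms usually ship their own built-in version of this mechanism.

```python
from collections import defaultdict

def annotator_accuracy(answers, gold):
    """Accuracy of each annotator on the gold (known-answer) items only.

    answers: iterable of (annotator_id, item_id, label) tuples (hypothetical format)
    gold: dict item_id -> correct label
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for annotator, item, label in answers:
        if item in gold:  # regular items do not affect the score
            total[annotator] += 1
            correct[annotator] += int(label == gold[item])
    return {a: correct[a] / total[a] for a in total}

def trusted_annotators(answers, gold, threshold=0.8):
    """Annotators whose gold accuracy reaches the (hypothetical) threshold."""
    acc = annotator_accuracy(answers, gold)
    return {a for a, score in acc.items() if score >= threshold}

answers = [
    ("alice", "g1", "cat"), ("alice", "g2", "dog"), ("alice", "x1", "cat"),
    ("bob",   "g1", "dog"), ("bob",   "g2", "dog"), ("bob",   "x1", "dog"),
]
gold = {"g1": "cat", "g2": "dog"}
print(annotator_accuracy(answers, gold))  # {'alice': 1.0, 'bob': 0.5}
print(trusted_annotators(answers, gold))  # {'alice'}
```

Labels from annotators who fail the gold check are then dropped or down-weighted before training.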
DATA
Labeling: Self-hosted platform
- SUPER FLEXIBLE
- Self-hosted
- Manual setup
- Requires assessors (annotators)
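With your own assessors, each item is usually labeled by several people and their answers are then combined. The simplest way to combine them is a majority vote. The sketch below assumes the same hypothetical `(annotator_id, item_id, label)` format. It also returns the share of annotators who agreed, which can flag doubtful items for relabeling. The 0.6 review threshold is an arbitrary example value.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Aggregate (annotator_id, item_id, label) tuples into one label per item.

    Returns item_id -> (label, agreement), where agreement is the share of
    annotators who chose the winning label. On a tie, the label seen first for
    that item wins, because Counter.most_common keeps first-encountered order.
    """
    by_item = defaultdict(list)
    for _annotator, item, label in answers:
        by_item[item].append(label)
    result = {}
    for item, labels in by_item.items():
        label, votes = Counter(labels).most_common(1)[0]
        result[item] = (label, votes / len(labels))
    return result

answers = [
    ("a1", "img1", "cat"), ("a2", "img1", "cat"), ("a3", "img1", "dog"),
    ("a1", "img2", "dog"), ("a2", "img2", "cat"),
]
labels = majority_vote(answers)
print(labels)  # {'img1': ('cat', 0.6666666666666666), 'img2': ('dog', 0.5)}

# Items with weak agreement go back to the assessors (hypothetical threshold).
needs_review = [item for item, (_, agreement) in labels.items() if agreement < 0.6]
print(needs_review)  # ['img2']
```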
DATA
Augmentation
Image / Video
- Crop
- Flip
- Rotate
- Color shift
- Blur
- Color filters
- etc
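Most of the image augmentations listed above are a few lines of array code. The sketch below assumes images are `H x W x 3` `uint8` NumPy arrays. It covers only rotations by multiples of 90 degrees, because arbitrary angles need interpolation, and it uses a plain 3x3 box blur as a stand-in for fancier filters. All function names and parameter values are illustrative. Dedicated augmentation libraries do the same with more options.

```python
import numpy as np

def random_crop(img, size, rng):
    """Cut a random size x size window out of an H x W x C image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def flip(img):
    """Mirror the image left-right."""
    return img[:, ::-1]

def rotate90(img, k=1):
    """Rotate by k * 90 degrees (arbitrary angles would need interpolation)."""
    return np.rot90(img, k, axes=(0, 1))

def color_shift(img, rng, max_shift=20):
    """Add a random offset to each channel, keeping values in [0, 255]."""
    shift = rng.integers(-max_shift, max_shift + 1, size=img.shape[-1])
    return np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)

def box_blur(img):
    """3x3 mean filter; borders are handled by repeating edge pixels."""
    padded = np.pad(img.astype(np.float32), ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9
    return out.round().astype(np.uint8)

def augment(img, rng, crop=24):
    """Apply a random chain of the transforms above to one image."""
    img = random_crop(img, crop, rng)
    if rng.random() < 0.5:
        img = flip(img)
    img = rotate90(img, k=int(rng.integers(0, 4)))
    img = color_shift(img, rng)
    if rng.random() < 0.3:
        img = box_blur(img)
    return img

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in image
augmented = [augment(image, rng) for _ in range(4)]
print([a.shape for a in augmented])  # four arrays of shape (24, 24, 3)
```

Each call produces a different variant of the same labeled image, so one labeled example turns into several training examples.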
Text
- Synonym replacement ('s/w1/w2')
- Back translation
- Drop/insert random words
- Cut/glue texts
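Three of the text tricks above need no model at all, as the sketch below shows. The `SYNONYMS` table is a hypothetical toy. In practice synonyms would come from a thesaurus such as WordNet or from nearest neighbours in an embedding space. Back translation is not shown because it needs a machine-translation model. `cut_and_glue` assumes both texts carry the same label.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "good": ["great", "nice"],
    "movie": ["film"],
    "bad": ["awful", "poor"],
}

def synonym_replace(words, rng, p=0.3):
    """s/w1/w2: swap each word that has synonyms with probability p."""
    return [rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < p else w
            for w in words]

def random_drop(words, rng, p=0.1):
    """Drop each word with probability p, but never return an empty text."""
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]

def random_insert(words, rng):
    """Insert a synonym of one of the words at a random position."""
    candidates = [w for w in words if w in SYNONYMS]
    if not candidates:
        return list(words)
    new = list(words)
    new.insert(rng.randrange(len(new) + 1), rng.choice(SYNONYMS[rng.choice(candidates)]))
    return new

def cut_and_glue(words_a, words_b, rng):
    """Glue the start of one text to the end of another (same label assumed)."""
    cut_a = rng.randrange(1, len(words_a) + 1)
    cut_b = rng.randrange(len(words_b))
    return words_a[:cut_a] + words_b[cut_b:]

rng = random.Random(42)
a = "a good movie with a good cast".split()
b = "the film was nice overall".split()
print(" ".join(synonym_replace(a, rng)))
print(" ".join(random_drop(a, rng)))
print(" ".join(random_insert(a, rng)))
print(" ".join(cut_and_glue(a, b, rng)))
```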
Audio
- Noise
- Pitch
- Speed
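For audio, noise and speed changes can be applied directly to the raw waveform. The sketch below assumes a mono signal stored as a float NumPy array, and a 16 kHz sample rate chosen only for the example. Resampling changes speed and pitch together, like a tape played faster. Shifting pitch alone needs an extra time-stretch step such as a phase vocoder, which is not shown here.

```python
import numpy as np

def add_noise(signal, rng, snr_db=20.0):
    """Add white Gaussian noise at the given signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def change_speed(signal, factor):
    """Play the waveform `factor` times faster using linear resampling.

    This also raises the pitch by the same factor, like speeding up a tape.
    """
    n_out = int(round(len(signal) / factor))
    old_t = np.arange(len(signal))
    new_t = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_t, old_t, signal)

sr = 16_000                               # assumed sample rate, Hz
t = np.arange(sr) / sr                    # one second of sample times
tone = np.sin(2 * np.pi * 440 * t)        # stand-in for real audio: a 440 Hz tone
rng = np.random.default_rng(0)
noisy = add_noise(tone, rng, snr_db=10)
faster = change_speed(tone, 1.25)         # 0.8 s long, sounds like ~550 Hz
print(len(tone), len(noisy), len(faster)) # 16000 16000 12800
```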
DATA
Augmentation
Image / Video
Text / Audio
- /makcedward/nlpaug (?)
- ✋✋
MODEL
Data → Model → Predictions
MODEL
Data → BASELINE → Predictions
MODEL
Why such a dumb model?
I want BOOSTING, DEEP LEARNING!
- Fast (to prepare & train)
- Good first approximation
- Less chance of overfitting
- Still works
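A "dumb" baseline can be as simple as one of the two models sketched below. The first always predicts the most frequent class. The second is a nearest-centroid classifier, which assigns each example to the class with the closest mean feature vector. Both are illustrative choices, and the data is synthetic with a 70/30 class split. Whatever fancier model comes next should clearly beat these on the same held-out split.

```python
import numpy as np

class MajorityBaseline:
    """Always predicts the most frequent training label."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

class NearestCentroid:
    """Predicts the class whose mean feature vector is closest (Euclidean)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=-1)
        return self.classes_[np.argmin(dists, axis=1)]

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Synthetic two-class data standing in for real features (70/30 class split).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (700, 5)), rng.normal(1.5, 1, (300, 5))])
y = np.array([0] * 700 + [1] * 300)
idx = rng.permutation(len(y))
train, test = idx[:800], idx[800:]

for model in (MajorityBaseline(), NearestCentroid()):
    model.fit(X[train], y[train])
    print(type(model).__name__, accuracy(y[test], model.predict(X[test])))
```

The majority baseline shows the accuracy you get "for free" from class imbalance alone. Any gain above it is what the features actually contribute.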
MODEL
Embed your data
Confucius
MODEL
Data → Encoder → Embeddings → Your model → Predictions
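The pipeline above separates a frozen encoder from a small model trained on its output. The sketch below follows that shape under a loud assumption. `encode` is a hypothetical stand-in (a hashed bag of words, using `zlib.crc32` so hashes are stable across runs), not a pre-trained model. In a real setup it would be replaced by a pre-trained NLP or CV encoder. "Your model" here is just one mean embedding per class, scored by dot-product similarity.

```python
import zlib
import numpy as np

DIM = 1024  # embedding size, arbitrary for this sketch

def encode(text):
    """Hypothetical stand-in encoder: hashed bag of words, L2-normalised.

    The rest of the pipeline only relies on 'text in, fixed-size vector out'.
    """
    vec = np.zeros(DIM)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def fit_centroids(embeddings, labels):
    """'Your model': one mean embedding per class."""
    classes = sorted(set(labels))
    labels = np.array(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def predict(texts, classes, centroids):
    E = np.stack([encode(t) for t in texts])  # Data -> Encoder -> Embeddings
    scores = E @ centroids.T                  # similarity to each class centroid
    return [classes[i] for i in scores.argmax(axis=1)]

train_texts = ["great movie loved it", "awful plot boring film",
               "loved the acting great fun", "boring and awful"]
train_labels = ["pos", "neg", "pos", "neg"]
E_train = np.stack([encode(t) for t in train_texts])
classes, centroids = fit_centroids(E_train, train_labels)
print(predict(["great fun", "boring film"], classes, centroids))
```

Because the encoder stays fixed, only the small model on top is trained. That keeps the approach workable on a laptop with little labeled data.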
MODEL
Where can I find "encoders"?
Here
MODEL
Pre-trained models
NLP
MODEL
Pre-trained models
CV
MODEL
MOAR MODELS
CALCULATOR
A laptop is OK,
but...
CALCULATOR
Provider | GPU | GPU MEM (GB) | RAM (GB) | CPU (# cores) | DISK (GB) |
---|---|---|---|---|---|
Floydhub | K80, V100 | 12-16 | 61 | - | 10 |
Paperspace | P5000 | 16 | 30 | 8 | 250 |
Google Colab | K80, TPU | 11.5 | 10-11 | 2 | 25 |
CONCLUSION
- Simple is better than complex
- Fail fast
- LABEL YOUR DATA
- Re-use OSS data/models
- Take free stuff
CONCLUSION
Recipe
- Understand your task
- Collect data
- Label
- Augment
- Embed
- Create a simple model
- ...
- PROFIT!
Thanks!
ML from sh*t & sticks
By Ivan Menshikh