Machine Learning

Dreams

  • Servers (+GPU)
  • Team
    • Data Engineer(s)
    • DevOps
    • Assessors (annotators)
    • Senior DS (data scientist)
  • Data Infrastructure
  • Enough time
  • $$$

ML from sh*t & sticks

  • No hardware
  • No money
  • (Almost) no data
  • You are alone
  • Open Source
  • You know what problem you need to solve

ML from sh*t & sticks

f : X → Y

Data

Model

Calculator

DATA

What kind of data?

  • text
  • image
  • audio
  • video
  • graph
  • numeric/categorical

DATA

Open-source datasets

DATA

Labeling

  • "Market" platform
  • Self-hosted platform

DATA

Labeling: "Market" platform

  • Fast
  • Simple
  • Cheap
  • Low quality
  • NDA issues
  • Requires "gold questions", qualification "exams", etc.
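The "gold questions" point can be made concrete. Below is a minimal sketch, assuming each answer arrives as an (annotator, item, label) tuple and that a few items with known answers (the gold set) are mixed into the task: annotators are scored on the gold items, those below a threshold are ignored, and the remaining votes are aggregated by simple majority. The function names, the tuple format and the 0.7 threshold are illustrative assumptions, not the API of any particular labeling platform.

```python
from collections import Counter, defaultdict

def annotator_accuracy(answers, gold):
    """Share of gold questions each annotator answered correctly.

    answers: list of (annotator, item_id, label) tuples
    gold: dict mapping item_id -> known correct label
    """
    seen, correct = defaultdict(int), defaultdict(int)
    for annotator, item, label in answers:
        if item in gold:
            seen[annotator] += 1
            correct[annotator] += int(label == gold[item])
    return {a: correct[a] / seen[a] for a in seen}

def aggregate_labels(answers, gold, min_accuracy=0.7):
    """Majority vote on non-gold items, counting only annotators who passed the gold check.
    Annotators who never saw a gold question are not trusted."""
    accuracy = annotator_accuracy(answers, gold)
    trusted = {a for a, acc in accuracy.items() if acc >= min_accuracy}
    votes = defaultdict(Counter)
    for annotator, item, label in answers:
        if annotator in trusted and item not in gold:
            votes[item][label] += 1
    return {item: counter.most_common(1)[0][0] for item, counter in votes.items()}

# Example: "g1" is a gold question with a known answer, "x1" is real work.
gold = {"g1": "cat"}
answers = [
    ("alice", "g1", "cat"), ("alice", "x1", "dog"),
    ("bob", "g1", "cat"), ("bob", "x1", "dog"),
    ("spammer", "g1", "dog"), ("spammer", "x1", "cat"),
]
print(annotator_accuracy(answers, gold))  # {'alice': 1.0, 'bob': 1.0, 'spammer': 0.0}
print(aggregate_labels(answers, gold))    # {'x1': 'dog'}
```

Weighting votes by gold accuracy instead of hard filtering is a common variation; the hard threshold just keeps the sketch short.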

DATA

Labeling: Self-hosted platform

  • SUPER FLEXIBLE
  • Self-hosted
  • Manual setup required
  • Requires assessors (annotators)

DATA

Augmentation

Image / Video

  • Crop
  • Flip
  • Rotate
  • Color shift
  • Blur
  • Color filters
  • etc
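The image operations listed above are simple array manipulations. A minimal pure-Python sketch, assuming a single-channel (grayscale) image stored as a list of rows of 0..255 values; in practice you would apply the same ideas with an image library on arrays or tensors. All function names are hypothetical.

```python
import random

# An "image" here is a list of rows of grayscale pixel values in 0..255.

def flip_horizontal(img):
    return [row[::-1] for row in img]

def rotate90(img):
    # Clockwise rotation: reverse the row order, then transpose.
    return [list(row) for row in zip(*img[::-1])]

def crop(img, top, left, height, width):
    return [row[left:left + width] for row in img[top:top + height]]

def random_crop(img, height, width, rng=random):
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return crop(img, top, left, height, width)

def shift_brightness(img, delta):
    # Single-channel "color shift": add delta and clamp to the valid range.
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def box_blur(img):
    # 3x3 mean filter; border pixels average over the neighbours that exist.
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        new_row = []
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            new_row.append(sum(vals) // len(vals))
        out.append(new_row)
    return out

img = [[0, 50, 100], [150, 200, 250], [10, 20, 30]]
augmented = box_blur(shift_brightness(flip_horizontal(img), 20))
```

Each call produces a new image with the same label, so chaining a few random operations per epoch multiplies a small dataset cheaply.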

Text

  • Synonym replacement 's/w1/w2'
  • Back translation
  • Drop/insert random words
  • Cut/glue texts
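Most of the text tricks above fit in a few lines. A sketch over tokenized texts (lists of words), assuming the synonym dictionary and the vocabulary used for insertions are supplied by you; both are hypothetical placeholders for a thesaurus or an embedding-based neighbour lookup. Back translation is left out because it needs a machine translation model.

```python
import random

def synonym_replace(words, synonyms, rng=random):
    # 's/w1/w2': swap each word that has known synonyms for a random one.
    return [rng.choice(synonyms[w]) if w in synonyms else w for w in words]

def random_drop(words, p=0.1, rng=random):
    # Drop each word with probability p, but never return an empty text.
    kept = [w for w in words if rng.random() >= p]
    return kept or words[:1]

def random_insert(words, vocabulary, n=1, rng=random):
    # Insert n random vocabulary words at random positions.
    out = list(words)
    for _ in range(n):
        out.insert(rng.randint(0, len(out)), rng.choice(vocabulary))
    return out

def cut_and_glue(text_a, text_b, rng=random):
    # Join a non-empty prefix of text_a with a non-empty suffix of text_b.
    cut_a = rng.randint(1, len(text_a))
    cut_b = rng.randint(0, len(text_b) - 1)
    return text_a[:cut_a] + text_b[cut_b:]

words = "the movie was good".split()
print(synonym_replace(words, {"good": ["great", "fine"]}))
print(random_insert(words, ["really", "very"], n=2))
```

Cut/glue presumably only makes sense for two texts with the same label (e.g. two positive reviews), otherwise the glued sample gets an ambiguous target.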

Audio

  • Noise
  • Pitch
  • Speed
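For audio, a waveform is just a sequence of samples, so noise and speed changes can be sketched directly. This assumes mono samples stored as floats; the default noise level is an arbitrary choice. Changing pitch without changing duration (or the reverse) needs a time-stretching algorithm such as a phase vocoder; libraries like librosa provide this (`librosa.effects.pitch_shift`, `librosa.effects.time_stretch`), so it is not reimplemented here.

```python
import random

def add_noise(samples, noise_level=0.005, rng=random):
    # Additive Gaussian noise; noise_level is the standard deviation.
    return [s + rng.gauss(0.0, noise_level) for s in samples]

def change_speed(samples, factor):
    # Resample by linear interpolation (factor > 0); factor > 1 gives a shorter, faster clip.
    # This naive resampling also shifts the pitch, like a tape played faster.
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out

clip = [0.0, 0.5, 1.0, 0.5, 0.0]
faster = change_speed(clip, 1.25)
noisy = add_noise(clip)
```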

MODEL

Data → Model → Predictions

MODEL

Data → BASELINE → Predictions

MODEL

Why such a dummy model?

I want BOOSTING! DEEP LEARNING!

  • Fast to prepare and train
  • Good first approximation
  • Less prone to overfitting
  • Often works well enough
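To make "baseline" concrete: one of the dumbest possible models predicts the most frequent training label for every input. The sketch below is only an illustration and not necessarily the baseline meant in the slides (a linear model on simple features is another common choice). scikit-learn ships the same idea as `DummyClassifier(strategy="most_frequent")`. The toy data is made up.

```python
from collections import Counter

class MajorityBaseline:
    """Predicts the most frequent training label for every input."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

X_train = ["great film", "awful plot", "loved it", "nice acting"]
y_train = ["pos", "neg", "pos", "pos"]
X_test = ["boring", "fun"]
y_test = ["neg", "pos"]

baseline = MajorityBaseline().fit(X_train, y_train)
print(accuracy(y_test, baseline.predict(X_test)))  # 0.5
```

Any boosting or deep learning model has to beat this number on the same split; if it does not, something is wrong with the data or the setup.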

MODEL

Embed your data

Confucius

MODEL

Data → Encoder → Embeddings → Your model → Predictions
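The encoder → embeddings → your model split can be sketched end to end. The `toy_encoder` below is a hypothetical stand-in for a real pre-trained encoder (it just hashes character trigrams into a fixed-size, L2-normalised vector), and "your model" is a nearest-centroid classifier on top of the embeddings. Only the structure is the point: swap the toy encoder for a pre-trained NLP or CV model and keep a small model on top.

```python
import math
import zlib

DIM = 64  # size of the toy embedding (arbitrary)

def toy_encoder(text):
    """Hypothetical encoder: hashed character trigrams, L2-normalised.
    Stands in for a pre-trained model that maps raw data to a vector."""
    vec = [0.0] * DIM
    padded = "  " + text.lower() + " "
    for i in range(len(padded) - 2):
        # zlib.crc32 instead of hash() so the mapping is stable between runs
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def fit_centroids(embeddings, labels):
    """'Your model': the mean embedding of each class."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        total = sums.setdefault(label, [0.0] * DIM)
        for k, v in enumerate(emb):
            total[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}

def predict(centroids, embedding):
    """Class whose centroid has the largest dot product with the embedding."""
    return max(centroids, key=lambda c: sum(a * b for a, b in zip(centroids[c], embedding)))

texts = ["refund my payment", "payment failed", "where is my parcel", "parcel is late"]
labels = ["billing", "billing", "delivery", "delivery"]

# Data -> Encoder -> Embeddings -> Your model -> Predictions
centroids = fit_centroids([toy_encoder(t) for t in texts], labels)
print(predict(centroids, toy_encoder("payment refund")))
```

With a good pre-trained encoder, the model on top can stay this simple (or be a logistic regression), which keeps training fast even on a laptop.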

MODEL

Where can I find "encoders"?

Here

MODEL

Pre-trained models

NLP

MODEL

Pre-trained models

CV

MODEL

MOAR MODELS

CALCULATOR

A laptop is OK, but...

CALCULATOR

Provider      GPU         GPU MEM (GB)  RAM (GB)  CPU (# cores)  DISK (GB)
Floydhub      K80 / V100  12-16         61        -              10
Paperspace    P5000       16            30        8              250
Google Colab  K80 / TPU   11.5          10-11     2              25

CONCLUSION

  • Simple is better than complex
  • Fail fast
  • LABEL YOUR DATA
  • Reuse open-source (OSS) data and models
  • Take free stuff

CONCLUSION

Recipe

  1. Understand your task
  2. Collect data
  3. Label
  4. Augment
  5. Embed
  6. Create a simple model
  7. ...
  8. PROFIT!

Thanks!

ML from sh*t & sticks

By Ivan Menshikh
