Competitive Machine Learning

Vladimir Iglovikov

Sr. Data Scientist at TrueAccord

PhD in Physics at UC Davis

Kaggle Master

Data Science jobs

Data Analyst

$73,700

  • R
  • SQL
  • Tableau
  • ...

Data Engineer

$101,524

  • Scala
  • Hadoop
  • Spark
  • Databases
  • Cloud
  • ...

ML Engineer

$120,960

  • Python
  • Machine Learning
  • Deep Learning
  • NLP
  • Computer Vision
  • ...

Salaries are mean base pay in San Francisco, per Glassdoor.com

Q: What is the business looking for?

A: Can you get the job done?

Job interview process

Stages:

  1. Recruiter (Resume)
  2. Tech screen (Skill)
  3. Onsite interview (Skill)

Resume

  1. Education
  2. Years of relevant industry experience
  3. Relevant projects

Skill

  1. Theory
  2. Ability to write code
  3. Communication skills
  4. Cultural fit

Competitions may give

  1. Line in Resume / LinkedIn
  2. ML / DL Theory
  3. Coding
  4. Libraries / Frameworks
  5. Networking

ImageNet

Pros:

  • Great problems
  • Well recognized

Cons:

  • Once a year
  • No monetary prizes
  • User-unfriendly interface
  • No entry-level problems

Kaggle.com

Pros:

  • Great problems
  • User-friendly interface
  • Monetary prizes
  • Problems for all levels
  • Large community

Cons:

  • Not well recognized

Research

Click through rate type competitions (Ads)

  1. Large sparse datasets
  2. A lot of Data Engineering
  3. Latency is crucial

  • Logistic regression
  • Factorization machines
  • Vowpal Wabbit

Which ads should be shown to the user?

Data:

  • User behavior features
  • Ad features
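As a rough illustration of why logistic regression suits large sparse click data, here is a minimal sketch of hashed features trained with stochastic gradient descent. The field names, the toy impressions, the bucket count, and the pairwise feature crosses are all assumptions made for the example; tools such as Vowpal Wabbit implement the same idea far more efficiently.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # size of the hashed feature space (assumed value)


def hash_features(row):
    """Map a dict of categorical fields to a sorted list of hashed indices."""
    tokens = [f"{k}={v}" for k, v in sorted(row.items())]
    # Pairwise crosses let a linear model capture user-ad interactions.
    tokens += [a + "&" + b for i, a in enumerate(tokens) for b in tokens[i + 1:]]
    return sorted({zlib.crc32(t.encode()) % NUM_BUCKETS for t in tokens})


def predict(weights, indices):
    """Click probability: sigmoid of the sum of the active weights."""
    z = sum(weights.get(i, 0.0) for i in indices)
    z = max(min(z, 35.0), -35.0)  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=50, lr=0.1):
    """Plain SGD on log loss; only weights of active features are touched."""
    weights = {}
    for _ in range(epochs):
        for row, y in zip(rows, labels):
            indices = hash_features(row)
            error = predict(weights, indices) - y
            for i in indices:
                weights[i] = weights.get(i, 0.0) - lr * error
    return weights


# Toy, made-up impressions: one user behavior feature and one ad feature.
rows = [
    {"user_segment": "sports", "ad_category": "sneakers"},
    {"user_segment": "sports", "ad_category": "insurance"},
    {"user_segment": "finance", "ad_category": "insurance"},
    {"user_segment": "finance", "ad_category": "sneakers"},
]
labels = [1, 0, 1, 0]  # 1 = clicked

weights = train(rows, labels)
for row, y in zip(rows, labels):
    print(row, y, round(predict(weights, hash_features(row)), 3))
```

Storing weights in a dict keyed by hashed index means memory grows with the number of features actually seen, not with the size of the vocabulary, and a prediction costs only a handful of lookups, which matters when latency is crucial.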

Mixed type data competitions

Should we approve a person for a credit card?

Which country is an Airbnb customer planning to visit next?

  • Dense Data
  • A lot of feature engineering
  • Decision tree based algorithms
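To make "decision tree based algorithms" concrete, the sketch below finds the single best split of a tiny credit-approval table by Gini impurity, which is the basic step that tree learners repeat recursively. The applicants, the feature names `income` and `prior_default`, and the labels are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (feature, threshold, weighted_impurity) of the best binary split."""
    best = None
    n = len(rows)
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Made-up applicants: income in thousands, prior_default encoded as 0/1.
rows = [
    {"income": 30, "prior_default": 1},
    {"income": 45, "prior_default": 0},
    {"income": 80, "prior_default": 0},
    {"income": 95, "prior_default": 1},
    {"income": 60, "prior_default": 0},
    {"income": 25, "prior_default": 1},
]
labels = [0, 1, 1, 0, 1, 0]  # 1 = approved

print(best_split(rows, labels))  # → ('prior_default', 0.5, 0.0)
```

In this toy table no income threshold separates the classes cleanly, while the encoded categorical feature does, which hints at why feature engineering carries so much weight in this type of competition.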

Natural Language Processing

What results should a search engine return for a query?

Sentiment analysis.

  • Bag of Words
  • word2vec
  • Recurrent Neural Networks
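The simplest of the approaches listed above, Bag of Words, can be sketched in a few lines: each text becomes a vector of token counts, and documents are ranked by cosine similarity to the query. The tokenizer, the sample documents, and the query are assumptions for the example; competitive solutions typically add weighting such as TF-IDF or learned embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return documents ordered from most to least similar to the query."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(d)), d) for d in documents]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Made-up product titles and query.
documents = [
    "garden hose 50 ft",
    "drill bit set for metal",
    "cordless drill with battery",
]
for doc in rank("cordless drill", documents):
    print(doc)
```

Word order is discarded entirely here, which is the main limitation that word2vec and recurrent networks try to address.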

Computer Vision

[Figure: example input and output]

How to start?

Titanic: Machine Learning from Disaster

  1. Small data
  2. Decision Tree
  3. Tutorials

Digit recognizer

  1. Small data
  2. CNN
  3. Tutorials
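For readers new to CNNs, the core operation is a small filter slid across the image. The sketch below applies one hand-written 2x2 filter followed by a ReLU in plain Python; the tiny image and the filter values are made up, and a real digit recognizer would learn many such filters with a deep learning framework rather than hard-code one.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative activations."""
    return [[max(0, v) for v in row] for row in feature_map]


# A 4x4 "image" with a dark left half and a bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

print(relu(conv2d(image, kernel)))  # → [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is largest exactly where the edge sits, which is the kind of local pattern a CNN's early layers pick up from digit images.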

What am I not trying to say?

Q: Will participation in competitions give me a lot?

A: No. But performing well on a regular basis will.

Q: Are competitions mandatory to get a job?

A: No.

Q: Will good performance guarantee a job offer?

A: No. But it may significantly improve your chances.

  • Improves resume
  • Coding skills
  • Theoretical knowledge
  • Libraries / Frameworks
  • Networking

Summary

  • Product feeling
  • Communication skills
  • Teamwork
  • Leadership skills
