From old buzzwords to new buzzwords
Nastasia Saby
@saby_nastasia
Blog: https://mlinreallife.github.io/
Konecranes
#ML, #Craft, #Production
Many buzzwords are only the tip of the iceberg.
Current buzzwords:
Are they still valid?
Big Data
Large volumes of diverse data
Machine learning can work with small data
Transfer learning reduces the need for big data (sketch below)
Then, is Big data still a valid buzzword?
It depends on the projects
Sometimes very important
Sometimes not so important
When it's important, be prepared
Big data is hard
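A minimal sketch of the transfer-learning idea mentioned above, assuming an image task with Keras/TensorFlow; the pretrained network (MobileNetV2), the image size and the commented-out training data are illustrative assumptions, not the speaker's actual setup:

```python
# Transfer learning sketch: reuse a network pretrained on ImageNet so that
# only a small head has to be trained on our own, much smaller dataset.
import tensorflow as tf

# Pretrained feature extractor, frozen so its weights are not retrained
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False

# Small trainable head for our own task (here: binary classification)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(small_train_dataset, epochs=5)  # 'small_train_dataset' is hypothetical
```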
Artificial intelligence
Replacing a cognitive process with a program
2 kinds of AI: Strong and weak
Strong: when you're able to fully imitate a human
#Science fiction.
Weak or specialised: when you're able to imitate a defined cognitive process
Example of weak AI: predict breakdowns for buses
Then, is Artificial intelligence still a valid buzzword?
Weak artificial intelligence YES
Strong artificial intelligence is overrated and more for literature and cinema.
#semi-overrated
Deep learning
When you have a model with several layers
Example:
An artificial neural network with 3 hidden layers (sketched below)
Deep Learning is useful for unstructured data: images (computer vision), text (NLP)
Sometimes, Deep Learning is even useful for structured data such as tables
You don't always need it
Shallow learning is often enough
Then, is Deep Learning still a valid buzzword?
Yes, but for many projects, you don't need it
No need to be an expert in DL to work in data science
#semi-overrated
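A minimal sketch of "a model with 3 hidden layers", using scikit-learn's MLPClassifier on a built-in toy dataset; the layer sizes and the dataset are arbitrary illustrations:

```python
# A small artificial neural network with 3 hidden layers,
# trained on a toy structured dataset just to illustrate the vocabulary.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# hidden_layer_sizes=(64, 32, 16) -> 3 hidden layers
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```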
Artificial neural networks
A metaphor that refers to the way a brain works
Linked neurons that transform inputs into outputs
Deep learning is not always a neural network
You can aggregate different classifiers instead (sketch below)
Then, is Artificial neural networks still a valid buzzword?
Yes, but in many companies, you don't need it
No need to be an expert to work in data science
#semi-overrated
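A minimal sketch of aggregating different classifiers without any neural network, using a scikit-learn voting ensemble on a toy dataset; the chosen models and the dataset are illustrative:

```python
# Aggregating different model families into one ensemble (no neural network).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
ensemble = VotingClassifier(estimators=[
    ("logreg", LogisticRegression(max_iter=5000)),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("forest", RandomForestClassifier(random_state=0)),
])
print("cross-validated accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```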
Current buzzwords:
Are they still valid?
They are still valid, but some of them are overrated
They don't represent the reality of the majority of data science projects
When you start data science, you think you will do 20% of deep learning and 80% of shallow learning
Truth: 10% machine learning, 90% data cleaning, infrastructure, etc.
ML Code is small.
But ML code has a big influence on the whole process: monitoring, data collection, etc.
Steps of a project:
1. Ingestion
2. Data cleaning
3. Feature engineering
4. Model
5. Validation
6. Deployment
7. Monitoring
Ingestion
Diagram: different sources, different formats
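A minimal ingestion sketch with pandas, assuming data arrives as files and a database; the file names, the table and the column layout are hypothetical:

```python
# Ingestion sketch: pull data from different sources and formats into DataFrames.
import pandas as pd

sensor_readings = pd.read_csv("sensor_readings.csv")          # flat file
work_orders = pd.read_json("work_orders.json", lines=True)     # JSON-lines export
# fleet_info = pd.read_sql("SELECT * FROM buses", db_connection)  # database source
```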
Data cleaning
Filter, Imputation
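A minimal cleaning sketch, reusing the hypothetical sensor_readings frame from the ingestion sketch; the column names and thresholds are assumptions:

```python
# Data cleaning sketch: filter out impossible rows, then impute missing values.
df = sensor_readings.copy()
df = df[df["engine_temp"].between(-40, 200)]                  # filter sensor glitches
df["mileage"] = df["mileage"].fillna(df["mileage"].median())  # median imputation
```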
Feature engineering
Extract values from data
Join, enrich, computation
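A minimal feature-engineering sketch, again with hypothetical frames and columns, showing a join, an enrichment and a computed feature:

```python
# Feature engineering sketch: join sources, enrich, compute new values.
import pandas as pd

features = df.merge(work_orders, on="bus_id", how="left")            # join
features["timestamp"] = pd.to_datetime(features["timestamp"])
features["day_of_week"] = features["timestamp"].dt.dayofweek         # enrichment
features["km_per_day"] = features["mileage"] / features["days_in_service"]  # computation
```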
Model
Choose the best one
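A minimal model-selection sketch: compare a few candidate models with cross-validation and keep the best; the candidates and the built-in toy dataset are illustrative:

```python
# "Choose the best one": compare candidate models with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the real features
candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```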
Validation
Offline: Test and train datasets
Online: Real usage, AB Tests, Canary Testing
Which metric: accuracy, precision, a customised one?
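A minimal offline-validation sketch: hold out a test set and look at several metrics, including a customised one; the cost weights and the dataset are made up for illustration:

```python
# Offline validation sketch: train/test split plus standard and custom metrics.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))

# A customised metric: weight missed positives more heavily than false alarms
def business_cost(y_true, y_pred):
    false_negatives = ((y_true == 1) & (y_pred == 0)).sum()
    false_positives = ((y_true == 0) & (y_pred == 1)).sum()
    return 10 * false_negatives + false_positives   # illustrative weights

print("business cost:", business_cost(y_test, pred))
```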
Deployment
Automation, API, dashboard with data viz, or embedded in another product
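A minimal deployment sketch exposing a saved model behind an HTTP API with Flask; the model file name, the port and the input format are assumptions:

```python
# Deployment sketch: serve a previously trained model over a small HTTP API.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")   # hypothetical model saved during training

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]       # e.g. [[0.1, 2.3, ...]]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=5000)
```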
Monitoring
Classical monitoring + specific monitoring for deep learning
New buzzwords for me:
Feature engineering
Interpretability
Data Drift
Interpretability
The ability to understand the decisions of your model
If you want to differentiate dogs from wolves, make sure the model is not learning from the animal's environment instead:
Grass = dog
Snow = wolf
Interpretability, what for?
Ethics
Trust
Marketing
Debug
Interpretability is sometimes not enough
Example: predicting breakdowns for buses
Investigating a crime that has not happened yet is hard
You need more than a prediction
In this case you need explainability:
the potential root cause of the breakdown, for instance
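A minimal interpretability sketch with scikit-learn's permutation importance, to see which features the model really relies on; the dataset is a built-in stand-in, and full explainability (root causes) needs more than this:

```python
# Interpretability sketch: which features does the model actually use?
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# If a surprising feature (the "snow in the background") dominates, investigate.
for i in result.importances_mean.argsort()[::-1][:5]:
    print(data.feature_names[i], round(result.importances_mean[i], 4))
```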
Data Drift:
Data change all the time and can impact the performance of your model
Example charts: cinema ticket sales and toilet paper sales, before vs. during the lockdown
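A minimal data-drift check: compare the distribution of a feature at training time with the live one using a two-sample test; the synthetic "sales" numbers only mimic the lockdown example:

```python
# Data drift sketch: flag when live data no longer looks like training data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
sales_before_lockdown = rng.normal(loc=100, scale=10, size=1000)  # training period
sales_during_lockdown = rng.normal(loc=40, scale=15, size=1000)   # live period

statistic, p_value = ks_2samp(sales_before_lockdown, sales_during_lockdown)
if p_value < 0.01:
    print("Data drift detected: re-validate or retrain the model")
```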
Any questions?
Nastasia Saby
@saby_nastasia
Blog: https://mlinreallife.github.io/