From old buzzwords to new buzzwords
#ML, #Craft, #Production
Blog: Machine Learning in real life (https://mlinreallife.github.io/)
Lots of buzzwords, and they are only the tip of the iceberg.
Are they still valid?
Big data
Large volumes of diverse data
Machine learning can work with small data
Transfer learning reduces the need for big data (see the sketch below)
Then, is Big data still a valid buzzword?
It depends on the project
Sometimes very important
Sometimes not so important
When it's important, be prepared:
Big data is hard
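To illustrate the transfer-learning point above, a minimal sketch, assuming PyTorch and torchvision are available: a pretrained ResNet-18 is reused so that only the final layer needs training, which a small dataset can support. The two-class head is a hypothetical example.

```python
import torch.nn as nn
from torchvision import models

# Load a network pretrained on a large dataset (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the backbone: its weights already encode generic visual features.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head; only these few weights are trained,
# so a few hundred labeled images can be enough.
model.fc = nn.Linear(model.fc.in_features, 2)  # hypothetical 2-class task
```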
Artificial intelligence
Replacing a cognitive process with a program
2 kinds of AI: strong and weak
Strong: able to fully imitate a human
Weak (or specialised): able to imitate one well-defined cognitive process
Example of weak AI: predicting breakdowns for buses (sketched below)
Then, is Artificial intelligence still a valid buzzword?
Weak artificial intelligence: YES
Strong artificial intelligence is overrated, more the stuff of literature and cinema.
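A sketch of what such a specialised (weak) AI can look like in practice, with hypothetical sensor features and toy values: a scikit-learn classifier that does exactly one thing, predict bus breakdowns.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed columns; a real fleet dataset would define its own schema.
data = pd.DataFrame({
    "engine_temp":  [80, 95, 110, 85, 120],
    "mileage_km":   [120000, 250000, 400000, 90000, 380000],
    "oil_pressure": [4.1, 3.2, 2.1, 4.5, 1.9],
    "broke_down":   [0, 0, 1, 0, 1],
})

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.drop(columns="broke_down"), data["broke_down"])
```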
Deep Learning
A model with several layers: an artificial neural network with 3+ hidden layers
Deep Learning is useful for unstructured data: images (computer vision), text (NLP)
Sometimes, Deep Learning is even useful for structured data such as tables
You don't always need it
Shallow learning is often enough (see the sketch below)
Then, is Deep Learning still a valid buzzword?
Yes, but for many projects, you don't need it
No need to be an expert in DL to work in data science
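A minimal sketch of that point on a small structured dataset: a "deep" model (an MLP with 3 hidden layers) next to a shallow baseline (logistic regression). On many tabular problems their scores end up close; the dataset and layer sizes here are just illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Deep": an artificial neural network with 3 hidden layers.
deep = make_pipeline(StandardScaler(),
                     MLPClassifier(hidden_layer_sizes=(64, 32, 16),
                                   max_iter=1000, random_state=0))
# Shallow baseline.
shallow = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

for name, model in [("deep", deep), ("shallow", shallow)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```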
Artificial neural networks
A metaphor for the way a brain works: linked neurons that transform inputs into outputs
Deep learning is not always a neural network: you can also combine different classifiers and aggregate their outputs (see the sketch below)
Then, are Artificial neural networks still a valid buzzword?
Yes, but in many companies, you don't need it
No need to be an expert to work in data science
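A sketch of that aggregation idea: several different classifiers combined by majority vote with scikit-learn's VotingClassifier, no neural network involved.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Three unrelated classifiers, aggregated by majority vote.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("nb", GaussianNB()),
])
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))
```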
Are they still valid?
They are, but some of them are overrated
They don't reflect the reality of most data science projects
When you start in data science, you expect 20% deep learning and 80% shallow learning
In truth: 10% machine learning, 90% data cleaning, infrastructure, etc.
The ML code is small.
But the ML code has a big influence on the whole process: monitoring, data collection, etc.
Steps of a project:
1. Data collection: different sources, different formats (schema)
2. Data cleaning: extract values from the data
3. Feature engineering: join, enrich, compute
4. Model selection: choose the best one
5. Evaluation: offline (train and test datasets) vs online (real usage, A/B tests, canary testing); which metric: accuracy, precision, a customised one? (see the sketch after this list)
6. Deployment: automation, API, dashboard with data viz, or embedded in another product
7. Monitoring: classical monitoring + specific monitoring for deep learning
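For the evaluation step, a short sketch of the metric choice: accuracy, precision, and a customised one. The cost weights are hypothetical, e.g. a missed breakdown could cost ten times more than a false alarm.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # toy labels: 1 = breakdown
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]   # toy predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))

# Customised business metric (assumed costs, not a standard formula):
# a missed breakdown (fn) hurts 10x more than a false alarm (fp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("cost     :", 10 * fn + 1 * fp)
```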
New buzzwords for me:
Interpretability
The ability to understand the decisions of your model
If you want to differentiate dogs from wolves, be careful that the model is not learning from the animal's environment:
Grass = dog
Snow = wolf
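A sketch of how to catch that kind of shortcut, using permutation importance from scikit-learn on synthetic data where a hypothetical "background" feature (grass vs snow) leaks the label: if the background dominates the importances, the model learned the environment, not the animal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
background = rng.integers(0, 2, size=200)   # 0 = grass, 1 = snow
fur_length = rng.normal(size=200)           # carries no signal here
X = np.column_stack([background, fur_length])
y = background                              # "wolf" whenever there is snow

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, random_state=0)
print(dict(zip(["background", "fur_length"], result.importances_mean)))
```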
Interpretability, what for?
Interpretability is sometimes not enough
Example of bus breakdowns:
Investigating a crime that has not happened yet is hard
You need more than a prediction
In this case you need explainability:
the potential root cause of the breakdown, for instance (see the sketch below)
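One way to get at a potential root cause is per-prediction attribution. A hedged sketch, assuming the third-party `shap` package and the same hypothetical sensor features as above:

```python
import pandas as pd
import shap  # third-party: pip install shap
from sklearn.ensemble import RandomForestClassifier

# Hypothetical sensor data for a few buses (1 = broke down).
X = pd.DataFrame({
    "engine_temp":  [80, 95, 110, 85, 120],
    "mileage_km":   [120000, 250000, 400000, 90000, 380000],
    "oil_pressure": [4.1, 3.2, 2.1, 4.5, 1.9],
})
y = [0, 0, 1, 0, 1]

model = RandomForestClassifier(random_state=0).fit(X, y)

# SHAP values attribute each prediction to the input features; for one
# bus, the largest attribution points at a candidate root cause
# (e.g. low oil pressure), not just a breakdown probability.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
```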
Data drift
Data change all the time, and that can impact the performance of your model
Example: sales before the lockdown vs sales during the lockdown
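A minimal sketch of a drift check on that sales example, with illustrative numbers: compare the two distributions with a Kolmogorov-Smirnov test and alert when they diverge.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
sales_before = rng.normal(loc=100, scale=10, size=500)  # pre-lockdown
sales_during = rng.normal(loc=60, scale=25, size=500)   # lockdown: drifted

stat, p_value = ks_2samp(sales_before, sales_during)
if p_value < 0.01:
    print("Distribution shift detected: consider retraining the model")
```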