Machine Learning in real life

From old buzzwords to new buzzwords

Nastasia Saby




#ML, #Craft, #Production



Blog Machine Learning in real life:

A lot of buzzwords that are only the tip of the iceberg.

Current buzzwords:

  • Big Data
  • Artificial intelligence
  • Deep learning
  • Artificial neural networks

Are they still valid?

Big Data

Large volumes, diverse data

Machine learning can work with small data

Transfer learning reduces the need for big data

Then, is Big data still a valid buzzword?

It depends on the projects


Sometimes very important

Sometimes not so important

When it's important, be prepared


Big data is hard

Artificial intelligence

Replacing a cognitive process with a program

2 kinds of AI: Strong and weak

Strong: when you're able to fully imitate a human


#Science fiction.

Weak or specialised: when you're able to imitate a defined cognitive process

Example of weak AI: predict breakdowns for buses

Then, is Artificial intelligence still a valid buzzword?

Weak artificial intelligence YES


Strong artificial intelligence is overrated and more for literature and cinema.



Deep learning

When you have a model with several layers

Example: an artificial neural network with 3 hidden layers
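To make "layers" concrete, here is a minimal sketch of a forward pass through a network with 3 hidden layers. The layer sizes, the ReLU activation and the random, untrained weights are illustrative assumptions, not taken from the talk; a real project would rely on a deep learning library and train the weights.

```python
import random

def relu(values):
    """Activation: keep positive values, zero out the rest."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def make_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases; sizes are illustrative."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layers):
    """Apply every layer in order; ReLU after each hidden layer only."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

rng = random.Random(0)
sizes = [4, 8, 8, 8, 1]  # 4 inputs, 3 hidden layers of 8 neurons, 1 output
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward([0.5, -1.2, 3.0, 0.1], layers)
```

Each entry in `layers` is one step of the input-to-output transformation; the "depth" is simply the number of hidden steps.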

Deep Learning is useful for unstructured data such as images (computer vision) and text (NLP)


Sometimes, Deep Learning is even useful for structured data such as tables

You don't always need it


Shallow learning is often enough

Then, is Deep Learning still a valid buzzword?

Yes, but for many projects, you don't need it


No need to be an expert in DL to work in data science



Artificial neural networks

A metaphor that refers to the way a brain works

Linked neurons that transform inputs into outputs

Deep learning is not always a neural network


You can have different classifiers and do an aggregation
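As a rough illustration of aggregating several classifiers, here is a sketch of a majority vote. The three hand-written rules, their thresholds and the bus fields are hypothetical; it only shows the aggregation step, not how each classifier would be trained.

```python
from collections import Counter

# Three hypothetical, hand-written classifiers for bus breakdown risk.
def by_mileage(bus):
    return "breakdown" if bus["mileage_km"] > 300_000 else "ok"

def by_age(bus):
    return "breakdown" if bus["age_years"] > 12 else "ok"

def by_engine_temp(bus):
    return "breakdown" if bus["engine_temp_c"] > 105 else "ok"

def majority_vote(classifiers, sample):
    """Ask every classifier and return the most common answer."""
    votes = Counter(classifier(sample) for classifier in classifiers)
    return votes.most_common(1)[0][0]

classifiers = [by_mileage, by_age, by_engine_temp]
bus = {"mileage_km": 350_000, "age_years": 9, "engine_temp_c": 110}
prediction = majority_vote(classifiers, bus)  # two of three vote "breakdown"
```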

Then, are Artificial neural networks still a valid buzzword?

Yes, but in many companies, you don't need it


No need to be an expert to work in data science



Current buzzwords:

  • Big Data
  • Artificial intelligence
  • Deep learning
  • Artificial neural networks

Are they still valid?

They are still valid, but some of them are overrated


They don't represent the reality of the majority of data science projects

When you start data science, you think you will do 20% deep learning and 80% shallow learning

Truth = 10% machine learning, 90% data cleaning, infrastructure, etc.

ML Code is small.

But ML Code has a big influence on the whole process: monitoring, data collection, etc.

Steps of a project:

1. Ingestion

2. Data cleaning

3. Feature engineering

4. Model

5. Validation

6. Deployment

7. Monitoring



Ingestion

Schema with different sources, different formats

Data cleaning

Filter, imputation
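A minimal sketch of both operations, using only the standard library. The bus records and the choice of the median as fill value are assumptions made for illustration; in practice this would often be done with a dataframe library.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"bus_id": "A1", "mileage_km": 120_000, "engine_temp_c": 90.0},
    {"bus_id": "B2", "mileage_km": None, "engine_temp_c": 95.0},
    {"bus_id": "C3", "mileage_km": 310_000, "engine_temp_c": None},
    {"bus_id": None, "mileage_km": 80_000, "engine_temp_c": 88.0},
    {"bus_id": "D4", "mileage_km": -5, "engine_temp_c": 92.0},
]

def filter_rows(rows):
    """Filter: drop rows with no identifier or an impossible (negative) mileage."""
    return [
        row for row in rows
        if row["bus_id"] is not None
        and (row["mileage_km"] is None or row["mileage_km"] >= 0)
    ]

def impute_median(rows, column):
    """Imputation: replace missing values in `column` with the column median."""
    known = [row[column] for row in rows if row[column] is not None]
    fill_value = median(known)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]

clean_rows = filter_rows(raw_rows)
for column in ("mileage_km", "engine_temp_c"):
    clean_rows = impute_median(clean_rows, column)
```

Filtering runs first so that invalid rows do not distort the median used for imputation.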

Feature engineering


Extract values from data

Join, enrich, computation
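The three operations can be sketched as follows. The tables, field names and derived features are hypothetical, chosen only to show a join, an enrichment and a computation side by side.

```python
from datetime import date

# Hypothetical tables: bus records and a depot reference table.
buses = [
    {"bus_id": "A1", "depot_id": 1, "mileage_km": 120_000, "in_service": date(2015, 3, 1)},
    {"bus_id": "B2", "depot_id": 2, "mileage_km": 310_000, "in_service": date(2008, 9, 15)},
]
depots = {1: {"climate": "mild"}, 2: {"climate": "cold"}}

def build_features(bus, depots, today):
    """Join the depot table, then compute derived features."""
    depot = depots[bus["depot_id"]]  # join on depot_id
    age_years = (today - bus["in_service"]).days / 365.25  # computation
    return {
        "bus_id": bus["bus_id"],
        "climate": depot["climate"],  # enrichment from the joined table
        "age_years": round(age_years, 1),
        "km_per_year": round(bus["mileage_km"] / age_years),
    }

features = [build_features(bus, depots, today=date(2021, 3, 1)) for bus in buses]
```

The model never sees the raw service date; it sees values such as age and yearly mileage that carry the signal more directly.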



Model

Choose the best one



Validation

Offline: Test and train datasets

Online: Real usage, AB Tests, Canary Testing

Which metric: accuracy, precision, a customised one?
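A small sketch of why the choice of metric matters. The labels below are made up: breakdowns are rare, so two very different models can reach the same accuracy.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Among the samples predicted positive, the share that really are positive."""
    predicted_positive = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_positive:
        return 0.0  # convention used here when nothing is predicted positive
    return sum(t == positive for t in predicted_positive) / len(predicted_positive)

# Hypothetical test set: 1 = breakdown (rare), 0 = no breakdown.
y_true = [0] * 9 + [1]
always_ok = [0] * 10            # never predicts a breakdown
cautious = [0] * 8 + [1, 1]     # raises two alerts, one of them correct

for name, y_pred in (("always_ok", always_ok), ("cautious", cautious)):
    print(name, accuracy(y_true, y_pred), precision(y_true, y_pred))
```

Both models score the same accuracy, yet only one of them ever detects a breakdown; a customised metric can encode which mistakes cost the most.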



Deployment

Automation, API, dashboard with data viz, or included in another product



Monitoring

Classical monitoring + specific monitoring for deep learning

New buzzwords for me:


Feature engineering


Data Drift


Interpretability:

The ability to understand the decisions of your model

If you want to differentiate dogs from wolves, check that the model is not learning from the animal's surroundings instead of the animal itself.

Grass = dog

Snow = wolf
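One way to catch this kind of shortcut is permutation importance, a technique the talk does not name: shuffle one feature at a time and see how much the accuracy drops. The sketch below uses a hypothetical dataset and a deliberately flawed model that only looks at the background.

```python
import random

# Hypothetical dataset: in these photos, every wolf stands on snow.
samples = [
    {"ear_shape": 0.9, "snow_background": 1, "label": "wolf"},
    {"ear_shape": 0.8, "snow_background": 1, "label": "wolf"},
    {"ear_shape": 0.7, "snow_background": 1, "label": "wolf"},
    {"ear_shape": 0.9, "snow_background": 1, "label": "wolf"},
    {"ear_shape": 0.3, "snow_background": 0, "label": "dog"},
    {"ear_shape": 0.2, "snow_background": 0, "label": "dog"},
    {"ear_shape": 0.4, "snow_background": 0, "label": "dog"},
    {"ear_shape": 0.1, "snow_background": 0, "label": "dog"},
]

def shortcut_model(sample):
    """A flawed model that has learned 'snow = wolf' and ignores the animal."""
    return "wolf" if sample["snow_background"] == 1 else "dog"

def accuracy(model, rows):
    return sum(model(row) == row["label"] for row in rows) / len(rows)

def permutation_importance(model, rows, feature, rng, repeats=20):
    """Average accuracy drop when the values of `feature` are shuffled."""
    baseline = accuracy(model, rows)
    drops = []
    for _ in range(repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted))
    return sum(drops) / repeats

rng = random.Random(42)
importance = {
    feature: permutation_importance(shortcut_model, samples, feature, rng)
    for feature in ("ear_shape", "snow_background")
}
```

Here the score for `ear_shape` is zero and the score for `snow_background` is not, which is the warning sign: the model is looking at the scenery.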

Interpretability, what for?





Interpretability is sometimes not enough

Example of breakdowns of buses:

Investigating a crime that has not happened yet is hard


You need more than a prediction

In this case you need explainability:


the potential root cause of the breakdown for instance

Data Drift:


Data changes all the time, and this can degrade the performance of your model


Example: toilet paper sales before the lockdown vs. sales during the lockdown
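A very simple drift check, as a sketch: compare the mean of recent data with a reference window, measured in reference standard deviations. The sales figures are invented for illustration and the threshold of 3 is an arbitrary assumption; production systems usually rely on richer statistical tests.

```python
from statistics import mean, stdev

def drift_score(reference, recent):
    """How many reference standard deviations the recent mean has moved."""
    return abs(mean(recent) - mean(reference)) / stdev(reference)

# Hypothetical weekly toilet paper sales (units); not real figures.
before_lockdown = [100, 104, 98, 101, 97, 103, 99, 102]
during_lockdown = [180, 240, 210, 195]

score = drift_score(before_lockdown, during_lockdown)
drift_detected = score > 3.0  # threshold is an arbitrary assumption
```

A model trained on the "before" window would keep forecasting about a hundred units a week; the check flags that its inputs no longer look like its training data.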

Thank you

Any questions?
