Machine Learning in real life

From old buzzwords to new buzzwords

Nastasia Saby

@saby_nastasia

Blog: https://mlinreallife.github.io/

Konecranes

#ML, #Craft, #Production

 


A lot of buzzwords that are only the tip of the iceberg.

Current buzzwords:

  • Big Data
  • Artificial intelligence
  • Deep learning
  • Artificial neural networks

Are they still valid?

Big Data

Large volumes, diverse data

Machine learning can work with small data

Transfer learning reduces the need for big data

So, is Big Data still a valid buzzword?

It depends on the project

Sometimes very important

Sometimes not so important

When it's important, be prepared

 

Big data is hard

Artificial intelligence

Replacing a cognitive process with a program

Two kinds of AI: strong and weak

Strong: when you're able to fully imitate a human

 

#Science fiction.

Weak or specialised: when you're able to imitate a defined cognitive process

Example of weak AI: predicting bus breakdowns

So, is Artificial intelligence still a valid buzzword?

Weak artificial intelligence: yes

 

Strong artificial intelligence is overrated and more for literature and cinema.

 

#semi-overrated

Deep learning

A model made of several stacked layers

Example

An artificial neural network with 3 hidden layers
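To illustrate "several layers", here is a minimal forward pass through an artificial neural network with 3 hidden layers, in plain Python. The layer sizes (8, 8 and 4 neurons), the ReLU/sigmoid activations and the random weights are assumptions chosen for the sketch; the network is untrained, so its output means nothing until the weights are learned (for example by backpropagation).

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases (illustrative initialisation only)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(inputs, weights, biases, activation):
    # Fully connected layer: each neuron takes a weighted sum of all inputs
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)
# 4 input features -> hidden layers of 8, 8 and 4 neurons -> 1 output
sizes = [4, 8, 8, 4, 1]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def forward(x):
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        # ReLU in the hidden layers, sigmoid on the output layer
        x = dense(x, weights, biases, sigmoid if is_output else relu)
    return x[0]  # a score in (0, 1)

print(forward([0.2, 1.0, -0.3, 0.5]))
```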

Deep Learning is useful for unstructured data, such as images (computer vision) and text (NLP)

 

Sometimes, Deep Learning is even useful for structured data such as tables

You don't always need it

 

Shallow learning is often enough
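A minimal sketch of shallow learning, assuming a logistic regression (a single linear layer followed by a sigmoid) trained by batch gradient descent on a tiny, hypothetical table. The feature names, learning rate and number of epochs are illustrative choices, not tuned values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=5000):
    # Batch gradient descent on the log-loss
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Hypothetical, pre-scaled features: [engine_temperature, mileage] -> breakdown (1) or not (0)
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print([predict(w, b, x) for x in X])
```

In practice a library implementation (for example scikit-learn's `LogisticRegression`) would replace the hand-written loop.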

So, is Deep Learning still a valid buzzword?

Yes, but for many projects, you don't need it

 

No need to be an expert in DL to work in data science

 

#semi-overrated

Artificial neural networks

A metaphor that refers to the way a brain works

Interconnected neurons that transform inputs into outputs
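To make the metaphor concrete: a single artificial neuron computes a weighted sum of its inputs, adds a bias and applies an activation function. In this sketch the weights are hand-picked (an assumption, not learned) so that the neuron behaves like a soft logical AND.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias...
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...passed through a non-linear activation (here a sigmoid)
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked parameters: the output is close to 1 only when both inputs are 1
weights, bias = [10.0, 10.0], -15.0
for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(neuron([a, b], weights, bias), 3))
```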

Deep learning is not always a neural network

 

You can also combine different classifiers and aggregate their outputs
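One simple way to aggregate different classifiers is a majority vote. The three rule-based classifiers and their thresholds below are hypothetical. Layered ensembles repeat this kind of combination over several levels; this sketch shows a single level only.

```python
from collections import Counter

# Three deliberately simple, hypothetical classifiers on a bus record
def rule_on_temperature(bus):
    return "breakdown" if bus["engine_temp"] > 100 else "ok"

def rule_on_mileage(bus):
    return "breakdown" if bus["km_since_service"] > 30000 else "ok"

def rule_on_age(bus):
    return "breakdown" if bus["age_years"] > 12 else "ok"

CLASSIFIERS = [rule_on_temperature, rule_on_mileage, rule_on_age]

def majority_vote(bus, classifiers=CLASSIFIERS):
    # Aggregate the individual predictions: the most frequent label wins
    votes = Counter(clf(bus) for clf in classifiers)
    return votes.most_common(1)[0][0]

bus = {"engine_temp": 105, "km_since_service": 35000, "age_years": 5}
print(majority_vote(bus))
```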

So, are artificial neural networks still a valid buzzword?

Yes, but in many companies, you don't need them

 

No need to be an expert to work in data science

 

#semi-overrated

Current buzzwords:

  • Big Data
  • Artificial intelligence
  • Deep learning
  • Artificial neural networks

Are they still valid?

They are still valid, but some of them are overrated

 

They don't represent the reality of the majority of data science projects

When you start in data science, you expect to spend 20% of your time on deep learning and 80% on shallow learning

The truth: 10% machine learning, 90% data cleaning, infrastructure, etc.

ML code is small.

But ML code influences the whole process: monitoring, data collection, etc.

Steps of a project:

1. Ingestion

2. Data cleaning

3. Feature engineering

4. Model

5. Validation

6. Deployment

7. Monitoring
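The seven steps can be sketched as a chain of functions, each consuming the output of the previous one. Everything here is a hypothetical toy (the bus records, the 100-degree threshold, the single monitoring statistic); the point is the shape of the pipeline, where the model is one small step among seven.

```python
def ingest(_):
    # 1. Ingestion: hypothetical raw records
    return [{"bus_id": 1, "temp": 95.0}, {"bus_id": 2, "temp": None}, {"bus_id": 3, "temp": 110.0}]

def clean(rows):
    # 2. Data cleaning: drop records with a missing value (a filter)
    return [r for r in rows if r["temp"] is not None]

def engineer_features(rows):
    # 3. Feature engineering: derive a new column from an existing one
    return [{**r, "overheating": r["temp"] > 100} for r in rows]

def model(rows):
    # 4. Model: a trivial stand-in predictor
    return [{**r, "prediction": "breakdown" if r["overheating"] else "ok"} for r in rows]

def validate(rows):
    # 5. Validation: here only a sanity check that every row got a prediction
    assert all("prediction" in r for r in rows)
    return rows

def deploy(rows):
    # 6. Deployment: expose predictions keyed by bus id (e.g. to an API or a dashboard)
    return {r["bus_id"]: r["prediction"] for r in rows}

def monitor(predictions):
    # 7. Monitoring: log a simple statistic about what the model outputs
    share = sum(p == "breakdown" for p in predictions.values()) / max(len(predictions), 1)
    print(f"share of predicted breakdowns: {share:.2f}")
    return predictions

STEPS = [ingest, clean, engineer_features, model, validate, deploy, monitor]

def run_pipeline(steps=STEPS):
    data = None
    for step in steps:
        data = step(data)
    return data

print(run_pipeline())
```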

Ingestion

 

Schema with different sources, different formats
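A minimal ingestion sketch, assuming two hypothetical sources: a CSV export and a JSON feed that name their fields differently and use different temperature units. Each reader maps its source onto one common schema (`bus_id`, `temp_c`), so the later steps only see one format.

```python
import csv
import io
import json

# Two hypothetical sources describing the same thing in different formats
CSV_SOURCE = "bus_id,temperature_c\n1,95.5\n2,101.0\n"
JSON_SOURCE = '[{"id": "3", "temp_f": 221.0}]'

def from_csv(text):
    # CSV source: temperature already in Celsius
    return [{"bus_id": int(row["bus_id"]), "temp_c": float(row["temperature_c"])}
            for row in csv.DictReader(io.StringIO(text))]

def from_json(text):
    # JSON source: other field names and Fahrenheit, converted to the common schema
    return [{"bus_id": int(rec["id"]), "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 1)}
            for rec in json.loads(text)]

records = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)
print(records)
```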

Data cleaning

 

Filtering, imputation
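A minimal cleaning sketch on hypothetical records: filter out physically impossible values, then impute missing ones with the mean of the observed values. The plausible range and the choice of mean imputation are assumptions; median or model-based imputation are common alternatives.

```python
from statistics import mean

rows = [
    {"bus_id": 1, "temp_c": 95.0},
    {"bus_id": 2, "temp_c": None},    # missing value -> imputed
    {"bus_id": 3, "temp_c": 105.0},
    {"bus_id": 4, "temp_c": -500.0},  # physically impossible -> filtered out
]

def filter_invalid(rows, low=-50.0, high=200.0):
    # Filter: drop rows whose known value is outside a plausible range
    return [r for r in rows if r["temp_c"] is None or low <= r["temp_c"] <= high]

def impute_mean(rows, column):
    # Imputation: replace missing values with the mean of the observed ones
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

clean = impute_mean(filter_invalid(rows), "temp_c")
print(clean)
```

Filtering before imputing matters here: otherwise the impossible -500 would drag the imputed mean down.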

Feature engineering

 

Extract values from data

Join, enrich, computation
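A feature engineering sketch on two hypothetical tables: per-trip records are aggregated per bus, joined with a reference table on `bus_id`, enriched with the bus model and age, and used to compute an average speed. The table contents and the reference year are made up for illustration.

```python
# Hypothetical tables: trips per bus, and reference data about each bus
trips = [
    {"bus_id": 1, "km": 120.0, "minutes": 180},
    {"bus_id": 1, "km": 80.0, "minutes": 100},
    {"bus_id": 2, "km": 200.0, "minutes": 240},
]
buses = {1: {"model": "A", "year": 2010}, 2: {"model": "B", "year": 2018}}

def build_features(trips, buses, current_year=2020):
    features = {}
    # Aggregate trips per bus
    for t in trips:
        f = features.setdefault(t["bus_id"], {"total_km": 0.0, "total_minutes": 0})
        f["total_km"] += t["km"]
        f["total_minutes"] += t["minutes"]
    for bus_id, f in features.items():
        info = buses[bus_id]                          # join on bus_id
        f["model"] = info["model"]                    # enrich with reference data
        f["age_years"] = current_year - info["year"]
        f["avg_speed_kmh"] = f["total_km"] / (f["total_minutes"] / 60)  # computation
    return features

feats = build_features(trips, buses)
print(feats)
```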

Model

 

Choose the best model
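One way to choose the best model: score every candidate on the same validation data and keep the highest. The candidates here are hypothetical threshold rules and the metric is accuracy; a separate, untouched test set should still be used to report the chosen model's performance.

```python
# Hypothetical validation set: (engine temperature, breakdown label)
validation = [(90, 0), (95, 0), (99, 0), (104, 1), (110, 1)]

# Candidate models: simple threshold rules with different thresholds
candidates = {f"threshold_{t}": (lambda temp, t=t: int(temp > t)) for t in (92, 96, 100, 105)}

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

scores = {name: accuracy(m, validation) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```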

Validation

 

Offline: train and test datasets
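A minimal offline split, assuming the rows are independent of each other. The 20% test ratio and the fixed seed are illustrative choices.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    # Shuffle a copy with a fixed seed so the split is random but reproducible
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # The first n_test shuffled rows become the test set, the rest the training set
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```

For time-dependent data such as bus breakdowns, splitting by date (train on the past, test on the future) is usually safer than a random shuffle, which can leak future information into training.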

Online: real usage, A/B tests, canary testing
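Canary testing sends a small share of real traffic to the new model before a full rollout. A common approach, sketched here under the assumption that each request carries a stable user id, is to hash the id so a given user always hits the same version. The 5% share and the version names are hypothetical; an A/B test can use the same mechanism with a larger split, e.g. 50/50.

```python
import hashlib

def route(user_id: str, canary_share: float = 0.05) -> str:
    # Hash the id so that the same user is always routed to the same model version
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # First 4 bytes as an integer, scaled to a value in [0, 1)
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "new_model" if bucket < canary_share else "current_model"

routes = [route(f"user-{i}") for i in range(10_000)]
share = routes.count("new_model") / len(routes)
print(f"share routed to the new model: {share:.3f}")
```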

Which metric: accuracy, precision, a customised one?
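Why the metric matters, on hypothetical labels where breakdowns are rare: two models with the same accuracy can differ completely on precision and on a customised cost. The costs of a missed breakdown and of a false alarm are assumptions for the sketch.

```python
def confusion(y_true, y_pred):
    # Count true/false positives and negatives (1 = breakdown, 0 = no breakdown)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def maintenance_cost(y_true, y_pred, cost_missed=1000.0, cost_false_alarm=100.0):
    # Customised metric (hypothetical costs): a missed breakdown costs more than a useless inspection
    _, fp, fn, _ = confusion(y_true, y_pred)
    return fn * cost_missed + fp * cost_false_alarm

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_ok = [0] * 10
cautious = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
for name, y_pred in [("always_ok", always_ok), ("cautious", cautious)]:
    print(name, accuracy(y_true, y_pred), precision(y_true, y_pred), maintenance_cost(y_true, y_pred))
```

Both models reach 80% accuracy, yet the always-ok model has zero precision on breakdowns and costs ten times more under this cost function.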

Deployment

 

Automation, API, dashboard with data visualisation, integration into another product
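For the API option, a minimal sketch using only Python's standard library (`wsgiref`): a WSGI app exposing a hypothetical `POST /predict` endpoint that wraps a stand-in `predict` function. The route, the JSON shape and the threshold rule are assumptions; a production deployment would run behind a real WSGI server.

```python
import io
import json
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

def predict(features):
    # Hypothetical stand-in for a trained model
    return "breakdown" if features.get("temp_c", 0) > 100 else "ok"

def app(environ, start_response):
    # Minimal WSGI application: POST /predict with a JSON body
    if environ["PATH_INFO"] != "/predict" or environ["REQUEST_METHOD"] != "POST":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    features = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"prediction": predict(features)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(method, path, payload=b""):
    # Build a fake request environment to exercise the app without a server
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": method, "PATH_INFO": path,
                    "CONTENT_LENGTH": str(len(payload)), "wsgi.input": io.BytesIO(payload)})
    status = []
    body = b"".join(app(environ, lambda s, headers: status.append(s)))
    return status[0], json.loads(body)

def serve(host="127.0.0.1", port=8000):
    # Blocking development server from the standard library (not for production)
    with make_server(host, port, app) as server:
        server.serve_forever()
```

`call_app` exercises the app without opening a socket; `serve()` starts the blocking development server.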

Monitoring

 

Classical monitoring + specific monitoring for deep learning

New buzzwords for me:

 

Feature engineering

Interpretability

Data Drift

Interpretability

The ability to understand the decisions of your model

If you want to tell dogs from wolves, make sure the model is not learning from the animal's environment.

Grass = dog

Snow = wolf
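Permutation importance is one model-agnostic way to check what a model relies on: shuffle one feature and measure how much accuracy drops. In this hypothetical tabular version of the dog/wolf example, the features and the deliberately "shortcut" model are made up; the check reveals that the model depends only on the snowy background.

```python
import random

# Hypothetical features per photo: (ear_pointiness, snow_in_background); label 1 = wolf
X = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.5, 0)]
y = [1, 1, 1, 0, 0, 0, 1, 0]

def shortcut_model(x):
    # A model that has (wrongly) learned "snow => wolf" and ignores the animal
    return x[1]

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, n_repeats=20, seed=0):
    # Importance = average drop in accuracy when one feature column is shuffled
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [x[feature] for x in X]
        rng.shuffle(column)
        X_perm = [x[:feature] + (v,) + x[feature + 1:] for x, v in zip(X, column)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats

for name, idx in [("ear_pointiness", 0), ("snow_in_background", 1)]:
    print(name, round(permutation_importance(shortcut_model, X, y, idx), 3))
```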

Interpretability, what for?

Ethics

Trust

Marketing

Debugging

Interpretability is sometimes not enough

Example: bus breakdowns

Investigating a crime that has not happened yet is hard

You need more than a prediction

In this case, you need explainability:

for instance, the potential root cause of the breakdown

Data Drift:

 

Data changes all the time, and this can degrade the performance of your model

[Figure: cinema and toilet paper sales, before the lockdown vs. during the lockdown]
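One common way to quantify such drift is the population stability index (PSI), which compares the share of each category between a reference period and a recent one. The sales counts below are hypothetical, assuming for illustration that cinema sales collapsed while toilet paper sales surged; the 0.25 alert threshold is a widely used rule of thumb, not a universal constant.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    # PSI = sum over categories of (a - e) * ln(a / e), computed on proportions
    categories = set(expected) | set(actual)
    e_total, a_total = sum(expected.values()), sum(actual.values())
    psi = 0.0
    for c in categories:
        # eps avoids log(0) when a category is absent in one period
        e = max(expected.get(c, 0) / e_total, eps)
        a = max(actual.get(c, 0) / a_total, eps)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical sales counts for the two products in the figure
before = {"cinema": 500, "toilet_paper": 500}
during = {"cinema": 10, "toilet_paper": 990}
psi = population_stability_index(before, during)
print(round(psi, 2), "drift" if psi > 0.25 else "stable")
```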

Thank you

Any questions?

