The universe of Data Science
Nicolas Rochet - 2025
A brief tour
Warm up activity
What words come to your mind ?
Go here to participate:
www.wooclap.com/HEETHC
DATA
A computer science definition
DATA
INFORMATION
Sequence of symbols coded as numbers
interpreted data
A variety of data types
IMAGES
LANGUAGE
NUMBER SERIES
SOUNDS
Professional
photos
voices
recordings
reviews
comments
chats
weather
sensors
METADATA
web
applications
software
tweets
sales
stock
business data
videos
Social networks
friends
sharing
likes
...
music
IoT
messages
electricity consumption
... from various sources
Open Data
Internal data
From the web
Data markets
scraping
API(s)
social network
platforms
data scientist communities
public organisations
brokers
statistical reports
from a work domain
from software
from organisations
...
the rise of (BIG) DATA
the ability to produce, collect, store, structure, access, and present digital data
BIG DATA
DATA SETS
SMALL DATA
Volume
Velocity
Variety
3V of Big Data
the quest for structure
unstructured
unstructured
structured
low quantity
high quantity
big data
data lake
data warehouse
one database
several types of databases
single type data sets
data streams
different types of data sets
data mesh
A huge consumption of data

World data consumption (ZB): 8 in 2015, 40 in 2020, 200 in 2025
with a big ecological footprint
Big data
pollution
Big data benefits
Big Data market size

The advantages of data for organizations

Exploiting data with data science
Communication
Data storytelling
Dashboarding
Data Visualisation
Data Analysis
Computer science
Statistics
AI
Coding language
Data
structures
Different domains
Data Science
Reporting
Data mining
Business
Intelligence
Decision
APIs
Automation
tools
Decision Science
Game theory
Data mining & KDD
Data mining is one step of a more general process: Knowledge Discovery in Databases (KDD)
Data Sources
Data mining
Knowledge extraction
Exploitation & deployment
Structuring
Models
Patterns
Information
Enrichment
Databases
Data warehouse
Files
Documents
The rise of data science
and Machine Learning
For whom ?
By whom ?
Some actors and users
Big companies
Research
Labs
Citizens
Institutions
Communities
CNIL
Europe
Governments
Companies
Academics
Open AI
UN
Kaggle
Startups
Non profits
GitHub
INRIA
Element AI
GAFAM
BATX
Small companies
Data science employers


Data Science & organizations in 2021

Data Science & AI in 2021
Conclusions
45 % of organizations haven't adopted Data Science & AI
Organizations need to devise the right strategies to build a data culture
In general, Northern countries are more advanced
There is a discrepancy in how far adoption has advanced among organizations
How to build and use data science ?
A data science project lifecycle
Need to ensure ethics-by-design !
idea
project's
definition
PoC
develop & deploy
sharing
&
feedback
ethical watch
measure of project adoption
Data science professions
Project's steps
Researcher
Data visualiser
Communicator
Data Analyst
Data Architect
Data Manager
Data Scientist
Decision makers
Ethicist
DPO
Manager
Data engineer
Designer
Characteristic steps
data collection & management
Data preparation
Data processing
Deployment
Data exploration
A need
A Problem to solve
Communication
Visualisations
Report
Product & service
Decision making
cleaned
data
Use case
Data mining
Data collection
Data preparation
Data Analysis
Deployment
Need /
Problem to solve
Understanding of data
Modeling
Pattern
identification
Evaluation
Understanding of the domain
Inspired by the CRISP-DM method
Exploratory Analysis
How to organize a data science project ?
Describe your project
Define what kind of data product you want
Build a detailed user story
Summarize & pitch your project
How to organize a data science project ?
State of the art
Gather sources on your topic: websites, articles, ...
Make an inventory of data sources
available ones (open data, domain data, ...)
needed ones
Are there similar projects ?
How to organize a data science project ?
Design a mock-up
Imagine the user experience (UI & UX)
Decide visually what information to show and how to display it
How to organize a data science project ?
Domain knowledge
Exploratory data analysis
Select your data & analyses
what variables to keep, exclude ?
what kind of pre-processing to apply ?
what final processing analytics chain to apply ?
are there specific methods/algorithms to use ?
what type of plots to produce ?
How to organize a data science project ?
Organize your data & code
build a modularized template for your preprocessing chain
build a modularized template for your main processing chain
code the graphical interface
organize the deployment of your data product
structure your data with information systems tools
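A minimal sketch of what a modularized preprocessing chain followed by a main processing step could look like. It uses only the Python standard library, and every name in it (the `temp_f` and `city` fields, `drop_incomplete`, `to_celsius`, `run_chain`, `average_by_city`) is a hypothetical illustration rather than part of any particular project or tool.

```python
from statistics import mean

def drop_incomplete(rows):
    """Preprocessing step: keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def to_celsius(rows):
    """Preprocessing step: add 'temp_c' computed from the hypothetical 'temp_f' field."""
    return [{**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)} for r in rows]

def run_chain(rows, steps):
    """Apply each step in order; every step takes and returns a list of dicts."""
    for step in steps:
        rows = step(rows)
    return rows

def average_by_city(rows):
    """Main processing step: average temperature (in Celsius) per city."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["temp_c"])
    return {city: round(mean(vals), 1) for city, vals in by_city.items()}

# The chain is just a list of functions, so steps can be added, removed or reordered.
PREPROCESSING = [drop_incomplete, to_celsius]

# Made-up sample data, for illustration only.
raw = [
    {"city": "Nice", "temp_f": 68.0},
    {"city": "Nice", "temp_f": 77.0},
    {"city": "Lyon", "temp_f": None},
    {"city": "Lyon", "temp_f": 59.0},
]
clean = run_chain(raw, PREPROCESSING)
result = average_by_city(clean)
print(result)
```

Keeping each step as a small function with the same input and output shape makes it easy to test steps one by one and to reuse the same chain in a notebook, a script or a deployed service.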
... and popular tools
RapidMiner
KNIME
R
Python
Jupyter notebooks
Tableau Software
Software & Platforms
Data Preparation
SAP
Microsoft Power BI
QlikView
Hadoop
Google Cloud Platform
Amazon Web Services
Microsoft Azure
scikit-learn
TensorFlow
Pandas
ERP
Data management
Automation &
Deployment
SQL
NoSQL
Data warehouse
Data lake
Databases
Data structures
ETL
Exploration & communication
Processing
IBM Cloud
CRM
Salesforce
SAS
API
...
LLM-based tools
The boom of
Large Language Models
The generative AI boom
Strong enthusiasm from researchers, the public and companies


Generative AIs ?
In recent years, research advances have produced AIs capable of generating realistic data
Images
videos
drawings
illustrations
photorealistic images
...
Text
Code
paragraphs
questions / answers
Lists
summaries
...
Large Language Models (LLMs)
Sounds
voice
music
...
Very large neural networks
Trained for a very long time on gigantic datasets ...
... to predict each next token of a text
The network learns complex representations (embeddings)
Simplified example of text generation
Design & Training
"Machine learning is a branch of AI
Good generalization capabilities




The trained network has learned generalizable representations
Simplified example of text generation
Inference
Token-by-token text generation
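To make the idea of token-by-token generation concrete, here is a deliberately tiny sketch. It is not how an LLM works internally: a real model is a very large neural network, whereas this stand-in simply counts which word follows which in a made-up training sentence. The function names `train_bigram` and `generate` and the corpus are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """'Training': count, for each token, which tokens follow it in the text."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_tokens=5):
    """'Inference': greedily append the most frequent next token, one at a time."""
    out = [start]
    for _ in range(max_tokens):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no known continuation for the last token
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Made-up training text; whitespace splitting stands in for a real tokenizer.
corpus = "machine learning is a branch of AI and robotics is a branch of AI too"
model = train_bigram(corpus)
generated = generate(model, "machine", max_tokens=6)
print(generated)
```

The loop in `generate` mirrors the inference step: each new token is chosen from what the model learned during training, conditioned on what has been generated so far (here, only the previous token).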
Retraining on specific data
Select a set of documents to provide as context


fine-tuning
As a data science expert programming in Python ...
context: prompt
Retrieval-Augmented Generation (RAG)
+
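The idea of selecting documents to provide as context, then combining them with a role instruction in the prompt, can be sketched as follows. This is a simplified assumption-laden illustration: real systems usually rank documents with embeddings and a vector index, while this sketch uses plain word overlap. The helpers `tokenize`, `retrieve` and `build_prompt` and the sample documents are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a crude stand-in for real tokenization."""
    return set(text.lower().split())

def retrieve(question, documents, k=2):
    """Rank documents by the number of words shared with the question; keep the top k."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, role):
    """Assemble the prompt: role instruction, retrieved context, then the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return f"{role}\n\nContext:\n{context}\n\nQuestion: {question}"

# Made-up document collection, for illustration only.
docs = [
    "pandas is a python library for tabular data",
    "the eiffel tower is in paris",
    "scikit-learn is a python library for machine learning",
]
role = "As a data science expert programming in Python ..."
question = "which python library handles tabular data"
prompt = build_prompt(question, docs, role)
print(prompt)
```

The resulting prompt would then be sent to a language model. Unlike fine-tuning, the model itself is left unchanged: only the text it receives is enriched.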
Generative AIs for images
DALL-E

Stable Diffusion
Midjourney
CompVis group

The best known
Generative AIs for video
Sora

Veo
Meta Movie Gen
Meta AI
The best known

Generative AIs for text
Closed foundation models (the best known)
Generative AIs for text
Open foundation models (the best known)

Falcon
Claude

Llama 3

Large Language Model Meta AI
Generative AIs for sound
Example: Stable Audio

Generate sound from a text instruction (prompt)
AIs with multi-task capabilities
...

Applications & use cases
Success stories & failures
Map of some applications
INDUSTRY
BANK
RETAIL
MARKETING
MEDICINE
ARTS
MEDIA
TRANSPORT
...
Predictive maintenance
Robots
Flow management
Credit scoring
Fraud detection
Automatic trading
Sentiment analysis
Discovery of treatments
Prediction of treatment success
Traffic analysis
Generative design
Sound generation
Image generation
Resource planning
Assisted diagnosis
Autonomous vehicles
Automatic summary
Text generation
Product recommendation
Content recommendation
Automatic captioning
Churn prediction
customer behavior prediction
Use cases
Well-known success stories
Recommendations
Online retail
Streaming platforms
Social networks
Product / service
Content
People
Market places
Advertising
Use cases
Marketing
chat bots
Churn prediction
sentiment analysis
Predictive marketing
Customer segmentation
A/B testing
optimization
Still growing
Use cases
Widely used in the future ?
Drug discovery
Diagnostic assistance
Health care
Stock optimisation
Patient allocation
Health medical record analysis
Smart city
Pollution prediction
Traffic prediction
Optimisation of
building consumption
Facial recognition
Use cases
Should be avoided ?
"Smart" city
Predictive policing
Facial recognition
Prediction of recidivism
Justice
Prediction of criminality
Citizen surveillance

