IMAGES
LANGUAGE
NUMBER SERIES
SOUNDS
Professional
photos
voices
recordings
reviews
commentaries
chats
weather
sensors
METADATA
web
applications
software
tweets
sales
stock
business data
videos
Social networks
friends
sharing
likes
...
music
IoT
messages
electricity consumption
scraping
API(s)
social network
platforms
data scientist communities
public organisations
brokers
statistical reports
from a work domain
from software
from organisations
the ability to produce, collect, store, structure, access, and present digital data
BIG DATA
DATA SETS
SMALL DATA
unstructured
unstructured
structured
low quantity
high quantity
big data
data lake
data warehouse
one database
several types of databases
single type data sets
data streams
different types of data sets
data mesh
World data consumption (ZB): 8 (2015), 40 (2020), 200 (2025)
Communication
Data storytelling
Dashboarding
Data Visualisation
Data Analysis
Computer science
Statistics
AI
Coding language
Data structures
Data Science
Reporting
Data mining
Business Intelligence
Decision
APIs
Automation
tools
Decision Science
Game theory
Data mining is one step of a more general process: Knowledge Discovery in Databases (KDD)
Data Sources
Data mining
Knowledge extraction
Exploitation & deployment
Structuring
Models
Patterns
Information
Enrichment
Databases
Data warehouse
Files
Documents
Big companies
Research
Labs
Citizens
Institutions
Communities
CNIL
Europe
Governments
Companies
Academics
OpenAI
UN
Kaggle
Startups
Nonprofits
GitHub
INRIA
Element AI
GAFAM
BATX
Small companies
Need to ensure ethics by design!
idea
project definition
PoC
develop & deploy
sharing & feedback
ethical watch
measure of project adoption
Project steps
Researcher
Data visualiser
Communicator
Data Analyst
Data Architect
Data Manager
Data Scientist
Decision makers
Ethicist
DPO
Manager
Data engineer
Designer
data collection & management
Data preparation
Data processing
Deployment
Data exploration
A need
A Problem to solve
Communication
Visualisations
Report
Product & service
Decision making
cleaned data
Use case
Data collection
Data preparation
Data Analysis
Deployment
Need /
Problem to solve
Understanding of data
Modeling
Pattern identification
Evaluation
Understanding of the domain
Inspired by the CRISP-DM method
Exploratory Analysis
Describe your project
Define what kind of data product you want
Build a detailed user story
Summarize & pitch your project
State of the art
Sources on your topic: websites, articles, ...
Make an inventory of data sources
available ones (open data, domain data, ...)
needed ones
Are there similar projects?
Design a mock-up
Imagine the user experience (UI & UX)
Organize visually what information to display and how
Domain knowledge
Exploratory data analysis
Select your data & analyses
what variables to keep or exclude?
what kind of preprocessing to apply?
what final analytics processing chain to apply?
are there specific methods/algorithms to use?
what type of plots to produce?
Organize your data & code
build a modularized template for your preprocessing chain
build a modularized template for your main processing chain
code the graphical interface
organize the deployment of your data product
structure your data with information systems tools
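A modularized preprocessing and processing chain could look like the minimal sketch below. Everything in it is an illustrative assumption rather than part of the course material: the record fields (`city`, `kwh`), the step functions, and the sample values are hypothetical. The point is only that each step takes and returns the same kind of object, so steps can be added, removed, or reordered without touching the rest of the code.

```python
# Hypothetical electricity-consumption records; field names are assumptions.
RAW_ROWS = [
    {"city": " lyon ", "kwh": 10.0},
    {"city": "Lyon", "kwh": 14.0},
    {"city": "paris", "kwh": None},
    {"city": "PARIS", "kwh": 8.0},
]


def drop_missing(rows):
    """Preprocessing step: discard records containing a missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def normalise_city(rows):
    """Preprocessing step: harmonise the spelling of city names."""
    return [{**row, "city": row["city"].strip().title()} for row in rows]


def run_chain(rows, steps):
    """Apply each step in order; every step maps a list of dicts to a list of dicts."""
    for step in steps:
        rows = step(rows)
    return rows


def mean_by_city(rows):
    """Main processing step: average consumption per city."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["city"], []).append(row["kwh"])
    return {city: sum(values) / len(values) for city, values in grouped.items()}


# The preprocessing chain is just an ordered list of functions.
PREPROCESSING = [drop_missing, normalise_city]

clean = run_chain(RAW_ROWS, PREPROCESSING)
summary = mean_by_city(clean)
print(summary)
```

Keeping the chain as a plain list of functions makes it easy to test each step on its own and to reuse the same template across projects.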
RapidMiner
KNIME
R
Python
Jupyter notebooks
Tableau Software
Software & Platforms
Data Preparation
SAP
Microsoft Power BI
QlikView
Hadoop
Google Cloud Platform
Amazon Web Services
Microsoft Azure
scikit-learn
TensorFlow
Pandas
ERP
Data management
Automation & Deployment
SQL
NoSQL
Data warehouse
Data lake
Databases
Data structures
ETL
Exploration & communication
Processing
IBM Cloud
CRM
Salesforce
SAS
API
...
LLM-based tools
In recent years, research progress has led to AIs capable of generating realistic data
videos
drawings
illustrations
photorealistic images
...
Code
paragraphs
questions / answers
lists
summaries
...
Large Language Models (LLMs)
voice
music
...
Very large neural networks
Trained for a very long time on gigantic datasets ...
... to predict each next token of a text
The network learns complex representations (embeddings)
Design & Training
"Machine learning is a branch of AI
Good generalization capabilities
The trained network has learned generalizable representations
Inference
Text generation, token by token
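To make the idea of next-token prediction and token-by-token generation concrete, here is a deliberately tiny sketch. It is not an LLM: it swaps the neural network for a simple bigram count table, and the corpus, function names, and seed are all assumptions chosen for illustration. What it shares with an LLM is the loop: look at what has been generated so far, sample one next token, append it, repeat.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """'Training': count which token follows which in the corpus."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, max_tokens, seed=0):
    """'Inference': produce text one token at a time."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    output = [start]
    for _ in range(max_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # no known continuation for the last token
        candidates = list(followers.keys())
        weights = list(followers.values())
        # Sample the next token in proportion to how often it was observed.
        output.append(rng.choices(candidates, weights=weights, k=1)[0])
    return output


# Toy corpus (an assumption); whitespace splitting stands in for a real tokenizer.
corpus = "the data feed the model and the model learns the data".split()
model = train_bigram(corpus)
text = generate(model, "the", max_tokens=6)
print(" ".join(text))
```

A real LLM conditions on the whole preceding context rather than on the last token only, and it learns its probabilities with a neural network instead of counting pairs.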
Retraining on specific data
Select a set of documents to provide as context
fine-tuning
As a data science expert programming in Python ...
context: prompt
Retrieval Augmented Generation
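Selecting documents to provide as context can be sketched as follows. This is a simplified illustration under stated assumptions: relevance is scored with a bag-of-words cosine similarity, whereas real systems typically compare learned embeddings, and the documents, question, and prompt layout are hypothetical. No model is called; the sketch stops at building the prompt that would be sent to one.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text by its lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        documents, key=lambda doc: cosine(query, bag_of_words(doc)), reverse=True
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Put the retrieved documents in front of the question as context."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical document collection.
docs = [
    "a data lake stores raw data of many types",
    "a data warehouse stores structured data for reporting",
    "churn prediction estimates which customers will leave",
]
question = "what does a data warehouse store"
prompt = build_prompt(question, docs, k=1)
print(prompt)
```

Unlike fine-tuning, this approach leaves the model's weights unchanged: only the prompt is enriched.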
CompVis group
The best known
Meta AI
The best known
Closed foundation models (the best known)
Open foundation models (the best known)
Large Language Model Meta AI
Generate sound from a text instruction (prompt)
...
Success stories & failures
INDUSTRY
BANK
RETAIL
MARKETING
MEDICINE
ARTS
MEDIA
TRANSPORT
Predictive maintenance
Robots
Flow management
Credit scoring
Fraud detection
Automatic trading
Sentiment analysis
Discovery of treatments
Prediction of treatment success
Traffic analysis
Generative design
Sound generation
Image generation
Resource planning
Assisted diagnosis
Autonomous vehicles
Automatic summary
Text generation
Product recommendation
Content recommendation
Automatic captioning
Churn prediction
customer behavior prediction
Well-known success stories
Online retail
Streaming platforms
Social networks
Marketplaces
Advertising
chatbots
Churn prediction
sentiment analysis
Predictive marketing
Customer segmentation
A/B testing
optimization
Still growing
Widely used in the future?
Drug discovery
Diagnostic assistance
Stock optimisation
Patient allocation
Medical record analysis
Pollution prediction
Traffic prediction
Optimisation of building consumption
Facial recognition
Should be avoided?
Predictive policing
Facial recognition
Prediction of recidivism
Prediction of criminality
Citizen surveillance