Data Science & AI at Jusbrasil
The past, the present and the future
Agenda
Context: Why Data Science and AI?
What the leads team has been up to
What's next
Why Data Science and AI?
Data Science is an interdisciplinary field concerned with the processes and systems used to extract knowledge or insights from data in various forms, structured or unstructured. It is a continuation of data analysis fields such as statistics, data mining, and predictive analytics.
"Data Science enables the creation of data products.
Whether data is search terms, voice samples or product reviews, users are in a feedback loop in which they contribute to the products they use.
That's the beginning of Data Science"
- Mike Loukides, 2010
How Data Science is done
The world: product, system, # of cases, # of users, whatnot
Ingest Raw Data: transactions, web scraping, mobile data, sensor data, social feed
Crunch Data: MapReduce, ETL/ELT, data wrangling, dimensionality reduction, data cleansing
The Dataset: independence? correlation? covariance? causality? dimensionality?
Learn From Data: inference, data & algorithm models, machine learning, regression & prediction, classification & clustering
Deliver and Visualize Insight: actionable, predictive, business value, easy to explain, answers and new questions
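To make the stages concrete, here is a minimal sketch of one pass through them in Python. Everything in it is hypothetical: the records in `RAW`, the helper names `crunch`, `learn` and `deliver`, and the choice of a one-variable least-squares fit as the "learn" step. It is a toy illustration of the flow, not how any pipeline at Jusbrasil is actually built.

```python
from statistics import mean

# Ingest (hypothetical): raw records as they might arrive from a transaction log,
# with numbers still stored as strings and one missing value.
RAW = [
    {"user": "a", "cases_read": "12", "months_subscribed": "7"},
    {"user": "b", "cases_read": None, "months_subscribed": "2"},
    {"user": "c", "cases_read": "3", "months_subscribed": "1"},
    {"user": "d", "cases_read": "20", "months_subscribed": "11"},
    {"user": "e", "cases_read": "8", "months_subscribed": "5"},
]


def crunch(records):
    """Crunch: drop incomplete rows and cast strings to numbers."""
    dataset = []
    for r in records:
        if r["cases_read"] is None or r["months_subscribed"] is None:
            continue  # data cleansing: skip rows with missing values
        dataset.append((float(r["cases_read"]), float(r["months_subscribed"])))
    return dataset


def learn(dataset):
    """Learn: fit months = a + b * cases_read by ordinary least squares."""
    xs = [x for x, _ in dataset]
    ys = [y for _, y in dataset]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in dataset) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def deliver(a, b, cases_read):
    """Deliver: turn the fitted line into one explainable prediction."""
    return a + b * cases_read


dataset = crunch(RAW)
a, b = learn(dataset)
print(f"months ~ {a:.2f} + {b:.2f} * cases_read")
print(f"predicted months for 10 cases read: {deliver(a, b, 10):.1f}")
```

Each function maps to one stage, so any of them can be swapped for a real implementation (a MapReduce job, a classifier, a dashboard) without touching the others.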
Data Science: Explain like I'm 5
A fuckton of data
Mathemagics
Insights that you couldn't have imagined
... And predictions, tons of predictions.
What the leads team has been up to
(besides surviving Vicente's philosophical ideas every 2 hours)
Vicente (for real)
Legal Issue Classifier
Automatic case classification with high accuracy
Removed one field from the form, reducing friction and generating around 3,500 additional cases per month
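The slides don't say which model the Legal Issue Classifier uses. As a hedged illustration of automatic case classification in general, the sketch below trains a multinomial Naive Bayes text classifier with add-one smoothing. The class `NaiveBayes`, the example case descriptions and the labels `labor`, `consumer` and `family` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for t in tokenize(doc):
                # Add-one smoothing so an unseen word doesn't zero out the class.
                score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical training cases and legal-issue labels.
docs = [
    "fired without notice and unpaid overtime",
    "employer did not pay severance",
    "store refused to refund a defective phone",
    "airline cancelled my flight and lost my luggage",
    "divorce and child custody agreement",
    "child support payments are late",
]
labels = ["labor", "labor", "consumer", "consumer", "family", "family"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("unpaid overtime after being fired"))
```

Working in log space avoids underflow when a case description is long; a production system would need a proper tokenizer and far more training data.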
Churn study
Question: What's the pattern among subscribed users? Why do they churn?
Question: Is there any interesting correlation among our current user-behavior features?
Features
Apresentação (bio)
NotificarLeadEmail
NotificarLeadSite
ReceberDigest
TotalMeusDocumentos
DuvidasLidas
RespostasAvaliadasCasos
TotalNotificacoesNaoLidas
VisivelLista
VisualizacoesTelefone
TotalDuvidasRecebidas
RespostasLidasCasos
RespostasCasos
Features
Is there any correlation?
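One simple way to answer "is there any correlation?" for a binary churn label is the Pearson correlation between each feature and the label (for a 0/1 label this is the point-biserial correlation). A minimal sketch, assuming a small hypothetical sample: the values are invented and only reuse feature names from the list above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical users: one value per user for each feature.
features = {
    "notificarLeadEmail": [1, 1, 0, 1, 0, 0, 1, 1],
    "receberDigest": [1, 1, 1, 1, 0, 0, 1, 0],
    "respostasCasos": [9, 7, 1, 6, 0, 2, 8, 3],
}
churned = [0, 0, 1, 0, 1, 1, 0, 1]  # 1 = cancelled the subscription

for name, values in features.items():
    print(f"{name:20s} r = {pearson(values, churned):+.2f}")
```

A negative `r` means higher feature values go with less churn; correlation alone says nothing about causality.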
What are the most important features that draw a line between churned and non-churned users?
[0] apresentacao
[1] notificarLeadEmail
[2] notificarLeadSite
[3] receberDigest
[4] totalMeusDocumentos
[5] duvidasLidas
[6] respostasAvaliadasCasos
[7] respostasCasos
[8] respostasLidasCasos
[9] totalDuvidasRecebidas
[10] totalNotificacoesNaoLidas
[11] visivelLista
[12] visualizacoesTelefone
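The slides don't state how the most important features were identified. One common approach, used when growing decision trees, is to rank features by information gain: how much splitting on a feature reduces the entropy of the churn label. A sketch, assuming binary (0/1) features and invented data; only the feature names come from the list above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


churned = [0, 0, 1, 0, 1, 1, 0, 1]  # hypothetical labels, 1 = cancelled
features = {
    "notificarLeadEmail": [1, 1, 0, 1, 0, 0, 1, 1],
    "receberDigest": [1, 1, 1, 1, 0, 1, 1, 0],
    "visivelLista": [1, 0, 1, 0, 1, 0, 1, 0],
}

ranking = sorted(features, key=lambda f: information_gain(features[f], churned), reverse=True)
for name in ranking:
    print(f"{name:20s} IG = {information_gain(features[name], churned):.3f}")
```

Numeric features such as `respostasCasos` would first need a threshold (for example, above or below the median) to fit this binary-split view.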
Why and how?
Preliminary conclusions
In plain words
Lawyers who receive email notifications tend to keep their subscription
#1
Of the users who cancelled, 72% had opted out of email notifications, while only 16% of active users had done the same
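Both percentages are computed within a group: 72% is the share of churned users who opted out, not the probability that an opted-out user churns. A minimal sketch of that per-group computation; the `status` field and the records are hypothetical.

```python
# Hypothetical records: one dict per subscriber.
users = [
    {"status": "churned", "notificarLeadEmail": False},
    {"status": "churned", "notificarLeadEmail": False},
    {"status": "churned", "notificarLeadEmail": False},
    {"status": "churned", "notificarLeadEmail": True},
    {"status": "active", "notificarLeadEmail": True},
    {"status": "active", "notificarLeadEmail": True},
    {"status": "active", "notificarLeadEmail": False},
    {"status": "active", "notificarLeadEmail": True},
    {"status": "active", "notificarLeadEmail": True},
]


def opt_out_share(users, status):
    """Share of users with the given status who turned email notifications off."""
    group = [u for u in users if u["status"] == status]
    opted_out = sum(1 for u in group if not u["notificarLeadEmail"])
    return opted_out / len(group)


for status in ("churned", "active"):
    print(f"{status}: {opt_out_share(users, status):.0%} opted out of email")
```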
There is a positive correlation between the number of cases a lawyer reads and their retention as a subscriber
#2
Users who opt in to the digest tend not to cancel
#3
35% of users who cancelled do not receive the digest, while only 10% of active users opt out of it
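Because 35% and 10% are conditioned on churn status, turning them into "how likely is a user without the digest to cancel?" requires Bayes' rule plus an overall churn rate, which the slides don't give. A hedged sketch, assuming a hypothetical base rate of 40% and treating "active" as "did not churn"; `p_churn_given` is an illustrative helper name.

```python
def p_churn_given(p_x_given_churn, p_x_given_active, base_rate):
    """Bayes' rule: P(churn | x) from per-group shares and a churn base rate."""
    num = p_x_given_churn * base_rate
    return num / (num + p_x_given_active * (1 - base_rate))


BASE_RATE = 0.40  # hypothetical overall churn rate; not from the slides

no_digest = p_churn_given(0.35, 0.10, BASE_RATE)
with_digest = p_churn_given(1 - 0.35, 1 - 0.10, BASE_RATE)
print(f"P(churn | no digest) = {no_digest:.3f}")
print(f"P(churn | digest)    = {with_digest:.3f}")
```

With that assumed base rate, a user without the digest comes out at 0.70 versus 0.325 with it. A different base rate changes both numbers but not their ordering, since the likelihood ratio for "no digest" (0.35 / 0.10) is above 1.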
Churn correlates negatively with all selected features
#4
All of them are related to user engagement
Therefore, the lower a subscriber's engagement, the higher the chance they cancel
The longer someone has been a subscriber, the lower the chance of cancellation
#5
Churn rate for subscribers with less than 3 months of subscription: 46%
Churn rate for subscribers with between 3 and 6 months of subscription: 42%
Churn rate for subscribers with between 6 months and 1 year of subscription: 34%
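A sketch of how churn rates per tenure bucket like those above can be computed. It assumes "churn rate" means the share of subscribers in a bucket who cancelled, and that each bucket includes its lower bound; the slides specify neither. The `(months, churned)` pairs are hypothetical.

```python
from collections import Counter


def tenure_bucket(months):
    """Assign a subscriber to a tenure bucket (lower bound inclusive; an assumption)."""
    if months < 3:
        return "< 3 months"
    if months < 6:
        return "3-6 months"
    if months < 12:
        return "6-12 months"
    return ">= 12 months"


def churn_rate_by_bucket(subscribers):
    """Fraction of subscribers in each bucket who cancelled."""
    totals, cancelled = Counter(), Counter()
    for months, did_churn in subscribers:
        bucket = tenure_bucket(months)
        totals[bucket] += 1
        cancelled[bucket] += did_churn
    return {bucket: cancelled[bucket] / totals[bucket] for bucket in totals}


# Hypothetical (months subscribed, churned?) pairs; 1 = cancelled.
subscribers = [(1, 1), (2, 1), (2, 0), (4, 1), (5, 0), (8, 0), (10, 1), (9, 0), (14, 0)]

for bucket, rate in churn_rate_by_bucket(subscribers).items():
    print(f"{bucket:13s} {rate:.0%}")
```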
Email lead notification (notificarLeadEmail), respostasAvaliadasCasos and respostasCasos have the greatest impact on distinguishing users who cancel from those who don't
#6
What's next
Hot Or Cold Predictor
Given a legal case, is it a hot case or not?
Churn prediction model
Given a subscribed user, what's the likelihood that he/she cancels?
What can we do to prevent it?
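For the planned churn prediction model, a natural baseline for "what's the likelihood that they cancel?" is logistic regression, which outputs a probability directly. The sketch below is an assumption, not the team's chosen model: it fits weights by batch gradient descent on the mean log-loss over three hypothetical features (email notifications on, digest on, tenure in years). The function names and data are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical rows: [notificarLeadEmail, receberDigest, tenure in years]
X = [
    [1, 1, 0.9], [1, 1, 0.6], [1, 0, 0.8], [0, 1, 0.5],  # stayed
    [0, 0, 0.1], [0, 0, 0.2], [1, 0, 0.1], [0, 1, 0.2],  # cancelled
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(X, y)
engaged = churn_probability(w, b, [1, 1, 0.9])
disengaged = churn_probability(w, b, [0, 0, 0.1])
print(f"engaged: {engaged:.2f}  disengaged: {disengaged:.2f}")
```

Because the learned weights are readable, the same model can also hint at the "what can we do to prevent it?" question, e.g. which engagement levers move the predicted probability most.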
WTF can we do with all this new information and these insights?
This is up to all of us.
Thanks!
Data Scie
By Rodrigo Araújo