March 14, 2016
The Santander Group is the largest bank in the Eurozone, with a market capitalization of €65,792M [4Q’15]. [1]
[1] Quarterly Shareholder Report October - December 2015
Worldwide presence:
Data
Process
Output
Based on historical data, we want to identify inactive customers (99K)
Data Mining
Model Evaluation
Visualization
Performs at least 3 transactions with the account in the last 90 days
Has an average volume in the last 6 months >= a predetermined amount
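The two criteria above can be read as a rule for flagging activity. The following is a minimal sketch under stated assumptions: the slides do not say whether both criteria must hold or what the predetermined amount is, so the AND combination, the `MIN_AVG_VOLUME` value, and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: the slides only say "predetermined amount".
MIN_AVG_VOLUME = 500.0


@dataclass
class AccountActivity:
    transactions_last_90_days: int
    avg_volume_last_6_months: float


def is_active(activity: AccountActivity,
              min_avg_volume: float = MIN_AVG_VOLUME) -> bool:
    """Assumption: a customer counts as active only if BOTH criteria hold."""
    return (activity.transactions_last_90_days >= 3
            and activity.avg_volume_last_6_months >= min_avg_volume)


# Made-up customers, for illustration only.
customers = {
    "A": AccountActivity(5, 1200.0),  # meets both criteria
    "B": AccountActivity(1, 3000.0),  # too few transactions
    "C": AccountActivity(4, 100.0),   # average volume too low
}
inactive = [cid for cid, a in customers.items() if not is_active(a)]
```

With these made-up records, `inactive` contains customers B and C.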
Data
Process
Feature Selection
163 variables
Model
Model Selection
TP – Predicted as inactive and are truly inactive
TN – Predicted as active and are truly active
FP – Predicted as inactive and are truly active
FN – Predicted as active and are truly inactive
Precision rate - TP / (TP + FP)
Recall rate - TP / (TP + FN)
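The definitions above can be checked on a small example. This sketch treats "inactive" as the positive class, as in the TP/TN/FP/FN definitions; the labels and function names are illustrative and not taken from the original analysis.

```python
def confusion_counts(actual, predicted, positive="inactive"):
    """Count TP, TN, FP, FN with `positive` as the class of interest."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p != positive and a != positive:
            tn += 1
        elif p == positive:
            fp += 1  # predicted inactive, truly active
        else:
            fn += 1  # predicted active, truly inactive
    return tp, tn, fp, fn


def precision(tp, fp):
    # TP / (TP + FP); 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # TP / (TP + FN); 0.0 when there are no true positives to find.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels, for illustration only.
actual = ["inactive", "inactive", "active", "active", "inactive", "active"]
predicted = ["inactive", "active", "active", "inactive", "inactive", "active"]

tp, tn, fp, fn = confusion_counts(actual, predicted)
p = precision(tp, fp)
r = recall(tp, fn)
```

Here two of the three customers predicted as inactive are truly inactive, and two of the three truly inactive customers are found, so precision and recall are both 2/3.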
Model
Two models, Decision Tree and Naïve Bayes, are selected based on class recall and class precision.
Model Selection
Decision tree is used for scoring Dec 2014 active customers (N0)
Output
Cost (acquiring a new customer) > Cost (retaining a customer)
Can we use transactional data to predict the level of satisfaction of a customer?
Dependent variable:
1. Nivel Satisfaccion ~ nominal variable with values 0,1,2.
2. Predict_binary ~ binary variable with values 0 (for 0) and 1 (for 1 and 2).
Predict the customer satisfaction level
0's - 133 ~ 10.6%
1's - 419 ~ 33.4%
2's - 703 ~ 56%
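The mapping from the three-level variable to `Predict_binary`, and the resulting class balance, can be sketched as follows. The counts are the ones listed above; the function name `to_binary` is hypothetical.

```python
from collections import Counter


def to_binary(level: int) -> int:
    """Map satisfaction level 0 -> 0 and levels 1, 2 -> 1."""
    if level not in (0, 1, 2):
        raise ValueError(f"unexpected satisfaction level: {level}")
    return 0 if level == 0 else 1


# Rebuild the label column from the counts listed above.
levels = [0] * 133 + [1] * 419 + [2] * 703

counts = Counter(to_binary(level) for level in levels)
total = len(levels)
# Percentage share of each binary class, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
```

After binarization, class 0 keeps its 133 records (about 10.6% of 1,255) while class 1 holds the remaining 1,122, so the binary target is strongly imbalanced.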
Process
Feature Selection
200+ variables
Model
Cost-sensitive Random Forest is selected based on class recall and class precision.
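The slides do not say how cost sensitivity was applied. One common approach, sketched here as an assumption, is to take the class probabilities from any classifier and predict the class with the lowest expected misclassification cost. The cost values are hypothetical, and so is the choice to make errors on the minority class 0 more expensive.

```python
# Hypothetical cost matrix, read as COST[actual][predicted].
# Assumption: misclassifying a true class 0 costs five times more
# than misclassifying a true class 1.
COST = {
    0: {0: 0.0, 1: 5.0},
    1: {0: 1.0, 1: 0.0},
}


def min_expected_cost_class(probs, cost=COST):
    """Return the prediction with the lowest expected cost.

    probs maps each actual class to its estimated probability.
    """
    def expected_cost(pred):
        return sum(p * cost[actual][pred] for actual, p in probs.items())

    return min(cost, key=expected_cost)


# A plain argmax would predict class 1 in both cases below.
borderline = min_expected_cost_class({0: 0.3, 1: 0.7})
confident = min_expected_cost_class({0: 0.1, 1: 0.9})
```

With these costs, a record with a 30% probability of class 0 is still assigned to class 0 (expected cost 0.7 versus 1.5), while a record with a 10% probability is assigned to class 1 (0.5 versus 0.9). This is how a cost matrix can trade precision for recall on the rare class.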
Model Selection
W-simplekmeans
Cost-sensitive Random Forest
K-star
Multilayer perceptron
Neural net
Naïve Bayes
Decision Tree
Cost-sensitive Random Forest is used for scoring satisfaction for 1Q15
Output