EMPLOYEE RETENTION
Objective:

The HR team aims to better anticipate unplanned departures and retirements in our Parisian workforce
Workflow
1. Data Cleaning
2. Exploratory Analysis
3. Modelling
4. Takeaways
1338 rows
35 columns
26 Numerical columns
9 Categorical columns
No NaN values
After cleaning, our dataset shows 214 resignations (19%) over these past 2 years
y = df.Attrition (0, 1) = Resigned or not
3 retention strategies
Data Cleaning
  • Drop repetitive columns
  • Winsorize 6 columns, chosen by inspecting box plots
  • Create dummies for Gender, MaritalStatus & Department, drop=True
  • Encode Attrition & OverTime as numbers ({'Yes':1,'No':0}), and BusinessTravel likewise ({'Non-Travel':0,'Travel_Rarely':1,'Travel_Frequently':2})
  • Create bins for Age (edges at 30, 40, 50, 60), drop=True
==> total of 27 numerical columns
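The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration on made-up data, not the deck's actual notebook: the winsorized column (MonthlyIncome), the 5th/95th percentile caps, the reading of "drop=True" as drop_first=True, and the exact bin edges are all assumptions.

```python
import pandas as pd

# Hypothetical toy data: column names follow the deck where possible, values are invented.
df = pd.DataFrame({
    "Age": [25, 34, 41, 52, 58, 29],
    "MonthlyIncome": [2000, 3000, 4000, 5000, 60000, 2500],  # assumed example of a winsorized column
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Attrition": ["Yes", "No", "No", "No", "No", "Yes"],
    "OverTime": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "BusinessTravel": ["Travel_Rarely", "Non-Travel", "Travel_Frequently",
                       "Travel_Rarely", "Non-Travel", "Travel_Frequently"],
})

# Winsorize: cap extreme values at chosen percentiles (5th/95th is an assumption).
low, high = df["MonthlyIncome"].quantile([0.05, 0.95])
df["MonthlyIncome"] = df["MonthlyIncome"].clip(lower=low, upper=high)

# Dummy variables; drop_first=True is one plausible reading of "drop=True".
df = pd.get_dummies(df, columns=["Gender"], drop_first=True)

# Encode the Yes/No and travel-frequency columns with the mappings given in the deck.
yes_no = {"Yes": 1, "No": 0}
df["Attrition"] = df["Attrition"].map(yes_no)
df["OverTime"] = df["OverTime"].map(yes_no)
df["BusinessTravel"] = df["BusinessTravel"].map(
    {"Non-Travel": 0, "Travel_Rarely": 1, "Travel_Frequently": 2}
)

# Bin Age into ordinal groups (edges assumed), then drop the raw column.
df["Age_bins"] = pd.cut(df["Age"], bins=[0, 30, 40, 50, 60], labels=False)
df = df.drop(columns="Age")

print(df.head())
```

Encoding BusinessTravel as 0/1/2 treats travel frequency as ordinal, which is what lets it enter the later models as a single numerical column.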
YearsAtCompany Violinplot

Attrition 1 = Employee Left
Insight:
Employees with fewer than 5 years at Sanofi are more likely to leave
YearsWithCurrManager & YearsInCurrentRole
Violinplots

Attrition 1 = Employee Resigned
Insights:
Beyond 2 YearsWithCurrManager and 2 YearsInCurrentRole, more employees stay
JobLevel & StockOptionLevel
Violinplots
Attrition 1 = Employee Left
JobLevel 1 = Entry-Level position

 

Insights:
The higher the JobLevel and the benefit package (StockOptionLevel), the lower the chance of leaving
BusinessTravel & OverTime
Violinplots
Attrition 1 = Employee Left
1 = Occasional business travel or overtime

 

Insights:
Business travel and overtime both hurt retention: fewer resignations among employees at 0 (no travel, no overtime)
WorkLifeBalance & Age
Violinplots

Attrition 1 = Employee Left
WorkLifeBalance 4 = Employee values work-life balance highly

 

 

Insights:
Employees under 30, or those who value work-life balance highly, are at greater risk of leaving

 

Other Insights
No impact

PercentSalaryHike; DistanceFromHome; HourlyRate; NumCompaniesWorked; YearsSinceLastPromotion; EnvironmentSatisfaction; JobSatisfaction

People say what you want to hear

The lack of effect for EnvironmentSatisfaction & JobSatisfaction is probably explained by a response bias (internal survey)
Let's check these assumptions with our models
VIF (variance inflation factor) test performed to screen out collinear columns
Remaining columns saved as the feature matrix X
Correlation Matrix for X
Input Data Selection
Input Data
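As an illustration of the VIF screening step, here is a minimal sketch that computes each column's variance inflation factor directly with NumPy (VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j on the others). The helper names, the toy data and the threshold of 5 are assumptions for the example; the deck does not state which library or cut-off was used.

```python
import numpy as np
import pandas as pd

def vif(X: pd.DataFrame) -> pd.Series:
    """Variance inflation factor of every column of X."""
    scores = {}
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        # Regress column `col` on all other columns plus an intercept.
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        scores[col] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(scores)

def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Drop the worst column repeatedly until every VIF is at or below the threshold (assumed: 5)."""
    X = X.copy()
    while X.shape[1] > 1:
        scores = vif(X)
        if scores.max() <= threshold:
            break
        X = X.drop(columns=scores.idxmax())
    return X

# Hypothetical toy data: column "c" is almost a copy of "a", so both get a high VIF.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({"a": a, "b": rng.normal(size=200), "c": a + 0.05 * rng.normal(size=200)})

vifs = vif(toy)
kept = drop_high_vif(toy)
print(vifs.round(1))
print(list(kept.columns))
```

Dropping one column at a time and recomputing matters because removing one member of a collinear pair usually brings the other's VIF back down.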
MODELS
Priorities:
1. Maximise AUC (area under the ROC curve), i.e. the model's ability to separate leavers from stayers
2. Minimise the False Negative rate (= predicted no resignation but the employee left)

# Model 1: Logistic Regression
# Model 2: Logistic Regression with normalized HourlyRate
# Model 3 & 4: K-Nearest Neighbors
# Model 5: Naive Bayes Model
# Model 6 & 7: Decision Tree
# Model 8 & 9: Random Forest
# Model 10: Support Vector Machine
# Model 11: Nu SVC 
For every model
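import pandas as pd

# conf1..conf12: 2x2 confusion matrices laid out as [[TN, FP], [FN, TP]]
# model1_roc..model12_roc: AUC score of each model
lst = []
for i in range(1, 13):
    conf = globals()[f'conf{i}']
    TN, FP = conf[0][0], conf[0][1]
    FN, TP = conf[1][0], conf[1][1]

    FNR = FN / (TP + FN) * 100                   # share of actual leavers the model missed (%)
    FPR = FP / (FP + TN) * 100                   # share of actual stayers flagged as leavers (%)
    ACC = (TP + TN) / (TP + FP + FN + TN) * 100  # overall accuracy (%)
    AUC = globals()[f'model{i}_roc']

    lst.append([i, FNR, FPR, ACC, AUC])

results = pd.DataFrame(lst, columns=['Model', 'False_negative_rate',
                                     'False_Positive_rate', 'Overall_Accuracy', 'Area_Under_Curve'])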
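To make the three rate formulas concrete, here is a small worked example on a single confusion matrix. The matrix values are invented for illustration and the function name is hypothetical; the [[TN, FP], [FN, TP]] layout matches the indexing used in the loop and is also the convention of scikit-learn's confusion_matrix.

```python
def rates_from_confusion(conf):
    """Return (FNR, FPR, accuracy) in percent from a [[TN, FP], [FN, TP]] matrix."""
    (tn, fp), (fn, tp) = conf
    fnr = fn / (tp + fn) * 100                   # missed leavers among all actual leavers
    fpr = fp / (fp + tn) * 100                   # false alarms among all actual stayers
    acc = (tp + tn) / (tp + fp + fn + tn) * 100  # correct predictions overall
    return fnr, fpr, acc

# Hypothetical test set: 90 stayers (80 correct, 10 flagged) and 10 leavers (7 caught, 3 missed).
example_conf = [[80, 10], [3, 7]]
fnr, fpr, acc = rates_from_confusion(example_conf)
print(f"FNR={fnr:.1f}%  FPR={fpr:.1f}%  ACC={acc:.1f}%")
```

The example also shows why accuracy alone is a weak criterion on imbalanced data: a model can score well overall while still missing a large share of the employees who actually leave.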

Model Evaluation
[Figure: four panels comparing the models on False Negative Rate, False Positive Rate, Area Under the Curve and Overall Accuracy]

Logistic Regression with PCA
# Model 1: Logistic Regression
# Model 12: Logistic Regression with X_train_PCA & X_test_PCA
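For context, the PCA inputs could be produced along these lines. This is a sketch using a plain NumPy SVD rather than whatever library the original notebook used; the helper names, the random toy data and the choice of 2 components are assumptions. The key point it illustrates is that the projection is learned on the training set only and then reused unchanged on the test set.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn the PCA mean and principal axes from the training data only."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project data onto the axes learned from the training set."""
    return (X - mean) @ components.T

# Hypothetical data standing in for the train/test split of X.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
X_test = rng.normal(size=(20, 5))

mean, components = fit_pca(X_train, n_components=2)  # 2 components is an arbitrary choice
X_train_PCA = apply_pca(X_train, mean, components)
X_test_PCA = apply_pca(X_test, mean, components)
print(X_train_PCA.shape, X_test_PCA.shape)
```

Because PCA keeps directions of high variance rather than directions that predict Attrition, it can discard useful signal, which is one possible reason a PCA-based model underperforms the model on the original columns.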



Model 12: AUC = 0.54 & FNR = 54.6%

We keep our original X (no PCA)
FINAL MODEL: Logistic Regression without PCA
AUC = 0.71 & FNR = 28%

 

 

 

 

Expression:
0.64*BusinessTravel + 0.033*DistanceFromHome - 0.046*Education - 0.39*EnvironmentSatisfaction
+ 0.003*HourlyRate - 0.4*JobLevel - 0.39*JobSatisfaction + 0.14*NumCompaniesWorked
+ 1.4*OverTime - 0.69*StockOptionLevel - 0.07*TrainingTimesLastYear - 0.01*YearsInCurrentRole
+ 0.02*YearsSinceLastPromotion - 0.11*YearsWithCurrManager + 0.34*Gender_Male
+ 0.23*MaritalStatus_Married + 0.44*MaritalStatus_Single + 0.69*Department_Sales - 0.29*Age_bins
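One way to read these coefficients, assuming they are the usual log-odds coefficients of a logistic regression, is to exponentiate them into odds ratios: exp(coefficient) is the factor by which the odds of resignation are multiplied when that feature increases by one unit, all else equal. The sketch below does this for a handful of the coefficients listed above; the selection of features is only for illustration.

```python
import math

# A few coefficients copied from the expression above (log-odds scale assumed).
coefficients = {
    "OverTime": 1.4,
    "Department_Sales": 0.69,
    "BusinessTravel": 0.64,
    "JobLevel": -0.4,
    "StockOptionLevel": -0.69,
}

# Odds ratio = exp(coefficient): > 1 raises the odds of leaving, < 1 lowers them.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {ratio:.2f}")
```

Each ratio is per unit of its own feature, so binary flags such as OverTime and multi-level scales such as JobLevel are not directly comparable in size without accounting for their ranges.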

 

 

 

 

 

Employee Retention Strategy
Key takeaways:

1. Focus first on the Sales department (high turnover) and on entry-level positions

2. Minimise business trips and anticipate high-demand cycles to foster a work environment that supports work-life balance

3. Minimise overtime as much as possible, and reward employees with stock options or training sessions

 

 


By Alexis Lacabane