EMPLOYEE RETENTION
Objective: help the HR team better anticipate unplanned departures or retirements in our Parisian workforce
Workflow
1. Data Cleaning
2. Exploratory Analysis
3. Modelling
4. Takeaways
1338 rows
35 columns
26 Numerical columns
9 Categorical columns
No NaN values
After cleaning, our dataset shows 214 resignations (19%) over these past 2 years
y = df.Attrition (1 = resigned, 0 = stayed)
3 retention strategies
Data Cleaning
Drop repetitive columns
Winsorize 6 columns by visualising box plots
Create dummies for Gender, MaritalStatus & Department, drop=True
Encode Attrition & OverTime as numbers ({'Yes':1,'No':0}) and BusinessTravel ({'Non-Travel':0,'Travel_Rarely':1,'Travel_Frequently':2})
Create bins for Age (edges at 30, 40, 50, 60), drop=True
==> total of 27 numerical columns
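The encoding and binning steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the two mappings are copied from the slide, while the helper names (`encode_row`, `age_bin`), the sample record, and the choice to place an age equal to an edge in the upper bin are assumptions.

```python
from bisect import bisect_right

# Mappings taken from the slide
YES_NO = {'Yes': 1, 'No': 0}
TRAVEL = {'Non-Travel': 0, 'Travel_Rarely': 1, 'Travel_Frequently': 2}

# Age edges from the slide; boundary handling is an assumption
AGE_EDGES = [30, 40, 50, 60]


def age_bin(age):
    """Return 0 for age < 30, 1 for 30-39, 2 for 40-49, 3 for 50-59, 4 for 60+."""
    return bisect_right(AGE_EDGES, age)


def encode_row(row):
    """Return a copy of one employee record with numeric encodings applied."""
    out = dict(row)
    out['Attrition'] = YES_NO[row['Attrition']]
    out['OverTime'] = YES_NO[row['OverTime']]
    out['BusinessTravel'] = TRAVEL[row['BusinessTravel']]
    out['Age_bins'] = age_bin(out.pop('Age'))  # drop raw Age, keep the bin
    return out


# Hypothetical employee record, for illustration only
example = {'Age': 34, 'Attrition': 'No', 'OverTime': 'Yes',
           'BusinessTravel': 'Travel_Frequently'}
encoded = encode_row(example)
```

Keeping Age_bins as an ordered integer matches its use as a single numerical column in the final model.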
YearsAtCompany Violinplot
Attrition 1 = Employee Left
Insight:
Employees with fewer than 5 years at Sanofi are more likely to leave
YearsWithCurrManager & YearsInCurrentRole
Violinplots
Attrition 1 = Employee Resigned
Insights:
After 2 YearsWithCurrManager or 2 YearsInCurrentRole, more employees stay
JobLevel & StockOptionLevel
Violinplots
Attrition 1 = Employee Left
JobLevel 1 = Entry-Level position
Insights:
The higher the JobLevel and benefit package (StockOptionLevel), the lower the chance of leaving
BusinessTravel & OverTime
Violinplots
Attrition 1 = Employee Left
1 = Occasional business travel or overtime worked
Insights: Travelling for work and working overtime hurt retention; fewer resignations at 0 (no travel, no overtime)
WorkLifeBalance & Age
Violinplots
Attrition 1 = Employee Left
WorkLifeBalance 4 = Employee values it highly
Insights:
Employees under 30, or who value work-life balance highly, are at higher risk of leaving
Other Insights
No visible impact: PercentSalaryHike, DistanceFromHome, HourlyRate, NumCompaniesWorked, YearsSinceLastPromotion, EnvironmentSatisfaction, JobSatisfaction
People tell you what you want to hear: the flat EnvironmentSatisfaction & JobSatisfaction results are probably explained by response bias (internal survey)
Let's check these assumptions with our models
VIF (variance inflation factor) test performed to detect collinear columns
Remaining columns saved in X
Correlation Matrix for X
Input Data Selection
Input Data
MODELS
Priorities:
1. Maximise AUC (how well the model separates leavers from stayers)
2. Minimise False Negative rate (= predicted no resignation but the employee left)
Model 1: Logistic Regression
Model 2: Logistic Regression w/ normalized HourlyRate
Models 3 & 4: K-Nearest Neighbors
Model 5: Naive Bayes
Models 6 & 7: Decision Tree
Models 8 & 9: Random Forest
Model 10: Support Vector Machine
Model 11: Nu SVC
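For every model, compute the error rates from its confusion matrix:
import pandas as pd

# conf1..conf12 are the confusion matrices, model1_roc..model12_roc the AUC scores
lst = []
for i in range(1, 13):
    conf = eval(f'conf{i}')  # rows = actual, columns = predicted
    TN, FP = conf[0][0], conf[0][1]
    FN, TP = conf[1][0], conf[1][1]
    FNR = FN / (TP + FN) * 100  # share of leavers the model missed
    FPR = FP / (FP + TN) * 100  # share of stayers flagged as leavers
    ACC = (TP + TN) / (TP + FP + FN + TN) * 100
    AUC = eval(f'model{i}_roc')
    lst.append([i, FNR, FPR, ACC, AUC])
results = pd.DataFrame(lst, columns=['Model', 'False_negative_rate',
    'False_Positive_rate', 'Overall_Accuracy', 'Area_Under_Curve'])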
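To make the three rates concrete, here is a small worked example on a single confusion matrix. It is a sketch only: the function name `rates` and the counts are hypothetical, and the row/column layout (rows = actual, columns = predicted) is assumed from the indexing used in the loop.

```python
def rates(conf):
    """Return (FNR, FPR, ACC) in percent from a 2x2 confusion matrix.

    Assumes rows = actual class and columns = predicted class,
    so conf[1][0] counts leavers predicted as staying.
    """
    tn, fp = conf[0]
    fn, tp = conf[1]
    fnr = fn / (tp + fn) * 100
    fpr = fp / (fp + tn) * 100
    acc = (tp + tn) / (tp + fp + fn + tn) * 100
    return fnr, fpr, acc


# Hypothetical test-set counts, not results from this deck
conf_example = [[80, 20],
                [10, 40]]
fnr, fpr, acc = rates(conf_example)
```

With these made-up counts, 10 of the 50 actual leavers are missed (FNR of about 20%), 20 of the 100 stayers are wrongly flagged (FPR of about 20%), and 120 of 150 predictions are correct (accuracy of about 80%).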
[Figure panels: False Negative Rate, False Positive Rate, Area Under the Curve, Overall Accuracy]
Model Evaluation
Priorities:
1. Maximise AUC (how well the model separates leavers from stayers)
2. Minimise False Negative rate (= predicted no resignation but the employee left)
Logistic Regression with PCA
Model 1: Logistic Regression
Model 12: Logistic Regression with X_train_PCA & X_test_PCA: AUC = 0.54 & FNR = 54.6%
We keep our original X (no PCA)
FINAL MODEL: Logistic Regression without PCA
AUC = 0.71 & FNR = 28%
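The choice between the two logistic regressions follows directly from the stated priorities, and can be sketched as a two-key sort. The AUC and FNR values are the ones reported on the slides; the tuple layout and variable names are assumptions made for illustration.

```python
# (model label, AUC, FNR in %) as reported on the slides
candidates = [
    ('Logistic Regression without PCA', 0.71, 28.0),
    ('Logistic Regression with PCA (Model 12)', 0.54, 54.6),
]

# Priority 1: highest AUC first; priority 2: lowest FNR breaks ties
ranked = sorted(candidates, key=lambda m: (-m[1], m[2]))
best = ranked[0][0]
```

Here the model without PCA wins on both criteria, so the tie-break is never needed.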
Expression:
0.64*BusinessTravel + 0.033*DistanceFromHome - 0.046*Education - 0.39*EnvironmentSatisfaction + 0.003*HourlyRate - 0.4*JobLevel - 0.39*JobSatisfaction + 0.14*NumCompaniesWorked + 1.4*OverTime - 0.69*StockOptionLevel - 0.07*TrainingTimesLastYear - 0.01*YearsInCurrentRole + 0.02*YearsSinceLastPromotion - 0.11*YearsWithCurrManager + 0.34*Gender_Male + 0.23*MaritalStatus_Married + 0.44*MaritalStatus_Single + 0.69*Department_Sales - 0.29*Age_bins
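One way to read the expression: in a logistic regression each coefficient is a change in the log-odds of leaving, so exp(coefficient) is an odds ratio. The sketch below assumes the expression is the model's log-odds without its intercept (none is given on the slide) and reads the YearsWithCurrManager term as negative; the helper names are hypothetical.

```python
import math

# Coefficients copied from the expression above (intercept not given on the slide)
COEF = {
    'BusinessTravel': 0.64, 'DistanceFromHome': 0.033, 'Education': -0.046,
    'EnvironmentSatisfaction': -0.39, 'HourlyRate': 0.003, 'JobLevel': -0.4,
    'JobSatisfaction': -0.39, 'NumCompaniesWorked': 0.14, 'OverTime': 1.4,
    'StockOptionLevel': -0.69, 'TrainingTimesLastYear': -0.07,
    'YearsInCurrentRole': -0.01, 'YearsSinceLastPromotion': 0.02,
    'YearsWithCurrManager': -0.11, 'Gender_Male': 0.34,
    'MaritalStatus_Married': 0.23, 'MaritalStatus_Single': 0.44,
    'Department_Sales': 0.69, 'Age_bins': -0.29,
}


def log_odds_shift(changes):
    """Change in log-odds of leaving when features move by the given amounts."""
    return sum(COEF[name] * delta for name, delta in changes.items())


def odds_ratio(changes):
    """Multiplicative change in the odds of leaving for the same feature moves."""
    return math.exp(log_odds_shift(changes))


# Switching OverTime from 0 to 1, everything else unchanged
overtime_or = odds_ratio({'OverTime': 1})
# One extra StockOptionLevel, everything else unchanged
stock_or = odds_ratio({'StockOptionLevel': 1})
```

Under these assumptions, working overtime multiplies the odds of leaving by roughly 4, while one additional stock option level roughly halves them.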
Employee Retention Strategy
Key takeaways:
1. Focus first on the Sales department (high turnover) and entry-level positions
2. Minimise business trips and anticipate high-demand cycles to foster a work environment that supports work-life balance
3. Minimise overtime as much as possible and reward employees with stock options or training sessions