EMPLOYEE RETENTION
Objective:
The HR team aims to better anticipate unplanned departures and retirements in our Parisian workforce
Workflow
1. Data Cleaning
2. Exploratory Analysis
3. Classifiers
4. Takeaways
1338 rows
35 columns
26 Numerical columns
9 Categorical columns
No NaN values
After cleaning, the dataset records 214 resignations (19%) over the past 2 years
Target: y = df.Attrition (1 = resigned, 0 = stayed)
3 strategies for employee retention
Data Cleaning
- Drop redundant columns
- Winsorize 6 columns whose box plots show outliers
- Create dummies for Gender, MaritalStatus and Department (drop_first=True)
- Encode Attrition and OverTime numerically ({'Yes': 1, 'No': 0}) and BusinessTravel ({'Non-Travel': 0, 'Travel_Rarely': 1, 'Travel_Frequently': 2})
- Bin Age at 30, 40, 50 and 60 into Age_bins and drop the original Age column
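The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch under assumptions, not the author's code: the function name `clean`, the 5%/95% winsorizing limits (implemented as quantile clipping rather than, say, `scipy.stats.mstats.winsorize`) and the outer Age bin edges are illustrative. The slides do not name the six winsorized columns, so they are passed in.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, winsor_cols, limits=(0.05, 0.95)) -> pd.DataFrame:
    """Sketch of the cleaning steps; quantile limits and bin edges are assumptions."""
    out = df.copy()
    # Winsorize: clip each listed column to its own lower/upper quantiles.
    for col in winsor_cols:
        lo, hi = out[col].quantile(list(limits))
        out[col] = out[col].clip(lower=lo, upper=hi)
    # Numeric encodings as listed in the Data Cleaning step.
    yes_no = {'Yes': 1, 'No': 0}
    out['Attrition'] = out['Attrition'].map(yes_no)
    out['OverTime'] = out['OverTime'].map(yes_no)
    out['BusinessTravel'] = out['BusinessTravel'].map(
        {'Non-Travel': 0, 'Travel_Rarely': 1, 'Travel_Frequently': 2})
    # Ordinal age bins: <30, 30-39, 40-49, 50-59, 60+ (codes 0..4), then drop Age.
    out['Age_bins'] = pd.cut(out['Age'], bins=[-np.inf, 30, 40, 50, 60, np.inf],
                             labels=False, right=False)
    out = out.drop(columns='Age')
    # One-hot encode, dropping the first (alphabetical) level of each variable.
    return pd.get_dummies(out, columns=['Gender', 'MaritalStatus', 'Department'],
                          drop_first=True, dtype=int)
```

With `drop_first=True`, the levels kept are exactly the ones that appear in the final model (Gender_Male, MaritalStatus_Married, MaritalStatus_Single, Department_Sales), plus Department_Research & Development.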
YearsAtCompany violin plot
Attrition 1 = Employee Left
Employees with fewer than 5 years at the company are more likely to leave
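A numeric counterpart to this violin plot, written as a hypothetical helper `tenure_vs_attrition`: it compares the median tenure of leavers and stayers, and the resignation rate on either side of the 5-year mark. It assumes the cleaned frame has `Attrition` encoded as 0/1.

```python
import numpy as np
import pandas as pd


def tenure_vs_attrition(df: pd.DataFrame, col: str = 'YearsAtCompany', threshold: float = 5):
    """Median tenure per class, and the resignation rate below vs at/above a threshold."""
    # Median of `col` for stayers (Attrition 0) and leavers (Attrition 1).
    medians = df.groupby('Attrition')[col].median()
    # Label each employee by which side of the threshold they fall on.
    side = np.where(df[col] < threshold, f'< {threshold}', f'>= {threshold}')
    # Share of employees in each group who resigned (mean of a 0/1 column).
    rates = df['Attrition'].groupby(side).mean()
    return medians, rates
```

The same helper applies to YearsWithCurrManager and YearsInCurrentRole with `threshold=2`.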

YearsWithCurrManager & YearsInCurrentRole
Violin plots
Attrition 1 = Employee Resigned
Beyond 2 years with the current manager or 2 years in the current role, more employees stay


JobLevel & StockOptionLevel
Violin plots
The higher the job level and the stock option level (benefit package), the lower the chance of leaving


Highest proportion of resignations (36%)
Highest proportion of resignations (32%)
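The "highest proportion of resignations" figures can be read as the resignation rate within each category of the plotted variable. A sketch with a hypothetical helper name, assuming `Attrition` is 0/1 so that its mean per group is the share who resigned:

```python
import pandas as pd


def resignation_rate_by_level(df: pd.DataFrame, col: str):
    """Percentage of employees who resigned within each value of `col`."""
    rates = df.groupby(col)['Attrition'].mean().mul(100).round(1)
    # The category with the highest proportion of resignations, and that proportion.
    return rates, rates.idxmax(), rates.max()
```

The same call works for StockOptionLevel, BusinessTravel, OverTime, WorkLifeBalance and Age_bins.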
BusinessTravel & OverTime
Violin plots
Business travel and overtime both hurt retention: fewer resignations when BusinessTravel = 0 (no travel) and OverTime = 0


Highest proportion of resignations (32%)
Highest proportion of resignations (42%)
WorkLifeBalance & Age
Violin plots
Employees under 30, or who rate work-life balance as a top priority (WorkLifeBalance = 1), are more likely to leave


Highest proportion of resignations (38%)
Highest proportion of resignations (49%)
Other Insights
No visible impact on attrition:
- EnvironmentSatisfaction
- JobSatisfaction
- PercentSalaryHike
- DistanceFromHome
- HourlyRate
- NumCompaniesWorked
- YearsSinceLastPromotion
People say what you want to hear
- Do EnvironmentSatisfaction and JobSatisfaction really have no impact?
- Probably explained by response bias (internal survey)
Test these assumptions against our models
VIF test performed

Remaining columns saved in X
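The slides do not show how the VIF test was run. One way to sketch it with NumPy alone is to regress each column on all the others (plus an intercept) and compute VIF = 1 / (1 - R^2). The helper names, the drop-the-worst-column loop and the threshold of 10 are common conventions assumed here, not taken from the source. `statsmodels.stats.outliers_influence.variance_inflation_factor` is a library alternative; it uses the design matrix as given, so a constant column has to be added by the caller.

```python
import numpy as np
import pandas as pd


def vif(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns plus an intercept."""
    values = X.to_numpy(dtype=float)
    n, p = values.shape
    out = {}
    for j in range(p):
        y = values[:, j]
        A = np.column_stack([np.ones(n), np.delete(values, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out[X.columns[j]] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(out)


def drop_high_vif(X: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Repeatedly drop the column with the largest VIF until all VIFs are below threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        scores = vif(X)
        if scores.max() < threshold:
            break
        X = X.drop(columns=scores.idxmax())
    return X
```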
Correlation Matrix for X
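A correlation matrix is easier to act on when reduced to the strongly correlated pairs. A sketch, where the helper name and the 0.7 cut-off are illustrative assumptions:

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(X: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Pairs of columns whose absolute Pearson correlation is at least `threshold`."""
    corr = X.corr()
    # Keep the strict upper triangle so each pair appears once and the diagonal is skipped.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack().dropna()
    strong = pairs[pairs.abs() >= threshold]
    # Strongest relationships first, whatever their sign.
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)
```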



Input Data Selection

Input Data
Total of 27 numerical columns
Classifiers
Priorities:
1. Maximise AUC (how well the model ranks employees who left above those who stayed)
2. Minimise the false negative rate (predicted no resignation, but the employee left)
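AUC is not precision: it is the probability that the model scores a randomly chosen leaver above a randomly chosen stayer. A NumPy sketch of that pairwise definition, next to the false negative rate (function names are illustrative; `sklearn.metrics.roc_auc_score` computes the same AUC):

```python
import numpy as np


def roc_auc(y_true, scores) -> float:
    """Share of (leaver, stayer) pairs where the leaver gets the higher score; ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every leaver with every stayer (fine for ~1,300 rows).
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def false_negative_rate(y_true, y_pred) -> float:
    """Percentage of employees who left but were predicted to stay."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return 100 * fn / (fn + tp)
```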
- Model 1: Logistic Regression
- Model 2: Logistic Regression with normalised HourlyRate
- Models 3 & 4: K-Nearest Neighbors
- Model 5: Naive Bayes
- Models 6 & 7: Decision Tree
- Models 8 & 9: Random Forest
- Model 10: Support Vector Machine
- Model 11: NuSVC
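import pandas as pd

# conf1..conf12: confusion matrices assumed laid out as [[TN, FP], [FN, TP]] (sklearn convention)
# model1_roc..model12_roc: each model's ROC AUC score
lst = []
for i in range(1, 13):
    conf = eval(f'conf{i}')
    TN, FP = conf[0][0], conf[0][1]
    FN, TP = conf[1][0], conf[1][1]
    FNR = FN / (TP + FN) * 100                    # % of leavers predicted to stay
    FPR = FP / (FP + TN) * 100                    # % of stayers predicted to leave
    ACC = (TP + TN) / (TP + FP + FN + TN) * 100   # % of correct predictions
    AUC = eval(f'model{i}_roc')
    lst.append([i, FNR, FPR, ACC, AUC])

results = pd.DataFrame(lst, columns=['Model', 'False_negative_rate',
                                     'False_Positive_rate', 'Overall_Accuracy',
                                     'Area_Under_Curve'])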
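The loop above reaches `conf1` to `conf12` through `eval`. A variant that avoids `eval` keeps each model's confusion matrix and AUC in one dictionary; the `models` layout and both helper names are hypothetical, and `confusion` reproduces the `[[TN, FP], [FN, TP]]` layout of `sklearn.metrics.confusion_matrix` for labels 0 and 1.

```python
import numpy as np
import pandas as pd


def confusion(y_true, y_pred) -> np.ndarray:
    """2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.array([
        [np.sum((y_true == 0) & (y_pred == 0)), np.sum((y_true == 0) & (y_pred == 1))],
        [np.sum((y_true == 1) & (y_pred == 0)), np.sum((y_true == 1) & (y_pred == 1))],
    ])


def summarise(models: dict) -> pd.DataFrame:
    """models: hypothetical dict {model_number: (confusion_matrix, auc)}."""
    rows = []
    for i, (conf, auc) in models.items():
        (tn, fp), (fn, tp) = conf
        rows.append({'Model': i,
                     'False_negative_rate': 100 * fn / (tp + fn),
                     'False_Positive_rate': 100 * fp / (fp + tn),
                     'Overall_Accuracy': 100 * (tp + tn) / (tp + fp + fn + tn),
                     'Area_Under_Curve': auc})
    return pd.DataFrame(rows)
```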
Figure, for every model: Area Under the Curve, False Negative Rate, False Positive Rate and Overall Accuracy (error rates to minimise)
Model Evaluation

Best Model
Logistic Regression with versus without PCA
Model 1: Logistic Regression
AUC = 0.71, FNR = 28%
Model 12: Logistic Regression with X_train_PCA & X_test_PCA
AUC = 0.54, FNR = 54.6%
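How X_train_PCA and X_test_PCA were built is not shown. A NumPy sketch of the usual approach, fitting the principal axes on the training set only and applying them unchanged to the test set; the function names and the number of components are assumptions. Components are chosen for variance, not for relevance to Attrition, which is one possible reason a PCA-based model can lose predictive signal.

```python
import numpy as np


def fit_pca(X_train: np.ndarray, n_components: int):
    """Centre on the training mean and keep the top principal axes from an SVD."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of variance carried by each axis
    return mean, vt[:n_components], explained[:n_components]


def transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project rows onto the axes learned from the training set."""
    return (X - mean) @ components.T
```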
Priorities:
1. Maximise AUC
2. Minimise the false negative rate
Best Model
Final Model - Logistic Regression without PCA
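Expression of Attrition (fitted linear function of the logistic regression):
$$
\begin{aligned}
f ={}& 0.64\,\text{BusinessTravel} + 0.033\,\text{DistanceFromHome} - 0.046\,\text{Education} - 0.39\,\text{EnvironmentSatisfaction} \\
& + 0.003\,\text{HourlyRate} - 0.4\,\text{JobLevel} - 0.39\,\text{JobSatisfaction} + 0.14\,\text{NumCompaniesWorked} \\
& + 1.4\,\text{OverTime} - 0.69\,\text{StockOptionLevel} - 0.07\,\text{TrainingTimesLastYear} - 0.01\,\text{YearsInCurrentRole} \\
& + 0.02\,\text{YearsSinceLastPromotion} - 0.11\,\text{YearsWithCurrManager} + 0.34\,\text{Gender\_Male} \\
& + 0.23\,\text{MaritalStatus\_Married} + 0.44\,\text{MaritalStatus\_Single} + 0.69\,\text{Department\_Sales} - 0.29\,\text{Age\_bins}
\end{aligned}
$$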
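Logistic regression coefficients act on the log-odds, so exp(coefficient) gives the factor by which the odds of resigning change for a one-unit increase, other features held fixed. A sketch with the coefficients copied from the expression above, assuming the features entered the model on their original scale and reading the YearsWithCurrManager term as -0.11. No intercept is reported, so the sketch stops at odds ratios rather than probabilities.

```python
import math

# Coefficients as printed in the final model's expression (YearsWithCurrManager read as -0.11).
COEFS = {
    'BusinessTravel': 0.64, 'DistanceFromHome': 0.033, 'Education': -0.046,
    'EnvironmentSatisfaction': -0.39, 'HourlyRate': 0.003, 'JobLevel': -0.4,
    'JobSatisfaction': -0.39, 'NumCompaniesWorked': 0.14, 'OverTime': 1.4,
    'StockOptionLevel': -0.69, 'TrainingTimesLastYear': -0.07,
    'YearsInCurrentRole': -0.01, 'YearsSinceLastPromotion': 0.02,
    'YearsWithCurrManager': -0.11, 'Gender_Male': 0.34,
    'MaritalStatus_Married': 0.23, 'MaritalStatus_Single': 0.44,
    'Department_Sales': 0.69, 'Age_bins': -0.29,
}


def odds_ratios(coefs: dict) -> dict:
    """exp(beta): factor applied to the odds of resigning per one-unit increase, others fixed."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


# Rank features by how strongly they multiply (>1) or divide (<1) the odds.
ranked = sorted(odds_ratios(COEFS).items(), key=lambda kv: abs(math.log(kv[1])), reverse=True)
```

Under these assumptions, exp(1.4) is about 4.06, so overtime goes with roughly four times the odds of resigning, while exp(-0.69) is about 0.50, so each extra StockOptionLevel roughly halves them.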

Key Takeaways
3 Strategies for Employee Retention
Focus on the Sales department (high turnover) and on entry-level positions
Minimise business trips and anticipate high-demand cycles to foster a work environment that supports work-life balance
Minimise overtime as much as possible and reward employees with stock options or training sessions
Employee_Retention
By Alexis Lacabane