EMPLOYEE RETENTION

 

Objective:

The HR team wants to better anticipate unplanned leave or retirement in our Parisian workforce

Workflow

1. Data Cleaning

2. Exploratory Analysis

3. Classifiers

4. Takeaways

  • 1338 rows
  • 35 columns (26 numerical, 9 categorical)
  • No NaN values

After cleaning, our dataset shows 214 resignations (19%) over the past 2 years

Target: y = df.Attrition (1 = resigned, 0 = stayed)

3 strategies for employee retention

Data Cleaning

  • Drop repetitive columns

  • Winsorize 6 columns whose box plots show outliers

  • Create dummies for Gender, MaritalStatus & Department, drop=True

  • Encode Attrition & OverTime as numbers ({'Yes':1,'No':0}), and BusinessTravel as well ({'Non-Travel':0,'Travel_Rarely':1,'Travel_Frequently':2})

  • Create bins for Age (cut points at 30, 40, 50 and 60), drop=True
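The cleaning steps above could look roughly like the sketch below. It is only an illustration on made-up rows: the winsorized column (YearsAtCompany), the 5th/95th percentile caps, the exact Age bin edges and the use of drop_first=True are assumptions, since the slide does not spell them out.

```python
import pandas as pd

# Toy rows standing in for the HR dataset (values are made up for illustration).
df = pd.DataFrame({
    "Age": [25, 34, 41, 52, 58],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "MaritalStatus": ["Single", "Married", "Divorced", "Single", "Married"],
    "Department": ["Sales", "Research & Development", "Human Resources",
                   "Sales", "Sales"],
    "Attrition": ["Yes", "No", "No", "Yes", "No"],
    "OverTime": ["Yes", "No", "Yes", "No", "No"],
    "BusinessTravel": ["Travel_Frequently", "Travel_Rarely", "Non-Travel",
                       "Travel_Rarely", "Travel_Frequently"],
    "YearsAtCompany": [1, 3, 5, 8, 40],
})

# Winsorize: cap extreme values at chosen percentiles (5th/95th is an assumption).
low, high = df["YearsAtCompany"].quantile([0.05, 0.95])
df["YearsAtCompany"] = df["YearsAtCompany"].clip(lower=low, upper=high)

# Encode the text columns as numbers with the mappings given on the slide.
yes_no = {"Yes": 1, "No": 0}
df["Attrition"] = df["Attrition"].map(yes_no)
df["OverTime"] = df["OverTime"].map(yes_no)
df["BusinessTravel"] = df["BusinessTravel"].map(
    {"Non-Travel": 0, "Travel_Rarely": 1, "Travel_Frequently": 2})

# Bin Age into ordinal codes, then drop the raw column.
# Assumed edges: under 30, 30-39, 40-49, 50-60.
df["Age_bins"] = pd.cut(df["Age"], bins=[0, 30, 40, 50, 61],
                        right=False, labels=False)
df = df.drop(columns="Age")

# One-hot encode the nominal columns; drop_first removes one redundant dummy each.
df = pd.get_dummies(df, columns=["Gender", "MaritalStatus", "Department"],
                    drop_first=True)
print(df.columns.tolist())
```

With drop_first=True the remaining dummy names (Gender_Male, MaritalStatus_Married, MaritalStatus_Single, Department_Sales) match the ones that appear in the final model, which is why that option is assumed here.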

YearsAtCompany Violinplot

Attrition 1 = Employee Left

Fewer than 5 years at the company ... higher chance of leaving

YearsWithCurrManager & YearsInCurrentRole

Violinplots

Attrition 1 = Employee Resigned

After 2 years with the current manager or 2 years in the current role ... more people stay

JobLevel & StockOptionLevel

Violinplots

The higher the job level and stock option level ... the lower the chance of leaving

Highest proportion of resignations (36%)

Highest proportion of resignations (32%)

BusinessTravel & OverTime

Violinplots

Travelling for work and working overtime both hurt retention ... fewer resignations when the value is 0

Highest proportion of resignations (32%)

Highest proportion of resignations (42%)

WorkLifeBalance & Age

Violinplots

Employees under 30, or employees who value work-life balance highly (1 = top priority) ... higher risk of leaving

Highest proportion of resignations (38%)

Highest proportion of resignations (49%)
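The "highest proportion of resignations" figures on these slides are presumably group-wise resignation rates. A minimal sketch of that calculation, on made-up data and assuming Attrition is already encoded as 0/1:

```python
import pandas as pd

# Made-up sample; 1 = yes / resigned, 0 = no / stayed.
df = pd.DataFrame({
    "OverTime":  [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "Attrition": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# The mean of a 0/1 column within each group is the share of resignations.
rate = df.groupby("OverTime")["Attrition"].mean().mul(100).round(1)
print(rate)

# Group with the highest resignation rate.
print(rate.idxmax())
```

The same groupby works for any of the categorical or binned columns (BusinessTravel, Age_bins, WorkLifeBalance, and so on).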

 

Other Insights

No impact

  • EnvironmentSatisfaction;
  • JobSatisfaction;
  • PercentSalaryHike;
  • DistanceFromHome;
  • HourlyRate;
  • NumCompaniesWorked;
  • YearsSinceLastPromotion;

People say what you want to hear

  • EnvironmentSatisfaction & JobSatisfaction have no impact?
  • Probably explained by response bias (internal survey)

Test these assumptions against our models

VIF (variance inflation factor) test performed

Remaining columns saved in X

Correlation Matrix for X
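As a rough illustration of a VIF-based column selection, the sketch below computes each VIF from the diagonal of the inverse correlation matrix and drops the worst column until all remaining VIFs fall under a threshold. The threshold of 5, the helper names (vif_table, drop_high_vif) and the demo columns are assumptions, not the settings used in the analysis.

```python
import numpy as np
import pandas as pd

def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF of each column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=X.columns)

def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Drop the worst column repeatedly until every VIF is at or below the threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vif = vif_table(X)
        if vif.max() <= threshold:
            break
        X = X.drop(columns=vif.idxmax())
    return X

# Demo: "a_copy" is almost a duplicate of "a", so one of the two should be dropped.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
demo = pd.DataFrame({"a": a, "b": b, "a_copy": a + 0.01 * rng.normal(size=200)})

kept = drop_high_vif(demo)
print(kept.columns.tolist())
print(vif_table(kept))
```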

Input Data Selection

Input Data

Total of 27 numerical columns

Classifiers

Priorities:
1. Maximise AUC (area under the ROC curve), the model's ability to separate leavers from stayers
2. Minimise the False Negative rate (= predicted no resignation but the employee left)
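To make the two priorities concrete, here is a small pure-Python sketch on made-up scores. AUC is computed as the share of (leaver, stayer) pairs in which the leaver gets the higher predicted score, so it measures ranking quality rather than precision in the strict sense. The function names and the 0.5 decision threshold are assumptions for illustration.

```python
def auc_score(y_true, scores):
    """AUC as the share of (leaver, stayer) pairs where the leaver scores higher."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # A tie between a leaver and a stayer counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def false_negative_rate(y_true, y_pred):
    """Share of actual leavers that the model predicted as staying."""
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    return fn / (fn + tp)

# Made-up example: 3 leavers (1) and 4 stayers (0) with predicted scores.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(auc_score(y_true, scores))
print(false_negative_rate(y_true, y_pred))
```

AUC does not depend on the threshold, while the False Negative rate does, which is one reason to track both.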

 

  • Model 1: Logistic Regression
  • Model 2: Logistic Regression with normalized HourlyRate
  • Model 3 & 4: K-Nearest Neighbors
  • Model 5: Naive Bayes Model
  • Model 6 & 7: Decision Tree
  • Model 8 & 9: Random Forest
  • Model 10: Support Vector Machine
  • Model 11: Nu SVC
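import pandas as pd

# conf1..conf12 are the confusion matrices and model1_roc..model12_roc the AUC
# scores computed earlier in the notebook, one pair per model.
rows = []
for i in range(1, 13):
    # Look each object up by name once instead of calling eval() four times.
    conf = globals()[f'conf{i}']
    (TN, FP), (FN, TP) = conf  # rows = actual class, columns = predicted class

    FNR = FN / (TP + FN) * 100                     # missed resignations, in %
    FPR = FP / (FP + TN) * 100                     # false alarms, in %
    ACC = (TP + TN) / (TP + FP + FN + TN) * 100    # overall accuracy, in %
    AUC = globals()[f'model{i}_roc']

    rows.append([i, FNR, FPR, ACC, AUC])

results = pd.DataFrame(rows, columns=['Model', 'False_negative_rate',
    'False_Positive_rate', 'Overall_Accuracy', 'Area_Under_Curve'])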

[Charts for every model: False Positive Rate, False Negative Rate, Overall Accuracy, Area Under the Curve; annotation: Minimize]

Model Evaluation


Best Model

Logistic Regression with versus without PCA

Model 1: Logistic Regression

              AUC = 0.71 & FNR = 28%

Model 12: Logistic Regression with X_train_PCA & X_test_PCA

              AUC = 0.54 & FNR = 54.6%
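For reference, PCA inputs such as X_train_PCA and X_test_PCA are typically built by learning the projection on the training set only and then applying it to the test set. The NumPy sketch below shows that pattern on random data; the number of components (3), the absence of scaling and the helper names fit_pca / apply_pca are assumptions, since the slides do not give these details.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn the mean and principal axes from the training set only."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project data onto the axes learned from the training set."""
    return (X - mean) @ components.T

# Random stand-in data: 6 features, 80 training rows, 20 test rows.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 6))
X_test = rng.normal(size=(20, 6))

mean, components = fit_pca(X_train, n_components=3)
X_train_PCA = apply_pca(X_train, mean, components)
X_test_PCA = apply_pca(X_test, mean, components)
print(X_train_PCA.shape, X_test_PCA.shape)
```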


Final Model - Logistic Regression without PCA

 

Expression of Attrition =

Function = 0.64*BusinessTravel + 0.033*DistanceFromHome - 0.046*Education - 0.39*EnvironmentSatisfaction
+ 0.003*HourlyRate - 0.4*JobLevel - 0.39*JobSatisfaction + 0.14*NumCompaniesWorked
+ 1.4*OverTime - 0.69*StockOptionLevel - 0.07*TrainingTimesLastYear - 0.01*YearsInCurrentRole
+ 0.02*YearsSinceLastPromotion - 0.11*YearsWithCurrManager + 0.34*Gender_Male
+ 0.23*MaritalStatus_Married + 0.44*MaritalStatus_Single + 0.69*Department_Sales - 0.29*Age_bins
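In a logistic regression, a linear function like the one above is a score on the log-odds scale: the logistic (sigmoid) function turns it into a probability, and exp(coefficient) gives the odds ratio for a one-unit increase. The sketch below uses only four of the coefficients, and the intercept is a placeholder because the slide does not report one, so the probabilities are illustrative only.

```python
import math

# A few coefficients copied from the slide (subset chosen for brevity).
coef = {"OverTime": 1.4, "BusinessTravel": 0.64,
        "StockOptionLevel": -0.69, "JobLevel": -0.4}

def attrition_probability(employee, intercept=0.0):
    """Turn the linear score into a probability with the logistic function.

    `intercept` is a placeholder assumption: the slide does not report one.
    """
    score = intercept + sum(coef[name] * employee.get(name, 0) for name in coef)
    return 1.0 / (1.0 + math.exp(-score))

# Odds ratio: factor applied to the odds of resigning when OverTime goes 0 -> 1.
overtime_odds_ratio = math.exp(coef["OverTime"])

with_overtime = attrition_probability({"OverTime": 1, "JobLevel": 1})
without_overtime = attrition_probability({"OverTime": 0, "JobLevel": 1})

print(round(overtime_odds_ratio, 2))
print(round(with_overtime, 3), round(without_overtime, 3))
```

With a coefficient of 1.4, working overtime multiplies the odds of resigning by roughly 4, all other inputs held equal.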

 

Key Takeaways

3 Strategies for Employee Retention

 

1. Focus on the Sales department (high turnover) and on entry-level positions

2. Minimise business trips and anticipate high-demand cycles to foster a work environment that supports work-life balance

3. Minimise overtime as much as possible and reward employees with stock options or training sessions

 

 

Employee_Retention

By Alexis Lacabane