Improvement In Employee Retention

I. Abstract

Employee attrition is one of the major problems faced by many companies and start-ups today. The loss of an employee incurs huge costs to the company in terms of lost productivity, recruitment, and training. The biggest challenge for many organisations is therefore how to retain their employees. This research paper aims to provide solutions for retaining an employee or increasing that employee's tenure. First, we predict the expected tenure of the employees currently working in the company by analyzing employee attrition data, and we predict the value of each employee to the company. Then, in order to retain the employees, we update each feature and generate a new value for each employee, subject to resource constraints. Finally, we predict the updated tenure of the employee corresponding to the newly generated value.

II. Introduction

Employees are the backbone of an organization, so retaining them is important in keeping the organization on track. To retain the best talent, global companies and small firms alike implement strategies aimed at satisfying employees' needs[1]. Employee attrition incurs huge costs. Counting the costs associated with separation, loss of productivity, recruitment, interviewing, training, and onboarding, the loss of a single employee is estimated to cost the company one-third of that individual's annual salary[2]. Various factors can be responsible for employees leaving the company, and the major challenge faced by many organizations is to retain them. In this research paper, we create models that predict the expected tenure of an employee and the employee's value to the company. We treat each employee as an individual entity and suggest preventive attrition measures at an individual level. We provide the appropriate measures to be taken to retain the employee at the predicted tenure and, using these proposed measures, predict the new expected tenure of the employee. In this way, we address the problem of employee retention.

III. Literature Review

Lucas (2013) reports that employers do not understand the expense of high employee turnover. Recruiting new staff is costly because of advertising and administrative expenses, the time and resources needed for onboarding and training, and the loss of productivity.[3] Cloutier et al. (2015) state that successful employee retention is essential to an organization's stability, growth, and revenue, and that organizations can achieve it by developing four strategies.[4]

Hong et al. (2012) find that effective human resource management practices, namely employee empowerment, training and development, appraisal system, and compensation, are the main factors in a firm's success at employee retention. Using multiple regression analysis, they show that training and development, appraisal system, and compensation are significant to employee retention.[1]

The article "Factors Affecting Employee Attrition and Predictive Modelling Using IBM HR Data" establishes a predictive model for employee attrition. The approach involves data preprocessing, a comparison of chi-square and logistic regression for feature selection, and machine learning models that are compared using the confusion matrix, precision, recall, and F1-scores on the IBM HR data set.[6] Yuan et al. (2017) of the Beijing Research Center compare RF, ANN, and SVM regression models, showing how different regression models can be applied to a specific estimation use case.[5]

IV. Data

The data set used was developed by an IBM R&D lab to provide a standard for the HR management issues encountered by a medium-scale enterprise. It was developed to uncover the factors that lead to employee attrition and to explore important questions and ideas, and it contains features covering every aspect of an employee's professional life. The data is complete and is divided into two parts, separating the attrition and retention subsets of employees. The model is first trained on the attrition data set, split into 80 percent training data and 20 percent test data, and is then used to predict values for the retention data set.

V. Methodology

A. Human Resource Management Models

A1. Remaining Tenure Prediction Model

A predictive regression model is trained by ensemble learning, combining the different regression models (B1-B6) and generating the smallest mode range for each employee, using the employee attrition data set described above. The expected tenure of the employees currently working in the company is then predicted by applying this model to the retention data set, which also acts as the test condition for the model. Because the number of employees who have left will always be smaller than the number of currently working employees in a given period, the test condition for validating the model is that the remaining tenure predicted for every currently working employee must be greater than 0: $\mathrm{TenurePred}(E_i) > 0$.

Let $x_1, \dots, x_n$ be the features of the employees from the data set, from which we predict the expected tenure (denoted $x_{n+1}$), and let $E_i$ represent the $i$th employee.
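The combination and validation steps can be sketched as follows. This is only an illustration under stated assumptions: the base models are hypothetical callables standing in for trained regressors, and because the combination rule (the "mode range") is not specified in detail here, a plain mean is swapped in as a stand-in.

```python
from statistics import mean
from typing import Callable, Sequence

# A base model is assumed to map a feature list to a tenure estimate.
Model = Callable[[Sequence[float]], float]

def tenure_pred(models: Sequence[Model], features: Sequence[float]) -> float:
    """Combine the base models' estimates by averaging (stand-in rule)."""
    return mean(m(features) for m in models)

def passes_validation(models: Sequence[Model], employees) -> bool:
    """Check the test condition TenurePred(E_i) > 0 for every employee."""
    return all(tenure_pred(models, e) > 0 for e in employees)

# Hypothetical toy models; real ones would be the trained regressors.
toy_models = [
    lambda x: 0.5 * x[0] + 1.0,
    lambda x: 0.4 * x[0] + 1.5,
    lambda x: 0.6 * x[0] + 0.5,
]
current_employees = [[2.0], [4.0], [6.0]]
predictions = [tenure_pred(toy_models, e) for e in current_employees]
model_is_valid = passes_validation(toy_models, current_employees)
```

Any other combination rule can replace the mean without changing the validation check.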

A2. Employee Value Prediction Model

The employees are grouped into clusters using the K-Means algorithm. Each cluster is assigned the mid-point of a value range of class size 5, which denotes the value of the employee to the company; dividing the percentage value scale into classes of width 5 fixes $k$ at 20. For clustering, the value is generated from the weighted $n+1$ features, where $w_1, \dots, w_n, w_{n+1}$ are the weights corresponding to the features $x_1, \dots, x_n, x_{n+1}$ respectively. The $n$ attributes $(x_1, \dots, x_n)$ come from the data set of employees who have left, and the predicted tenure forms the $(n+1)$th attribute used for value generation in the employee value prediction model.
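To make the value scale concrete, the sketch below maps a weighted score to the mid-point of its class. It assumes, beyond what the text states, that features are normalized to [0, 1] and that the value is a weighted average scaled to 0-100. The function names are hypothetical, and the clustering itself is not shown.

```python
def weighted_value(features, weights):
    """Weighted average of the n+1 features, scaled to 0-100.

    Assumes every feature is already normalized to [0, 1].
    """
    total = sum(w * x for w, x in zip(weights, features))
    return 100.0 * total / sum(weights)

def class_midpoint(score, width=5.0, k=20):
    """Return the mid-point of the width-5 class that contains score."""
    index = min(int(score // width), k - 1)  # a score of 100 joins the top class
    return index * width + width / 2

# Hypothetical employee: two attributes plus normalized predicted tenure.
score = weighted_value([0.8, 0.5, 1.0], [2.0, 1.0, 1.0])
value = class_midpoint(score)
```

With 20 classes of width 5, the possible values are 2.5, 7.5, and so on up to 97.5.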

A3. Resource consumption Model

This model evaluates a function $A3_i$ that depends, in a weighted manner, on all the attributes $x_1, \dots, x_n$ of an employee and represents the resources consumed under the employee's current attributes.

A4. Retention Technique Selection Model

Only employees whose importance value, as calculated above, exceeds 50 percent are considered by this model. The model carries various retention techniques, each unique to a feature $x_i$. It updates the features $x$ and re-evaluates the cluster in model A2, generating a different importance value $A2_i$. The function $A3_i$, which represents the resource consumption of employee $i$, has to be minimized during this updating. The process continues until all combinations of techniques are exhausted, and it yields the best possible combination of techniques for each employee considered, together with the most economical use of resources. The objective is therefore to maximize $A2_i / A3_i$, which can be written as:

$$
\begin{aligned}
\max \frac{A2_i}{A3_i}
&= \max \frac{A2_i \mid x_1, \dots, x_{n+1}}{A3_i \mid x_1, \dots, x_n} \\
&= \max \frac{(A2_i \mid x_1, \dots, x_n)\,(x_{n+1} \mid x_1, \dots, x_n)}{A3_i \mid x_1, \dots, x_n} \\
&= (x_{n+1} \mid x_1, \dots, x_n)\,\max \frac{A2_i \mid x_1, \dots, x_n}{A3_i \mid x_1, \dots, x_n} \\
&= (A1)\,\max \frac{A2_i}{A3_i}
\end{aligned}
$$

Here $x_{n+1}$ represents the predicted tenure from model A1, which is held constant for the whole iterative process and can therefore be taken out of the maximization.
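The exhaustive search over technique combinations can be sketched as below. Everything named here is a hypothetical stand-in: `value_fn` plays the role of A2, `resource_fn` plays the role of A3, and each technique is modelled as a function that returns an updated feature list.

```python
from itertools import combinations

def best_combination(features, techniques, value_fn, resource_fn):
    """Try every non-empty subset of techniques and keep the one
    with the highest value-to-resource ratio (A2 / A3)."""
    best = None
    names = list(techniques)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            updated = list(features)
            for name in combo:
                updated = techniques[name](updated)
            ratio = value_fn(updated) / resource_fn(updated)
            if best is None or ratio > best[0]:
                best = (ratio, combo, updated)
    return best

# Hypothetical techniques, each acting on a single feature.
techniques = {
    "raise": lambda x: [x[0] + 1.0, x[1]],
    "training": lambda x: [x[0], x[1] + 2.0],
}
value_fn = lambda x: 10.0 * x[0] + 5.0 * x[1]          # stand-in for A2
resource_fn = lambda x: 1.0 + 4.0 * x[0] + 1.0 * x[1]  # stand-in for A3

ratio, chosen, new_features = best_combination(
    [1.0, 1.0], techniques, value_fn, resource_fn
)
```

The number of subsets grows as 2^m for m techniques, so this brute-force form is only practical for a small catalogue of techniques.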

A5. Updated Tenure Prediction Model

After calculating the optimal attribute values for the employee, we generate the updated predicted tenure $x_0$ using model A1. The difference $x_0 - x_{n+1}$ is then calculated. This difference is the final result of our algorithm, together with the resource consumption value and the updated attributes, which reflect the actual retention strategies to be used for each employee.

B. Machine Learning Algorithms

B1. Support Vector Regression Algorithm[7]

As a supervised-learning approach, SVR trains using a symmetrical loss function, which penalizes high and low misestimates equally. One of the main advantages of SVR is that its computational complexity does not depend on the dimensionality of the input space. For regression, a margin of tolerance (epsilon) is set around the approximation, analogous to the margin in SVM. The predicted values are:

$$y = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b$$

where $\alpha_i$ and $\alpha_i^*$ are the dual coefficients. The kernel function is:

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$
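The prediction formula and the Gaussian kernel can be evaluated directly, as in the following sketch. The support vectors, coefficients, and offset are made-up numbers rather than the output of a trained model, and `dual_coefs[i]` stands for the difference of the two dual coefficients of support vector i.

```python
import math

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian kernel exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, dual_coefs, b, sigma=1.0):
    """Evaluate y = sum_i coef_i * K(x_i, x) + b."""
    return sum(
        c * rbf_kernel(sv, x, sigma)
        for sv, c in zip(support_vectors, dual_coefs)
    ) + b

# Hypothetical fitted quantities, for illustration only.
support_vectors = [[0.0], [1.0]]
dual_coefs = [1.0, -0.5]
offset = 2.0
y_hat = svr_predict([0.0], support_vectors, dual_coefs, offset)
```

Only the support vectors enter the sum, which is why prediction cost scales with their number rather than with the input dimension.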

B2. Random Forest Algorithm[8]

The training algorithm for random forests applies the general technique of bootstrap aggregating, or bagging, to tree learners. Given a training set $X = x_1, \dots, x_n$ with responses $Y = y_1, \dots, y_n$, bagging repeatedly ($B$ times) selects a random sample with replacement of the training set and fits trees to these samples. After training, predictions for unseen samples $x'$ can be made by averaging the predictions from all the individual regression trees on $x'$:

$$\hat{f} = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$$
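The bagging step can be illustrated with a deliberately small base learner. The sketch below assumes one-dimensional inputs and uses a depth-1 regression tree (a stump) in place of full trees; it also omits the random feature selection that a complete random forest adds. The data are invented.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on 1-D inputs by least squares."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue  # a split must leave points on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # every x in the sample was identical
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagged_predict(xs, ys, x_new, B=25, seed=0):
    """Average the predictions of B stumps fit on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(tree(x_new))
    return sum(preds) / B

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
low = bagged_predict(xs, ys, 1.5)
high = bagged_predict(xs, ys, 5.5)
```

Fixing the seed makes the bootstrap samples, and therefore the averaged prediction, reproducible.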

B3. Principal Component Regression [12]

The PCR method may be broadly divided into three major steps:

  1. Perform PCA on the observed data matrix for the explanatory variables to obtain the principal components, and then (usually) select a subset, based on some appropriate criteria, of the principal components so obtained for further use.
  2. Now regress the observed vector of outcomes on the selected principal components as covariates, using ordinary least squares regression (linear regression) to get a vector of estimated regression coefficients (with dimension equal to the number of selected principal components).
  3. Now transform this vector back to the scale of the actual covariates, using the selected PCA loadings (the eigenvectors corresponding to the selected principal components) to get the final PCR estimator (with dimension equal to the total number of covariates) for estimating the regression coefficients characterizing the original model.
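The three steps above can be sketched in plain Python for the simplest case of a single retained principal component. Two simplifications are assumptions of this sketch rather than part of the method: power iteration is swapped in for a full eigendecomposition, and the function name and data are invented. The sketch also assumes the covariates are not all constant.

```python
import math

def pcr_one_component(X, y, iters=200):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]

    # Step 1: leading principal component via power iteration
    # on the sample covariance matrix.
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
         for a in range(p)]
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]

    # Step 2: ordinary least squares of y on the component scores.
    z = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    gamma = sum(zi * yi for zi, yi in zip(z, yc)) / sum(zi * zi for zi in z)

    # Step 3: map the coefficient back to the original covariates.
    beta = [gamma * vj for vj in v]
    intercept = y_mean - sum(b * m for b, m in zip(beta, means))
    return beta, intercept

# Two perfectly collinear covariates; y = 2*x1 + 2*x2 + 1.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
y = [5.0, 9.0, 13.0, 17.0]
beta, intercept = pcr_one_component(X, y)
```

In this example ordinary least squares on the raw covariates has no unique solution because the two columns are identical, while the single-component fit splits the effect evenly between them.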

B4. Lasso Regression Algorithm[11]

Under lasso, the loss is defined as:

$$L_{lasso}(\beta) = \sum_{i=1}^{n} (y_i - x_i \beta)^2 + \lambda \sum_{j=1}^{m} |\beta_j|$$
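The loss can be evaluated term by term, as in this short sketch. The data and coefficient values are invented; the function only computes the objective and does not fit the model.

```python
def lasso_loss(X, y, beta, lam):
    """Residual sum of squares plus lam times the L1 norm of beta."""
    rss = sum(
        (yi - sum(xj * bj for xj, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    )
    return rss + lam * sum(abs(b) for b in beta)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
loss_exact_fit = lasso_loss(X, y, [1.0, 2.0], lam=0.5)  # zero residuals
loss_all_zero = lasso_loss(X, y, [0.0, 0.0], lam=0.5)   # zero penalty
```

Comparing the two calls shows the trade-off the penalty introduces: the exact fit pays only the penalty term, while the all-zero coefficient vector pays only the residual term.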

B5. Cox Regression Algorithm[9] 

The hazard function for the Cox proportional hazards model has the form:

$$\lambda(t \mid X_i) = \lambda_0(t) \exp(\beta_1 X_{i1} + \dots + \beta_p X_{ip}) = \lambda_0(t) \exp(X_i \beta)$$
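A small numerical sketch shows the proportional-hazards property implied by this form: the ratio of two employees' hazards does not depend on time. The baseline hazard, coefficients, and covariates below are hypothetical.

```python
import math

def relative_risk(x, beta):
    """exp(x . beta), the covariate-dependent part of the hazard."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def hazard(t, x, beta, baseline):
    """lambda(t | x) = lambda_0(t) * exp(x . beta)."""
    return baseline(t) * relative_risk(x, beta)

baseline = lambda t: 0.1 * t   # hypothetical baseline hazard
beta = [0.5, -1.0]             # hypothetical coefficients
employee_a = [2.0, 0.0]
employee_b = [0.0, 0.0]

ratio_early = hazard(1.0, employee_a, beta, baseline) / hazard(1.0, employee_b, beta, baseline)
ratio_late = hazard(5.0, employee_a, beta, baseline) / hazard(5.0, employee_b, beta, baseline)
```

Both ratios equal exp(0.5 * 2.0), because the baseline hazard cancels.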

B6. Elastic Net Regression Algorithm[10] 

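Elastic Net aims at minimizing the following loss function:

$$L_{enet}(\beta) = \frac{\sum_{i=1}^{n} (y_i - x_i \beta)^2}{2n} + \lambda \left( \frac{1 - \alpha}{2} \sum_{j=1}^{m} \beta_j^2 + \alpha \sum_{j=1}^{m} |\beta_j| \right)$$

where $\alpha$ is the mixing parameter.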
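The role of the mixing parameter can be seen by evaluating the loss at its two extremes, as in this sketch with invented data. The function only computes the objective and does not fit the model.

```python
def elastic_net_loss(X, y, beta, lam, alpha):
    """RSS / (2n) plus a blend of the ridge and lasso penalties."""
    n = len(y)
    rss = sum(
        (yi - sum(xj * bj for xj, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    )
    ridge = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return rss / (2 * n) + lam * ((1 - alpha) / 2 * ridge + alpha * l1)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
beta = [1.0, 2.0]  # fits the toy data exactly, so only the penalty remains
mixed = elastic_net_loss(X, y, beta, lam=0.5, alpha=0.5)
pure_l1 = elastic_net_loss(X, y, beta, lam=0.5, alpha=1.0)
pure_ridge = elastic_net_loss(X, y, beta, lam=0.5, alpha=0.0)
```

Setting the mixing parameter to 1 leaves only the absolute-value penalty, and setting it to 0 leaves only the squared penalty.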

B7. K-Means Clustering[13] 

Given a set of observations $(x_1, x_2, \dots, x_n)$, where each observation is a $d$-dimensional real vector, k-means clustering aims to partition the $n$ observations into $k$ ($\le n$) sets $S = \{S_1, S_2, \dots, S_k\}$ so as to find:

$$\arg\min_{S} \sum_{i=1}^{k} \frac{1}{2|S_i|} \sum_{x, y \in S_i} \lVert x - y \rVert^2$$
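A common way to search for such a partition is the standard alternating iteration of assignment and centroid update, sketched below in plain Python. The initialization with the first k points is a simplifying assumption chosen to keep the example deterministic, and the data are invented.

```python
def kmeans(points, k, iters=100):
    """Alternate between nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = None
    for _ in range(iters):
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
          [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids = kmeans(points, k=2)
```

The result depends on the initial centroids, so in practice the iteration is usually restarted from several random initializations.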

VI. Results

At the end of the evaluation, we expect a significant increase in the predicted tenure of a substantial percentage of employees, achieved with the minimum amount of extra resources allocated according to the predicted retention strategy.

VII. Conclusion

We therefore expect to show that, in enterprises, our predictive model will help the Human Resource Management Department take better and more decisive actions toward each specific employee. This benefits the company by a significant margin by retaining the existing workforce and reducing the large amount of resources spent on scouting, hiring, and training.


VIII. References

  1. Eric Ng Chee Hong, Lam Zheng Hao, Ramesh Kumar, Charles Ramendran, Vimala Kadiresan. An Effectiveness of Human Resource Management Practices on Employee Retention in Institute of Higher Learning: A Regression Analysis. International Journal of Business Research and Management (IJBRM), Volume 3, Issue 2, 2012.
  2. Matthew O'Connell and Mavis (Mei-Chuan) Kung. "The Cost of Employee Turnover." Industrial Management (2007), pp. 14-19.
  3. Lucas, S. (2013). How much employee turnover really cost you. Retrieved from Inc.:
  4. Omer Cloutier, Laura Felusiak, Calvin Hill, Enda Jean Pemberton-Jones. The Importance of Developing Strategies for Employee Retention. Journal of Leadership, Accountability and Ethics, Vol. 12(2), 2015.
  5. Yuan, H.; Yang, G.; Li, C.; Wang, Y.; Liu, J.; Yu, H.; Feng, H.; Xu, B.; Zhao, X.; Yang, X. Retrieving Soybean Leaf Area Index from Unmanned Aerial Vehicle Hyperspectral Remote Sensing: Analysis of RF, ANN, and SVM Regression Models. Remote Sens. 2017, 9, 309.
  6. Khan, Emad Afaq; Hayat Khan, Sumaira Muhammad. Factors Affecting Employee Attrition and Predictive Modelling Using IBM HR Data. Journal of Computational and Theoretical Nanoscience, Volume 16, Number 8, August 2019, pp. 3379-3383(5).
  7. Awad, M.; Khanna, R. (2015). Support Vector Regression. In: Efficient Learning Machines. Apress, Berkeley, CA.
  8. Ho, Tin Kam (1995). Random Decision Forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14-16 August 1995, pp. 278-282.
  9. Cox, D. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2), 187-220.
  10. Zou, Hui and Hastie, Trevor (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, 301-320.
  11. Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267-288.
  12. Jolliffe, I. (1982). A Note on the Use of Principal Components in Regression. Journal of the Royal Statistical Society. Series C (Applied Statistics), 31(3), 300-303.
  13. Forgy, Edward W. (1965). "Cluster analysis of multivariate data: efficiency versus interpretability of classifications". Biometrics, 768-769.

