Determining Factors Resulting to Employee Attrition Using Data Mining Techniques

pp. 22-29

Author(s)

Jennifer Anne A. Repaso 1,*, Elenita T. Capariño 1, Mary Grace G. Hermogenes 1, Joann G. Perez 1

1. Bulacan State University

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2022.03.03

Received: 26 Oct. 2021 / Revised: 23 Dec. 2021 / Accepted: 25 Jan. 2022 / Published: 8 Jun. 2022

Index Terms

Data Mining Techniques, Employee Attrition, Filtering Algorithm, Naïve Bayes Algorithm, Confusion Matrix.

Abstract

Business Process Outsourcing (BPO) is a budding industry that currently employs millions of workers in the Philippines and draws applicants ranging from undergraduates to professionals. It provides high-quality, well-paying jobs to millions of Filipinos while stimulating economic activity and investment. However, an attrition rate of around 50 percent in the current year makes predicting employee turnover a major challenge. By applying data mining techniques, this study developed a model that organizations can adopt to predict possible attrition and to help employers, particularly HR teams, determine first-hand the type of applicant they have. The authors extracted significant predictors from data provided by a BPO company. A Fast Correlation-Based Filtering algorithm was applied to remove irrelevant data and increase learning accuracy. Initially, 1,470 records with 21 attributes were provided, and 17 attributes were identified as significant after filtering and data preprocessing. The preprocessed data were then used to build a model with the Naïve Bayes algorithm. The resulting model predicts, as a percentage, the probability that an employee is a hopper (likely to resign) or a stayer. Among the 17 variables, Total Working Years, Marital Status, and Age ranked as the top predictors of attrition. The data were split into 60% training data (882 records) and 40% testing data (588 records). Among the 588 test records, the model predicted 542 stayers (92.2%) and 46 hoppers (7.8%). The model was then tested, and its accuracy was evaluated using a confusion matrix and a cross-validation technique, yielding an accuracy of 84.69%.
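The abstract names a Fast Correlation-Based Filtering step without describing it. The sketch below is a minimal, hedged illustration of the commonly described FCBF procedure, assuming every attribute has already been discretized. Attributes are ranked by symmetrical uncertainty (SU) with the class, those with SU at or below a threshold `delta` are dropped, and an attribute is discarded as redundant when an already-kept, more relevant attribute has an SU with it at least as large as its own SU with the class. The function names and the `delta` default are hypothetical; the authors' tool and parameter settings are not stated.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from math import log2

# Hypothetical helper names; assumes all attributes are already discretized.


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both constant: treat as carrying no information
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features: dict[str, Sequence[Hashable]],
         target: Sequence[Hashable],
         delta: float = 0.0) -> list[str]:
    """Return the attribute names kept by a basic FCBF pass, most relevant first."""
    # Step 1: relevance of each attribute to the class; keep those above delta.
    relevance = {name: symmetrical_uncertainty(col, target)
                 for name, col in features.items()}
    ranked = sorted((name for name, su in relevance.items() if su > delta),
                    key=lambda name: relevance[name], reverse=True)
    # Step 2: drop an attribute if an already-kept, more relevant attribute
    # is at least as correlated with it as it is with the class.
    selected: list[str] = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected
```

Called with a dict mapping attribute names to columns plus the attrition labels, `fcbf` returns the surviving attribute names in order of relevance; an attribute that merely duplicates a stronger one is removed even if it is itself highly relevant.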
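A Naïve Bayes classifier estimates P(class) and P(attribute value | class) from the training split and combines them under a conditional-independence assumption. The following is a minimal sketch, not the authors' implementation: it assumes categorical (discretized) attributes, add-one (Laplace) smoothing through a hypothetical `alpha` parameter, and a simple seeded random 60/40 split, since the abstract does not say how the split was drawn. The class labels `"Hopper"` and `"Stayer"` and attribute names such as `"MaritalStatus"` are illustrative encodings, not the study's data schema.

```python
import random
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence
from math import exp, log

# One employee record: attribute name -> discretized value (assumed encoding).
Row = dict[str, Hashable]


def train_test_split(items: Sequence, train_frac: float = 0.6, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of the items with a fixed seed and cut it into train/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # hypothetical smoothing parameter; 1.0 means add-one

    def fit(self, rows: Sequence[Row], labels: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[(attribute, class)][value] = training rows with that value and class
        self.counts = defaultdict(Counter)
        # distinct values seen per attribute, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for attr, value in row.items():
                self.counts[(attr, label)][value] += 1
                self.values[attr].add(value)
        return self

    def predict_proba(self, row: Row) -> dict[Hashable, float]:
        """Posterior P(class | row), computed in log space and then normalised."""
        log_scores = {}
        for cls, n_cls in self.class_counts.items():
            score = log(n_cls / self.n)  # log prior P(class)
            for attr, value in row.items():
                k = len(self.values[attr])
                count = self.counts[(attr, cls)][value]
                # smoothed log-likelihood log P(value | class)
                score += log((count + self.alpha) / (n_cls + self.alpha * k))
            log_scores[cls] = score
        top = max(log_scores.values())  # subtract the max to avoid underflow
        unnormalised = {c: exp(s - top) for c, s in log_scores.items()}
        total = sum(unnormalised.values())
        return {c: v / total for c, v in unnormalised.items()}

    def predict(self, row: Row) -> Hashable:
        """Return the class with the highest posterior probability."""
        probs = self.predict_proba(row)
        return max(probs, key=probs.get)
```

With 1,470 records, `train_test_split(records)` yields 882 training and 588 testing items, matching the counts in the abstract. After `model = CategoricalNaiveBayes().fit(rows, labels)`, `model.predict_proba(row)` returns one probability per class, which can be read as the hopper or stayer percentage the abstract describes.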
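The counts reported in the abstract are internally consistent, as the arithmetic below shows. The last line assumes, as the counts suggest but the abstract does not state outright, that the 84.69% accuracy was computed on the 588 test records; under that assumption it corresponds to 498 correctly classified employees. TP, TN, FP, and FN denote the standard confusion-matrix cells (true/false positives and negatives) and are not values reported in the paper.

```latex
\begin{aligned}
N_{\text{train}} &= 0.60 \times 1470 = 882, \qquad N_{\text{test}} = 0.40 \times 1470 = 588,\\
\frac{542}{588} &\approx 0.922 \;(92.2\%), \qquad \frac{46}{588} \approx 0.078 \;(7.8\%), \qquad 542 + 46 = 588,\\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \qquad \frac{498}{588} \approx 0.8469 \;(84.69\%).
\end{aligned}
```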

Cite This Paper

Jennifer Anne A. Repaso, Elenita T. Capariño, Mary Grace G. Hermogenes, Joann G. Perez, "Determining Factors Resulting to Employee Attrition Using Data Mining Techniques", International Journal of Education and Management Engineering (IJEME), Vol.12, No.3, pp. 22-29, 2022. DOI: 10.5815/ijeme.2022.03.03
