Understanding Employee Attrition Using Explainable AI

Sabbineni, Navya

Open Access

Author:: Sabbineni, Navya
Graduate Program:: Industrial Engineering
Degree:: Master of Science
Document Type:: Master Thesis
Date of Defense:: March 30, 2020
Committee Members:: Soundar Rajan Tirupatikumara, Thesis Advisor/Co-Advisor
Ling Rothrock, Program Head/Chair
Hui Yang, Committee Member
Ling Rothrock, Committee Member
Keywords:: Explainable AI
Abstract:: Artificial Intelligence and Machine Learning applications have evolved from being purely result-focused to being more intuitive to humans, driven by the many fields in which they are deployed and the far-reaching consequences of their use. In this thesis, the concept of interpretability in machine learning models is studied. Local and global interpretability are explored using the curated IBM HR employee attrition dataset. The classification algorithms Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machines (SVM), along with the tree-based classifiers XGBoost and LightGBM, were trained and tested on the dataset. Among all the models, XGBoost and Random Forest with cross-validation (k=30) yielded the best accuracy of 86%, followed by LightGBM with an accuracy of 85.7%. Attribute importance for the XGBoost, LightGBM, and Random Forest models was explained using the force plot, decision plot, and summary plot features of the recently developed Python library SHAP. It was found that while the overall direction (positive or negative) of each attribute's contribution to attrition remained relatively consistent, the order of importance and the magnitude of the attributes varied across both models and plots. The explainable approach made it possible to quantify the effect of each variable, which in turn can lead to result-driven actions. For the specific instance considered, Overtime was found to be a major contributor to attrition. In the future, this study can be extended by taking aggregate data effects into consideration and understanding how individual effects sum to give rise to aggregate conclusions.
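
The abstract's point that per-attribute effects can be quantified for a single instance can be illustrated with a minimal sketch of exact Shapley values, the quantity that SHAP approximates. This is not the thesis's code and does not use the SHAP library. The scoring function `toy_attrition_score`, its coefficients, the feature names, and the `baseline` and `employee` records are all hypothetical, chosen only for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from their baseline value to the
    instance's value. Cost grows factorially with the number of features,
    so this is only practical for tiny examples.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)          # start from the reference record
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # reveal one feature at a time
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_attrition_score(x):
    """Hypothetical attrition score (assumption, not a fitted model)."""
    score = 0.10
    score += 0.30 * x["overtime"]
    score -= 0.02 * x["years_at_company"]
    # Interaction: overtime weighs more when job satisfaction is low.
    score += 0.10 * x["overtime"] * (x["job_satisfaction"] <= 2)
    return score


# Hypothetical reference record and hypothetical employee to explain.
baseline = {"overtime": 0, "years_at_company": 5, "job_satisfaction": 3}
employee = {"overtime": 1, "years_at_company": 1, "job_satisfaction": 1}

phi = shapley_values(toy_attrition_score, employee, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the employee's score and the baseline score, which is the additive property that force plots and decision plots visualize for one instance. In this toy setup the largest attribution falls on `overtime`, but that follows from the made-up coefficients and is not evidence for the thesis's finding.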
