predictive_modeling_capstone

I created an ensemble predictive model for employee turnover using HR data with interesting results!

SALIFORT MOTORS - EMPLOYEE TURNOVER PREDICTIVE MODELING CAPSTONE (GOOGLE ADVANCED DATA ANALYTICS)

DESCRIPTION

Thank you for reviewing my project!

This was my capstone project in the Google Advanced Data Analytics program. I was tasked with building a predictive model, analyzing its insights, and presenting them to stakeholders at a large consulting company (Salifort Motors). My end deliverable is a random forest model with strong precision, accuracy, F1, AUC, and recall scores, indicating that the model fit the data well; its feature importances point to the strongest predictors of employee turnover.

Building a random forest from the ground up with scikit-learn taught me a great deal, and it let me bring together the full set of skills I developed in Google's Advanced Data Analytics course (I highly recommend this program for honing skills in machine learning).
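For readers who want to see how the five scores named above are typically computed, here is a minimal sketch. It is not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, and the hyperparameters and split sizes are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic, imbalanced data standing in for the HR dataset.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# ASSUMPTION: illustrative hyperparameters, not the tuned ones from the project.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)               # hard class labels
y_proba = rf.predict_proba(X_test)[:, 1]  # probability of the positive class

scores = {
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "auc": roc_auc_score(y_test, y_proba),  # AUC uses probabilities, not labels
    "recall": recall_score(y_test, y_pred, zero_division=0),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The numbers this prints describe the synthetic data only and say nothing about the project's real results.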

RESULTS

The model indicates that satisfaction level, number of projects, tenure, and average monthly work hours appear to be the strongest predictors of employee turnover. Not only were the scores strong, but the results also match simple intuition: of course low tenure, too many projects, low satisfaction, and outrageous work hours are going to burn out employees and contribute to turnover. In this hypothetical scenario, I presented these findings to stakeholders, specifically the HR department. I recommended that they take special care to balance projects and work hours for new employees, and that they promote employees after the "sweet spot" of two years or so to encourage longer tenure, after which the likelihood of turnover lessens. I also noted that there could be seriously untapped potential in employee promotion data (they never promote anyone, so there is no data with which to expand the model).
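A ranking of predictors like the one described above is usually read from a fitted forest's `feature_importances_` attribute. The sketch below shows the mechanics only. The column names are hypothetical labels borrowed from the prose, the two `noise_*` columns are invented for contrast, and the data is synthetic, so the resulting order carries no meaning for the real HR data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# ASSUMPTION: hypothetical column names; the real dataset's names may differ.
feature_names = [
    "satisfaction_level",
    "number_of_projects",
    "tenure",
    "average_monthly_hours",
    "noise_1",
    "noise_2",
]

# Synthetic stand-in data. With shuffle=False the first four columns are the
# informative ones and the last two are pure noise.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, n_redundant=0,
    shuffle=False, random_state=42,
)
X = pd.DataFrame(X, columns=feature_names)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

# Impurity-based importances, one per feature, normalized to sum to 1.
importances = pd.Series(rf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)

print(importances)
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` is a common cross-check.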

What else would I do? I would probably go back and add a little more feature engineering, then build some other models (tuned decision tree, boosted, regression, etc.) to compare performance and make my analysis more robust. With additional analysis, I think I could get much closer to the true heart of why employees are leaving. But I am quite satisfied with the performance of my random forest model!
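One way the model comparison mentioned above could look is a cross-validated score for each candidate. This is a sketch under assumptions, not part of the project: the data is synthetic, scikit-learn's `GradientBoostingClassifier` stands in for "boosted" (xgboost would be another option), `LogisticRegression` stands in for "regression", and F1 is an arbitrary choice of metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic data standing in for the HR dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)

# ASSUMPTION: untuned, illustrative candidates; a real comparison would tune
# each one (for example with GridSearchCV) before comparing.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean F1 over 5 cross-validation folds for each model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}

# Print models from best to worst mean F1.
for name, mean_f1 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {mean_f1:.3f}")
```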

TECHNICAL

I did my work in Jupyter Notebook with Python, using several packages including scikit-learn, pandas, NumPy, and Matplotlib.

Other packages I imported for potential use (but have not used yet) include xgboost, pickle, and statsmodels.
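As an illustration of where pickle might eventually fit, a fitted model can be saved to disk and reloaded without retraining. This is a sketch of one possible use, not something the project does yet: the file name `rf_model.pickle` is hypothetical and the data is synthetic.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# ASSUMPTION: synthetic data and a small forest, for illustration only.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "rf_model.pickle"  # hypothetical file name

    # Serialize the fitted model to disk.
    with open(model_path, "wb") as f:
        pickle.dump(rf, f)

    # Load it back as a new object.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should make exactly the same predictions.
same_predictions = bool((restored.predict(X) == rf.predict(X)).all())
print(same_predictions)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.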

CREDITS

Google Advanced Data Analytics, Coursera, Kaggle

