wizrox/dsSlryProj

Data Science Salary Estimator: Project Overview

Created a tool that estimates data science salaries to help data scientists negotiate their income when they get a job. Scraped a sample of job descriptions from Glassdoor using Python and Selenium. Engineered features from the text of each job description to quantify the value companies put on different data science tools such as Python, Excel, AWS, and Spark. Optimized Linear, Lasso, and Random Forest regressors using GridSearchCV to reach the best model. Built a client-facing API using Flask.
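The feature-engineering step mentioned above could look roughly like the sketch below. It is only an illustration: the function name `tool_flags`, the `TOOLS` tuple, and the example sentence are assumptions, not code from this repository.

```python
import re

# Hypothetical tool list; the overview names python, excel, aws, and spark.
TOOLS = ("python", "excel", "aws", "spark")

def tool_flags(description: str) -> dict:
    """Return a 0/1 flag per tool, based on whole-word matches in the text."""
    text = description.lower()
    return {
        tool: int(re.search(rf"\b{re.escape(tool)}\b", text) is not None)
        for tool in TOOLS
    }

example = "We need Python and AWS experience; Excel is a plus."
flags = tool_flags(example)
print(flags)
```

Whole-word matching keeps a word like "sparkling" from being counted as a mention of Spark.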

Resources & Knowledge-base:

Code and Resources Used

Data-Scraping / Data-Collection

This is the first step: a script pulls the information available on the website's pages (the content held in web elements such as span, div, and li) and stores it in a file for further processing.
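The project itself drives a live browser with Selenium. The sketch below swaps in Python's standard-library `html.parser` on an inline HTML string so that it runs offline; it only illustrates the idea of reading text out of span, div, and li elements and saving it. The markup, the class names, and `JobParser` are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup; the real Glassdoor page structure differs and changes over time.
SAMPLE_HTML = """
<ul>
  <li class="job"><span class="title">Data Scientist</span><div class="company">Acme</div></li>
  <li class="job"><span class="title">ML Engineer</span><div class="company">Globex</div></li>
</ul>
"""

class JobParser(HTMLParser):
    """Collect the text of elements whose class is 'title' or 'company'."""
    FIELDS = ("title", "company")

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per job listing
        self._field = None    # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "job":
            self.rows.append({})          # a new listing starts
        elif cls in self.FIELDS and self.rows:
            self._field = cls

    def handle_data(self, data):
        if self._field and data.strip():
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None                # stop capturing at any closing tag

def scrape_to_csv(html: str) -> str:
    parser = JobParser()
    parser.feed(html)
    buffer = io.StringIO()  # swap for open("jobs.csv", "w", newline="") to write a file
    writer = csv.DictWriter(buffer, fieldnames=JobParser.FIELDS)
    writer.writeheader()
    writer.writerows(parser.rows)
    return buffer.getvalue()

csv_text = scrape_to_csv(SAMPLE_HTML)
print(csv_text)
```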

P.S.: The code has been modified to match the updated structure of the website. There are some fields that I was not able to find. For readability, the old lines of code are commented out and the updated code is added underneath. Please note that I have changed the names of the columns and files to suit my needs; if you copy and paste this code, check for mismatches with the file and column names used in the tutorial.

Data Cleaning

After scraping, the data has to be cleaned and arranged so that the information suitable for EDA (exploratory data analysis) can be selected. A script drops the redundant and unneeded columns and keeps the valid fields. Once the necessary data has been extracted, it is stored in another CSV file for further processing.
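A minimal sketch of this cleaning step, using only the standard library, is shown below. The column names, the sample rows, and the `-1` placeholder for a missing salary are assumptions made for illustration; the actual script and data in this repository may differ.

```python
import csv
import io

# Hypothetical raw scrape output; column names are assumptions for illustration.
RAW_CSV = """title,company,salary,ad_banner
Data Scientist,Acme,$53K-$91K,x
ML Engineer,Globex,-1,x
Data Analyst,Initech,$40K-$60K,x
"""

KEEP = ["title", "company", "salary"]  # columns kept for analysis

def clean(raw_text: str) -> list:
    """Keep only the useful columns and drop rows without a usable salary."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if row["salary"] in ("", "-1"):   # assumed placeholder for a missing value
            continue
        rows.append({col: row[col] for col in KEEP})
    return rows

def to_csv(rows: list) -> str:
    out = io.StringIO()   # swap for a real file handle to save the cleaned data
    writer = csv.DictWriter(out, fieldnames=KEEP)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

cleaned = clean(RAW_CSV)
print(to_csv(cleaned))
```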

Data EDA

Once the relevant information is available as a data set, statistical and mathematical techniques are applied to analyze its patterns, and several models are trained and compared to find the most accurate prediction. After the outputs of the different models and techniques have been compared, a pickle file is created, which is imported later for use in other application scenarios such as a client/server setup or a service call to an API.
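The comparison and pickling steps could be sketched as follows, assuming scikit-learn and NumPy are installed. The data is synthetic, and the parameter grids, the scoring metric, and the variable names are illustrative assumptions rather than the settings used in this project.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the project's real features and salaries are not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)

# Candidate models with small, illustrative parameter grids (values are assumptions).
candidates = {
    "lasso": (Lasso(), {"alpha": [0.01, 0.1, 1.0]}),
    "random_forest": (RandomForestRegressor(random_state=0), {"n_estimators": [20, 50]}),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="neg_mean_absolute_error", cv=3)
    search.fit(X, y)
    results[name] = search

# Scores are negated MAE, so the highest score is the lowest error.
best_name = max(results, key=lambda n: results[n].best_score_)
best_model = results[best_name].best_estimator_

# Serialize the chosen model, then load it back as a serving process would.
payload = pickle.dumps(best_model)
restored = pickle.loads(payload)
print(best_name, restored.predict(X[:1]))
```

To write an actual file, `pickle.dump(best_model, f)` with a file opened in `"wb"` mode does the same job as `pickle.dumps`. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.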

Productionization

After the pickle file is created, Flask is installed and used to run a local server that listens for client requests, so that the model can be used in a client/server scenario.
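A minimal sketch of such an endpoint is shown below, assuming Flask is installed. The `/predict` route, the `input` and `response` JSON keys, and the placeholder `predict_salary` function are hypothetical; in the real project the prediction would come from the unpickled model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_salary(features):
    """Placeholder for the unpickled model's predict call (hypothetical logic)."""
    return float(sum(features))

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json(silent=True) or {}
    features = data.get("input")
    is_numeric_list = isinstance(features, list) and all(
        isinstance(v, (int, float)) for v in features
    )
    if not is_numeric_list:
        return jsonify({"error": "expected a JSON body with a numeric list under 'input'"}), 400
    return jsonify({"response": predict_salary(features)})

# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
reply = client.post("/predict", json={"input": [1, 2, 3]})
print(reply.get_json())
```

Calling `app.run()` or using the `flask run` command would start the local development server, which is suitable for experimenting but not for production use.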

Resources:

https://www.youtube.com/watch?v=GmW4F6MHqqs&list=PL2zq7klxX5ASFejJj80ob9ZAnBHdz5O1t

https://towardsdatascience.com/selenium-tutorial-scraping-glassdoor-com-in-10-minutes-3d0915c6d905

https://github.com/PlayingNumbers/ds_salary_proj

https://github.com/arapfaik/scraping-glassdoor-selenium
