Google_App_Store_Rating

EDA & Data Preprocessing on Google App Store Rating Dataset.

Domain: Mobile device apps

Context:

The Play Store apps data has enormous potential to drive app-making businesses to success. However, many apps are developed every day and only a few of them become profitable. It is important for developers to be able to predict the success of their app and to incorporate the features that make an app successful. Before any such predictive study can be done, it is necessary to perform EDA and data preprocessing on the data available for Google Play Store apps. From the collected apps data and user ratings, let's try to extract insightful information.

Objective:

The goal is to explore the data and preprocess it for future use in any predictive analytics study.

Data Dictionary

Questions:

Import required libraries and read the dataset.
Check the first few samples, shape, info of the data and try to familiarize yourself with different features.
Check summary statistics of the dataset. List out the columns that need to be worked upon for model building.
Check whether there are any duplicate records in the dataset. If there are, drop them.
Check the unique categories of the column 'Category'. Is there any invalid category? If yes, drop it.
Check whether there are missing values in the column 'Rating'. If there are, drop them, then create a new column 'Rating_category' by converting ratings to high and low categories (> 3.5 is high, the rest low).
Check the distribution of the newly created column 'Rating_category' and comment on the distribution.
Convert the column 'Reviews' to a numeric data type, check for outliers in the column, and handle them using a transformation approach. (Hint: use a log transformation.)
The column 'Size' contains alphanumeric values. Treat the non-numeric data and convert the column into a suitable data type. (Hint: multiply values ending in M by 1,000,000 and values ending in K by 1,000, and drop the entries where Size is 'Varies with device'.)
Check the column 'Installs', treat the unwanted characters and convert the column into a suitable data type.
Check the column 'Price', remove the unwanted characters and convert the column into a suitable data type.
Drop the columns that you think are redundant for the analysis. (Suggestion: drop 'Rating', since we created a new feature from it, i.e. 'Rating_category', and also drop 'App', 'Genres', 'Last Updated', 'Current Ver' and 'Android Ver', which are redundant for our analysis.)
Encode the categorical columns.
Segregate the target and independent features. (Hint: use 'Rating_category' as the target.)
Split the dataset into train and test.
Standardize the data so that the features are on a comparable scale.
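
The cleaning questions above (duplicates, missing ratings, 'Rating_category', 'Reviews', 'Size', 'Installs', 'Price' and the redundant columns) could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual solution: the helper names size_to_number and preprocess are hypothetical, the small sample frame stands in for Apps_data.csv, and the value formats ('19M', '10,000+', '$4.99') are assumptions about what the "unwanted characters" look like. It also uses log1p rather than a plain log so that apps with zero reviews do not produce -inf.

```python
import numpy as np
import pandas as pd


def size_to_number(value):
    """Turn a size such as '19M' or '201k' into a number (hypothetical helper).

    Values ending in M are multiplied by 1,000,000 and values ending in K
    by 1,000; anything else becomes NaN.
    """
    text = str(value).strip()
    if text and text[-1] in ("M", "m"):
        return float(text[:-1]) * 1_000_000
    if text and text[-1] in ("K", "k"):
        return float(text[:-1]) * 1_000
    return np.nan


def preprocess(df):
    """Apply the cleaning steps from the question list to a copy of df."""
    # Drop duplicate records and rows with a missing rating.
    df = df.drop_duplicates().dropna(subset=["Rating"]).copy()

    # Ratings above 3.5 are 'High', the rest 'Low'.
    df["Rating_category"] = np.where(df["Rating"] > 3.5, "High", "Low")

    # Reviews: convert to numeric, then log-transform to tame outliers.
    # log1p is an assumption made here so that zero reviews map to 0.
    df["Reviews"] = np.log1p(pd.to_numeric(df["Reviews"], errors="coerce"))

    # Size: drop 'Varies with device', then convert the M / K suffixes.
    df = df[df["Size"] != "Varies with device"].copy()
    df["Size"] = df["Size"].map(size_to_number)

    # Installs: assumed to look like '10,000+'; strip ',' and '+'.
    installs = df["Installs"].str.replace(r"[+,]", "", regex=True)
    df["Installs"] = pd.to_numeric(installs, errors="coerce")

    # Price: assumed to look like '$4.99'; strip the '$'.
    price = df["Price"].str.replace("$", "", regex=False)
    df["Price"] = pd.to_numeric(price, errors="coerce")

    # Drop the columns suggested as redundant (ignored if absent).
    redundant = ["App", "Rating", "Genres", "Last Updated",
                 "Current Ver", "Android Ver"]
    return df.drop(columns=redundant, errors="ignore")


# Tiny made-up sample standing in for Apps_data.csv.
sample = pd.DataFrame({
    "App": ["A", "A", "B", "C", "D"],
    "Rating": [4.1, 4.1, 3.2, np.nan, 4.7],
    "Reviews": ["159", "159", "0", "12", "87510"],
    "Size": ["19M", "19M", "201k", "5M", "Varies with device"],
    "Installs": ["10,000+", "10,000+", "500+", "100+", "5,000,000+"],
    "Price": ["0", "0", "$4.99", "0", "0"],
})

clean = preprocess(sample)
print(clean)
```

With the real data, the same function would be called on the frame returned by pd.read_csv("Apps_data.csv"). Encoding, the train/test split and standardization are left out of this sketch.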

