Data cleaning and insight generation on Kaggle Data Science Survey (2017-2021) using Python, Pandas, and Seaborn.

RightfulCode/Survey-Data-Insight-Generation

Task 4: Data Cleaning and Insight Generation from Survey

This project is part of my Elevvo Internship (Data Analytics track).
The objective was to clean and analyze real-world survey data from the Kaggle Data Science Survey (2017–2021) to generate actionable insights about respondents' demographics, coding experience, and programming language preferences.

📂 Dataset

🛠️ Tools & Libraries

  • Python
  • Pandas
  • Matplotlib
  • Seaborn
  • Jupyter Notebook

🔎 Key Steps

  1. Data Loading – Read the survey dataset into a Pandas DataFrame
  2. Data Cleaning – Handle missing values, remove duplicates, format columns
  3. Summary Statistics – Inspect distributions of age, gender, country, experience, and programming languages
  4. Key Insights – Extract most common demographics, experience levels, and popular programming languages
  5. Data Visualization – Create summary plots for age, gender, country, experience, and top languages
  6. Insights & Conclusion – Interpret the findings
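Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names, the sample rows, the `load_and_clean` helper, and the choice to label missing answers as "Not specified" are all assumptions made for the example.

```python
import io

import pandas as pd

# Tiny stand-in for the survey CSV. Column names and values are hypothetical.
RAW_CSV = """Age,Gender,Country,Experience,Language
25-29,Male,India,3-5 years,Python
25-29,Male,India,3-5 years,Python
30-34,Female,United States,5-10 years,
 22-24 ,male,India,1-3 years,R
"""


def load_and_clean(source) -> pd.DataFrame:
    """Load survey rows, tidy text columns, drop duplicates, label missing answers."""
    df = pd.read_csv(source)

    # Format columns: every column in this sample is text, so trim stray whitespace.
    for col in df.columns:
        df[col] = df[col].str.strip()
    df["Gender"] = df["Gender"].str.title()

    # Remove duplicates after formatting, so rows that differ only in spacing also match.
    df = df.drop_duplicates()

    # Handle missing values: keep the row but mark the unanswered question (an assumption).
    df = df.fillna("Not specified")
    return df.reset_index(drop=True)


clean = load_and_clean(io.StringIO(RAW_CSV))
print(clean)
```

With the real dataset, `source` would be the path to the downloaded CSV instead of an in-memory string.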

📊 Key Insights

  • Age Group: Most respondents are 25–29 years old
  • Gender: Majority identify as Male
  • Country: India has the highest number of participants
  • Coding Experience: Most respondents have 3–5 years of coding experience
  • Programming Language: Python is the most popular language
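Findings like these can be read off with `value_counts`. The sketch below uses a small made-up sample and a hypothetical `top_category` helper; it assumes each question is stored in a single column, so a multi-select question spread over several columns would need to be combined first.

```python
import pandas as pd

# Hypothetical sample; the real survey has far more rows and different column names.
responses = pd.DataFrame(
    {
        "Age": ["25-29", "25-29", "30-34", "22-24"],
        "Country": ["India", "India", "United States", "India"],
        "Language": ["Python", "Python", "Python", "R"],
    }
)


def top_category(series: pd.Series) -> tuple:
    """Return the most frequent answer and its share of all non-missing answers."""
    shares = series.value_counts(normalize=True)  # sorted from most to least common
    return shares.index[0], float(shares.iloc[0])


for column in responses.columns:
    label, share = top_category(responses[column])
    print(f"{column}: {label} ({share:.0%})")
```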

📈 Visualizations

Age Group Distribution

Gender Distribution

Top 10 Countries

Coding Experience Distribution

Top 5 Programming Languages
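A "top N" chart like the ones listed above could be produced along these lines. This is only a sketch under assumptions: the sample data, the `plot_top_categories` helper, and the styling are invented for illustration and may differ from the notebook's plots.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample of one survey column.
countries = pd.Series(
    ["India", "India", "United States", "India", "Brazil", "United States"],
    name="Country",
)


def plot_top_categories(series: pd.Series, top_n: int, title: str):
    """Draw a horizontal bar chart of the top_n most frequent answers."""
    counts = series.value_counts().head(top_n)
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.barplot(x=counts.to_numpy(), y=counts.index.tolist(), color="steelblue", ax=ax)
    ax.set_title(title)
    ax.set_xlabel("Respondents")
    ax.set_ylabel(series.name)
    fig.tight_layout()
    return ax


ax = plot_top_categories(countries, top_n=2, title="Top 2 Countries")
```

Calling `ax.figure.savefig("top_countries.png")` afterwards would write the chart to disk; inside a notebook the figure renders inline without the `Agg` line.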

▶️ How to Run

  • Install dependencies: pip install -r requirements.txt
  • Open the notebook: jupyter notebook Survey_Data_Insight_Generation.ipynb
