This project is part of my Elevvo Internship (Data Analytics track).
The objective was to clean and analyze real-world survey data from the Kaggle Data Science Survey (2017–2021) to generate actionable insights about respondents' demographics, coding experience, and programming language preferences.
- Source: Kaggle Data Science Survey 2017–2021
- Contains survey responses from data science professionals worldwide (age, gender, country, experience, tools, languages, etc.)
- Python
- Pandas
- Matplotlib
- Seaborn
- Jupyter Notebook
- Data Loading – Read the survey dataset into a Pandas DataFrame
- Data Cleaning – Handle missing values, remove duplicates, format columns
- Summary Statistics – Inspect distributions of age, gender, country, experience, and programming languages
- Key Insights – Extract the most common demographics, experience levels, and programming languages
- Data Visualization – Create summary plots for age, gender, country, experience, and top languages
- Insights & Conclusion – Interpret the findings
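
The loading and cleaning steps above could look roughly like the sketch below. It is a minimal illustration, not the notebook's actual code: the in-memory sample, the column names (`Age`, `Gender`, `Country`, `CodingExperience`, `Language`), the `"Unknown"` label, and the `clean_survey` helper are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real survey file.
# Column names and values are assumptions, not the Kaggle schema.
SAMPLE_CSV = """Age , Gender,Country,CodingExperience,Language
25-29,Male,India,3-5 years,Python
25-29,Male,India,3-5 years,Python
30-34,Female,USA,,R
22-24,Male,India,1-2 years,Python
"""


def clean_survey(df: pd.DataFrame) -> pd.DataFrame:
    """Strip column names, drop exact duplicate rows, label missing answers."""
    out = df.copy()
    # Format columns: remove stray whitespace around the header names.
    out.columns = out.columns.str.strip()
    # Remove duplicates: keep the first copy of each identical row.
    out = out.drop_duplicates().reset_index(drop=True)
    # Handle missing values: mark unanswered questions explicitly.
    return out.fillna("Unknown")


# Data loading: read the CSV text into a DataFrame, then clean it.
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
survey = clean_survey(raw)
print(survey)
```

Filling gaps with an explicit label keeps unanswered questions visible in later counts. Dropping exact duplicate rows is only safe when identical rows really are repeated submissions, which is hard to confirm without a respondent ID.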
- Age Group: Most respondents are 25–29 years old
- Gender: Majority identify as Male
- Country: India has the highest number of participants
- Coding Experience: Most respondents have 3–5 years of coding experience
- Programming Language: Python is the most popular language
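
Findings of this kind can be read off a frequency count per column. The sketch below shows one possible way to do that with `value_counts`; the tiny DataFrame, its column names, and the `top_category` helper are hypothetical and the values are illustrative, not survey results.

```python
import pandas as pd

# Hypothetical cleaned responses; values are illustrative only.
survey = pd.DataFrame({
    "Age": ["25-29", "25-29", "30-34", "22-24"],
    "Language": ["Python", "Python", "R", "Python"],
})


def top_category(df: pd.DataFrame, column: str) -> tuple:
    """Return the most frequent value in a column and its share of responses."""
    # value_counts sorts by frequency, so the first entry is the most common.
    shares = df[column].value_counts(normalize=True)
    return shares.index[0], float(shares.iloc[0])


for col in survey.columns:
    value, share = top_category(survey, col)
    print(f"{col}: {value} ({share:.0%})")
```

Reporting the share alongside the label shows whether the top category is an outright majority or only the largest of several groups.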
- Install dependencies:
  pip install -r requirements.txt
- Open the notebook:
  jupyter notebook Survey_Data_Insight_Generation.ipynb




