Task 02 – Data Cleaning and Exploratory Data Analysis (Titanic Dataset)
The objective of this task is to perform data cleaning and exploratory data analysis (EDA) on the Titanic dataset to understand patterns and relationships in the data.
- Dataset Name: Titanic Dataset
- Source: Kaggle (Titanic Competition)
- File Used: train.csv
Steps performed:
- Loaded the dataset using Pandas
- Checked dataset structure, missing values, and data types
- Handled missing values in the Age and Embarked columns
- Dropped the Cabin column because most of its values were missing
- Performed exploratory data analysis using visualizations
- Analyzed survival patterns based on gender, passenger class, and age
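Loading and inspecting the data (the first two steps) can be sketched with standard Pandas calls. This is a minimal sketch, assuming `train.csv` is in the working directory (in Colab, after uploading it); the helper name `inspect_dataset` is hypothetical.

```python
import pandas as pd


def inspect_dataset(path: str = "train.csv") -> pd.DataFrame:
    """Load the Titanic CSV and print its structure, dtypes, and missing values."""
    df = pd.read_csv(path)

    print("Shape:", df.shape)  # (rows, columns)
    df.info()                  # column dtypes and non-null counts
    print(df.head())           # first few rows

    # Missing values per column, largest first, showing only affected columns
    missing = df.isnull().sum().sort_values(ascending=False)
    print(missing[missing > 0])
    return df
```

Usage: `titanic = inspect_dataset("train.csv")`. Printing only the columns with missing values makes it easy to see which ones (such as Age, Embarked, and Cabin) need a cleaning decision.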
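The cleaning steps could look like the following. The write-up does not say how Age and Embarked were filled, so using the median for Age and the most frequent value for Embarked is an assumption here (both are common defaults); `clean_titanic` is a hypothetical helper.

```python
import pandas as pd


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the Titanic data.

    Assumed strategy (not confirmed by the write-up):
    - Age: fill missing values with the median age
    - Embarked: fill missing values with the most frequent port
    - Cabin: drop the column, since most of its values are missing
    """
    cleaned = df.copy()  # avoid mutating the caller's DataFrame

    # The median is less sensitive to outliers than the mean
    cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

    # mode() returns a Series; take its first (most frequent) value
    cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode()[0])

    # errors="ignore" keeps this safe if Cabin was already removed
    cleaned = cleaned.drop(columns=["Cabin"], errors="ignore")
    return cleaned
```

Usage: `titanic = clean_titanic(pd.read_csv("train.csv"))`. Returning a copy keeps the raw data available for comparison, for example to check how much the median fill changes the Age distribution.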
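The gender and class comparisons can be quantified with a group-by. Because `Survived` is coded 0/1, its mean within a group is that group's survival rate. This is a minimal sketch; `survival_rates` and its dictionary keys are hypothetical names.

```python
import pandas as pd


def survival_rates(df: pd.DataFrame) -> dict:
    """Survival rate (mean of the 0/1 Survived column) per group."""
    return {
        # One rate per gender
        "sex": df.groupby("Sex")["Survived"].mean(),
        # One rate per passenger class (1, 2, 3)
        "pclass": df.groupby("Pclass")["Survived"].mean(),
        # Class x gender table of survival rates
        "pclass_by_sex": pd.crosstab(
            df["Pclass"], df["Sex"], values=df["Survived"], aggfunc="mean"
        ),
    }
```

Usage: `rates = survival_rates(titanic)` followed by `print(rates["pclass_by_sex"])`. The cross-tabulation helps separate the two effects, since gender and class are not independent in the passenger list.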
Tools used:
- Python
- Pandas
- Matplotlib
- Seaborn
- Google Colab
Key findings:
- Females had a higher survival rate than males
- Passengers in 1st class had a higher survival rate than those in 2nd and 3rd class
- Children were more likely to survive than adults
- Most passengers were young adults
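The two age-related findings can be checked by binning Age. The bin edges and labels below (child under 13, teen 13 to 17, young adult 18 to 34, and so on) are illustrative assumptions, since the write-up does not state which age groups were used; `age_group_summary` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical age bands; with right=False each bin is [low, high)
AGE_BINS = [0, 13, 18, 35, 60, 120]
AGE_LABELS = ["child", "teen", "young adult", "adult", "senior"]


def age_group_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Passenger count and survival rate per age band."""
    # Missing ages stay NaN after pd.cut and are left out of every band
    groups = pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    # observed=False keeps empty bands in the output (count 0)
    summary = df.groupby(groups, observed=False)["Survived"].agg(["count", "mean"])
    return summary.rename(columns={"count": "passengers", "mean": "survival_rate"})
```

If missing ages were filled with the median before this step, the band containing the median is inflated, so running the summary on the raw Age column may give a fairer picture of how many passengers were young adults.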
Conclusion:
This task helped me understand the importance of data cleaning and how exploratory data analysis helps identify trends and patterns in real-world datasets.