Exploratory Data Analysis (EDA) Tutorial 📊

A hands-on collection of Jupyter notebooks for learning and practicing Exploratory Data Analysis (EDA).

This repository is organized as a step-by-step workflow:

Understand your dataset
Exploratory Data Analysis (EDA)
  1. Explore univariate patterns
  2. Explore bivariate and multivariate relationships
Detect outliers
Handle missing values
Train baseline ML models
  1. Classification
  2. Regression
  3. Clustering

The repository includes several CSV datasets (stored in datasets/), so each notebook can be executed directly.

Repository Structure 🗂️

Notebooks 📓

1. Dataset Overview.ipynb
Initial data inspection: shape, data types, summary statistics, and basic data-quality checks.
2.1 EDA - Univariate.ipynb
Distribution analysis for single variables (numerical and categorical).
2.2 EDA - Bivariate and Multivariate.ipynb
Relationship analysis using pairwise comparisons, grouped summaries, and multivariate visualizations.
3. Outliers.ipynb
Outlier detection methods and interpretation.
4. Missing Values.ipynb
Inspection of missing data and practical techniques for handling and imputing it.
5.1 Model - Classification.ipynb
End-to-end supervised learning workflow for classification tasks: preprocessing, model training, and evaluation.
5.2 Model - Regression.ipynb
End-to-end supervised learning workflow for regression tasks with appropriate metrics and model diagnostics.
5.3 Model - Clustering.ipynb
Unsupervised learning workflow for clustering, including cluster quality analysis and interpretation.
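
The checks described for 1. Dataset Overview.ipynb (shape, data types, summary statistics, first quality checks) can be sketched in pandas as below. This is an illustrative helper, not code taken from the notebook. The function name first_look and the tiny inline DataFrame are hypothetical. With the repository data you would load a file such as datasets/tips_seaborn.csv instead.

```python
import pandas as pd


def first_look(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect a few initial checks on a DataFrame."""
    return {
        "shape": df.shape,                                  # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),          # column -> dtype name
        # missing values per column, converted to plain Python ints
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        "duplicates": int(df.duplicated().sum()),           # fully duplicated rows
        "summary": df.describe(include="all"),              # numeric + categorical stats
    }


# Small inline example; with the repository data you might instead use
# df = pd.read_csv("datasets/tips_seaborn.csv")
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 10.34, None],
    "day": ["Sun", "Sun", "Sat", "Sun", "Sat"],
})
report = first_look(df)
print(report["shape"])       # (5, 2)
print(report["missing"])     # {'total_bill': 1, 'day': 0}
print(report["duplicates"])  # 1
```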
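
The two EDA notebooks (2.1 univariate, 2.2 bivariate and multivariate) differ mainly in how many variables each summary looks at. The sketch below contrasts the two with pandas. It assumes synthetic data that only imitates the column names of the tips dataset, and it is not the notebooks' own code. Univariate summaries describe one column at a time. Bivariate and multivariate summaries relate columns to each other.

```python
import numpy as np
import pandas as pd

# Synthetic data loosely shaped like the tips dataset (illustrative only)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=200),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, size=200)

# Univariate: location, spread, and shape of each numeric column,
# plus frequencies for the categorical column
numeric_summary = df[["total_bill", "tip"]].agg(["mean", "median", "std", "skew"])
day_counts = df["day"].value_counts()

# Bivariate / multivariate: pairwise correlation and grouped summaries
corr = df[["total_bill", "tip"]].corr(method="pearson")
by_day = df.groupby("day")[["total_bill", "tip"]].mean()

print(numeric_summary)
print(day_counts)
print(corr)
print(by_day)
```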
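
As one example of the kind of method covered in 3. Outliers.ipynb, here is a minimal sketch of the interquartile-range (Tukey fences) rule. This is a common textbook technique and may not be the method the notebook uses. The helper name iqr_outliers and the factor k = 1.5 are assumptions.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (s < lower) | (s > upper)  # boolean mask, True = outlier


values = pd.Series([10, 12, 11, 13, 12, 11, 95])
mask = iqr_outliers(values)
print(values[mask].tolist())  # [95]
```

The IQR rule relies on quartiles rather than the mean and standard deviation, so a single extreme value cannot stretch the fences the way it would inflate a z-score threshold. Flagged points still need interpretation before anyone removes them.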
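
A minimal sketch of the inspect-then-impute pattern that 4. Missing Values.ipynb covers. The column names, the median/mode strategy, and the indicator column are illustrative assumptions, not necessarily the choices made in the notebook.

```python
import numpy as np
import pandas as pd

# Illustrative data with gaps in a numeric and a categorical column
df = pd.DataFrame({
    "bmi": [25.0, np.nan, 40.0, 35.0, np.nan],
    "smoking_status": ["never", "smokes", None, "never", "formerly"],
})

# 1. Inspect: count and share of missing values per column
print(df.isna().sum())
print(df.isna().mean().round(2))

# 2. Impute: median for the numeric column, most frequent value for the categorical one
imputed = df.copy()
imputed["bmi_was_missing"] = df["bmi"].isna()  # keep track of which rows were filled
imputed["bmi"] = imputed["bmi"].fillna(imputed["bmi"].median())
imputed["smoking_status"] = imputed["smoking_status"].fillna(
    imputed["smoking_status"].mode()[0]
)
print(imputed)
```

The median is less sensitive to skew and outliers than the mean. The indicator column preserves the information that a value was imputed, which can itself be predictive.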
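
The classification workflow described for 5.1 Model - Classification.ipynb (preprocessing, training, evaluation) might look like the following minimal scikit-learn sketch. It assumes scikit-learn's bundled copy of the iris data rather than datasets/iris_seaborn.csv, and a logistic-regression baseline. The notebook's actual models and metrics may differ.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Stratified split keeps the class proportions equal in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling inside the pipeline is fitted on the training data only (no leakage)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
```

Wrapping the scaler and the model in one pipeline means the same preprocessing is applied consistently at training and prediction time. This matches the repository's stated goal of learning robust preprocessing patterns before modeling.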
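
For 5.2 Model - Regression.ipynb, a minimal sketch of a regression baseline with common metrics (MAE, RMSE, R²) and a residual check. It uses synthetic data from make_regression and plain linear regression as assumptions for illustration. It does not use the repository's auto-mpg or California housing files.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE in the units of y
r2 = r2_score(y_test, y_pred)

# Diagnostic: residuals of a reasonable fit should scatter around zero
residuals = y_test - y_pred
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}  mean residual={residuals.mean():.2f}")
```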
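
Cluster quality analysis, as mentioned for 5.3 Model - Clustering.ipynb, is often done by sweeping the number of clusters and comparing a quality metric. The sketch below uses k-means with the silhouette score on synthetic blobs. The algorithm, the metric, and the range of k are assumptions, not necessarily what the notebook uses.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)  # k-means is distance-based, so scale features

# Sweep k and record the silhouette score (range -1..1, higher is better)
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()})
print("best k by silhouette:", best_k)
```

On larger datasets, sweeps like this become slow, which is the kind of computation a precomputed cache is useful for.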

Datasets 🧪

datasets/auto-mpg.csv
datasets/california-housing.csv
datasets/flights_seaborn.csv
datasets/healthcare-dataset-stroke-data.csv
datasets/iris_seaborn.csv
datasets/marketing-data.csv
datasets/outlier_detection_dataset.csv
datasets/students.csv
datasets/synthetic_stroke_data.csv
datasets/tips_seaborn.csv
datasets/titanic_seaborn.csv

Getting Started 🚀

1. Clone the repository 📥

git clone https://github.com/DataSciencePolimi/Exploratory-Data-Analysis
cd Exploratory-Data-Analysis

2. Create and activate a virtual environment (recommended) 🧰

python -m venv .venv
source .venv/bin/activate
(On Windows, use .venv\Scripts\activate instead.)

3. Install dependencies 📦

pip install jupyter pandas numpy matplotlib seaborn scikit-learn scipy
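
To confirm the installation worked, a quick sanity check like the one below can be run with the virtual environment activated. The helper name missing_modules is hypothetical. The list mirrors the pip command above with import names, so scikit-learn is checked as sklearn. Jupyter is left out because it is a metapackage. Running jupyter --version in the shell is a simpler check for it.

```python
import importlib.util

# Import names for the packages installed above (scikit-learn is imported as sklearn)
REQUIRED_MODULES = ("pandas", "numpy", "matplotlib", "seaborn", "sklearn", "scipy")


def missing_modules(names=REQUIRED_MODULES):
    """Hypothetical helper: return the modules in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all dependencies found" if not missing else f"missing: {missing}")
```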

Suggested Learning Path 🧭

For a structured learning flow, run the notebooks in this order:

1. Dataset Overview.ipynb
2.1 EDA - Univariate.ipynb
2.2 EDA - Bivariate and Multivariate.ipynb
3. Outliers.ipynb
4. Missing Values.ipynb
5.1 Model - Classification.ipynb
5.2 Model - Regression.ipynb
5.3 Model - Clustering.ipynb

Goals of This Repository 🎯

Build intuition for reading and profiling real datasets
Practice selecting the right visual/statistical technique for each question
Learn robust preprocessing patterns before modeling
Improve reproducibility in data analysis workflows

Notes 📝

Dataset files are stored in datasets/; if you move them, update the file paths in the notebooks accordingly.
The cache/ folder stores precomputed arrays for slow computations (for example, clustering metric sweeps); some notebooks require it to be present.
If plots do not display, verify your Jupyter kernel and package installation.
Some notebooks may require rerunning cells from top to bottom after kernel restarts.
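
The general load-or-compute pattern behind a folder like cache/ can be sketched with NumPy's .npy files. This is a generic illustration only. The README does not document how the notebooks name or format their cached files, so the helper cached_array, its arguments, and the file naming are hypothetical.

```python
from pathlib import Path

import numpy as np


def cached_array(name, compute, cache_dir="cache"):
    """Hypothetical helper: load <cache_dir>/<name>.npy if it exists,
    otherwise call compute(), save the result there, and return it."""
    path = Path(cache_dir) / f"{name}.npy"
    if path.exists():
        return np.load(path)  # reuse the precomputed array
    result = np.asarray(compute())  # slow computation runs only on a cache miss
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, result)
    return result
```

For example, cached_array("silhouette_sweep", run_sweep) would run a (hypothetical) run_sweep function once and read the saved array on later runs. Deleting the file forces a recomputation.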

Author 👤

Riccardo Campi, PhD student in Information Technology,
Politecnico di Milano, Data Science Lab.

License ⚖️

This project is licensed under the MIT License. See LICENSE for details.
