SamanFatima7/exploratory-data-analysis

Exploratory Data Analysis

A collection of EDA notebooks — the kind of work that happens before any model gets trained. Curiosity first, conclusions second.

Every dataset has a story, and finding it is half the job. These notebooks walk through public datasets across health, business, food, tech, and lifestyle — each one focused on what the data actually says when you stop assuming and start looking.


📓 Notebooks in this repo

1. Heart Health Insights — EDA on cardiovascular risk

Looking at the classic heart-disease dataset through a clinical-curiosity lens. Which features correlate with risk, where does the data lie, and what would a doctor want to see in a one-page summary? Heatmaps, distribution plots, and a clean walk through every variable.

📔 Open on Kaggle →
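As a rough sketch of the "which features correlate with risk" step, one option is to rank numeric columns by their Pearson correlation with the outcome column. The column names (`age`, `chol`, `max_hr`, `target`), the helper `rank_by_correlation`, and the six rows below are all invented for illustration; they are not taken from the notebook or its dataset.

```python
import pandas as pd

# Toy rows invented for illustration; not real patient data.
df = pd.DataFrame({
    "age": [45, 61, 52, 38, 67, 58],
    "chol": [210, 280, 245, 190, 300, 260],
    "max_hr": [170, 120, 150, 180, 110, 135],
    "target": [0, 1, 0, 0, 1, 1],  # hypothetical label: 1 = disease present
})


def rank_by_correlation(frame, target_col):
    """Return features ordered by absolute Pearson correlation with target_col."""
    # Restrict to numeric columns so text columns do not break corr().
    corr = frame.select_dtypes("number").corr()[target_col].drop(target_col)
    # Order by strength while keeping the sign of each correlation.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranked = rank_by_correlation(df, "target")
print(ranked)
```

Correlation only captures linear association, so a ranking like this is a starting point for plots and follow-up questions rather than a conclusion about risk.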


2. Top 50 Self-Made Billionaires

Who built their fortunes themselves, and what patterns emerge? Industry breakdowns, age-at-first-billion, country of origin, and the visualizations to back it all up. A fun one — useful as a template for any "rank the top N" dataset.

📔 Open on Kaggle →


3. Spying on Global Food Production

A global look at what the world grows, raises, and processes. Country-level comparisons, crop trends over decades, and a few uncomfortable observations about food inequality hiding in the numbers.

📔 Open on Kaggle →


4. Crunching the AppleApp Numbers

The App Store at scale: which categories dominate, what people pay for, and where the long tail starts. Charts that go beyond "top 10 by rating" into the actual economics of the platform.

📔 Open on Kaggle →


5. Millions Sold — a 5-Year Breakdown

Five years of sales data, sliced by region, time, and product. The notebook that doubles as a tutorial on how to structure year-over-year analysis without getting lost in your own pivot tables.

📔 Open on Kaggle →
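One way to structure a year-over-year comparison in pandas, shown here as a minimal sketch rather than the notebook's actual approach, is to sort within each group and take the percentage change from the previous row. The column names (`year`, `region`, `units`), the helper `year_over_year`, and the figures are invented for illustration, and the sketch assumes exactly one row per group per consecutive year.

```python
import pandas as pd

# Invented figures for illustration only.
sales = pd.DataFrame({
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "region": ["North", "North", "North", "South", "South", "South"],
    "units": [100, 120, 90, 200, 210, 252],
})


def year_over_year(frame, group_col, value_col):
    """Add a yoy_pct column: percent change versus the previous year per group."""
    # Sorting first guarantees that "previous row" means "previous year".
    out = frame.sort_values([group_col, "year"]).copy()
    out["yoy_pct"] = out.groupby(group_col)[value_col].pct_change() * 100
    return out


yoy = year_over_year(sales, "region", "units")
print(yoy)
```

The first year in each group has no earlier year to compare against, so its `yoy_pct` is `NaN`. Keeping the result in long form like this tends to be easier to filter and plot than a wide pivot table.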


6. Visual Insights into Online Food Taste

What makes someone order what they order? A look at online food-delivery preferences with the kind of charts that work for a stakeholder deck, not just a notebook.

📔 Open on Kaggle →


7. Data Viz — Your Beginner's Roadmap

Less an analysis, more a guide. If you're new to data visualization in Python, this is the notebook I wish I'd had on day one. Matplotlib, Seaborn, Plotly — when to use which, and what the common traps look like.

📔 Open on Kaggle →


🛠 Stack

Python · pandas · NumPy · Matplotlib · Seaborn · Plotly · Jupyter

📂 How this repo is organized

Each notebook lives in its own .ipynb file. If you want to run them locally:

git clone https://github.com/samanfatima7/exploratory-data-analysis.git
cd exploratory-data-analysis
pip install -r requirements.txt
jupyter notebook

Datasets are linked from each notebook on Kaggle.

👋 About

I'm Saman Fatima, a Kaggle Grandmaster (highest rank: 24) and data scientist from Pakistan. More of my work lives on Kaggle and LinkedIn.

If you found something useful here, a ⭐ goes a long way.
