🏅 Olympic Data Engineering & Analytics Project

An end-to-end data engineering and dashboarding project built on Azure Blob Storage, Azure Databricks, and Power BI. It processes historical Olympic data through a Bronze-Silver-Gold architecture and presents the results in interactive dashboards.

🚀 Project Overview

This project was completed as part of a data engineering internship and demonstrates how to:

- Ingest and store data using Azure Blob Storage
- Process large volumes of structured data using Databricks with PySpark
- Build a multi-layered architecture (Bronze → Silver → Gold)
- Create and share interactive dashboards with Power BI

📁 Folder Structure

Olympic-DataEngineering-Project/
├── datasets/                    # Raw .csv files (optional or sample data)
├── images/                      # Dashboard screenshots
│   ├── Olympic dashboard page1.png
│   └── Olympic Dashboard Page 2.png
├── O_Bronze.py                  # Bronze Layer script
├── O_Silver.py                  # Silver Layer script
├── O_Gold.py                    # Gold Layer script
├── olympicpbi.pdf               # Power BI Dashboard export (PDF)
├── olympicp dashboard.pbix      # Power BI file
├── LICENSE
└── README.md

⚙️ Tools & Technologies

Tool/Service	Purpose
Azure Blob Storage	Hosting and storing raw Olympic CSVs
Azure Databricks	Processing data using PySpark Notebooks
Python & SQL	Data transformation & querying
Power BI	Dashboard creation and data visualization

🧱 Architecture: Bronze – Silver – Gold

Azure Blob Storage (.csv)
↓
Bronze Layer (Raw Mount)
↓
Silver Layer (Cleaned/Transformed)
↓
Gold Layer (Aggregated Tables)
↓
Power BI Dashboard

📜 Description of Processing Layers

🔸 Bronze Layer

Mounts raw Olympic .csv data from Azure Blob Storage to Databricks.
Minimal processing; serves as the raw data zone.
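
As a rough sketch of the mount step, the helper below assembles the arguments that Databricks' `dbutils.fs.mount` takes for a Blob Storage container. The helper name, storage account, container, mount point, and CSV file name are all placeholders chosen for illustration; they are not taken from `O_Bronze.py`. The helper is plain Python, so it can be checked without a cluster.

```python
def build_mount_args(storage_account, container, mount_point, account_key):
    """Assemble keyword arguments for dbutils.fs.mount (Blob Storage, wasbs scheme)."""
    host = f"{storage_account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": mount_point,
        # Spark config key that carries the storage account access key
        "extra_configs": {f"fs.azure.account.key.{host}": account_key},
    }


# Hypothetical values; in practice the key would come from a secret scope,
# e.g. dbutils.secrets.get(scope="...", key="...").
mount_args = build_mount_args(
    storage_account="mystorageacct",
    container="olympic-raw",
    mount_point="/mnt/olympic/bronze",
    account_key="<account-key>",
)

# Inside a Databricks notebook, where dbutils and spark are predefined:
# dbutils.fs.mount(**mount_args)
# athletes_raw = spark.read.csv("/mnt/olympic/bronze/athletes.csv", header=True)
```

Keeping the argument construction separate from the `dbutils` call makes the path and config key easy to verify before anything touches the workspace.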

🔸 Silver Layer

Cleans and transforms raw data:
- Handles nulls
- Joins tables (athlete, sport, event, medal)
- Formats columns
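
The three steps above can be illustrated with a small plain-Python sketch. It is not the code in `O_Silver.py`: the column names and sample rows are invented, and only two of the four tables (athlete and medal) are joined. In PySpark the same steps would typically map onto `DataFrame.dropna`/`fillna`, `DataFrame.join`, and `DataFrame.withColumn`.

```python
def clean_athletes(rows):
    """Handle nulls and format columns for raw athlete rows (list of dicts)."""
    cleaned = []
    for row in rows:
        if not row.get("athlete_id"):  # drop rows that lack the join key
            continue
        cleaned.append({
            "athlete_id": int(row["athlete_id"]),
            # replace missing values with a default, then normalise formatting
            "name": (row.get("name") or "Unknown").strip().title(),
            "gender": (row.get("gender") or "Unknown").strip().upper(),
        })
    return cleaned


def join_medals(athletes, medals):
    """Inner-join medal rows to cleaned athletes on athlete_id."""
    by_id = {a["athlete_id"]: a for a in athletes}
    return [
        {**by_id[int(m["athlete_id"])], "medal": m["medal"].title(), "year": int(m["year"])}
        for m in medals
        if m.get("athlete_id") and int(m["athlete_id"]) in by_id
    ]


# Invented sample rows, as they might look after reading raw CSV text
raw_athletes = [
    {"athlete_id": "1", "name": "  jane doe ", "gender": "f"},
    {"athlete_id": "", "name": "missing id", "gender": "m"},
    {"athlete_id": "2", "name": None, "gender": None},
]
raw_medals = [
    {"athlete_id": "1", "medal": "gold", "year": "2016"},
    {"athlete_id": "3", "medal": "silver", "year": "2016"},  # no matching athlete
]

silver = join_medals(clean_athletes(raw_athletes), raw_medals)
```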

🔸 Gold Layer

Generates aggregated insights and analysis-ready tables:
- Medal counts by athlete, gender, year, and country
- Total athletes by sport and season
- Country-level metrics such as GDP and population
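
A grouped medal count of the kind listed above can be sketched in a few lines of plain Python. The rows and column names here are invented for illustration and do not come from `O_Gold.py`; in PySpark the equivalent would usually be a `groupBy(...).count()` on the Silver table.

```python
from collections import Counter


def medal_counts(rows, *keys):
    """Count medal rows grouped by the given column names."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


# Invented Silver-layer rows: one row per medal awarded
silver_rows = [
    {"country": "SWE", "year": 2016, "gender": "F", "medal": "Gold"},
    {"country": "SWE", "year": 2016, "gender": "M", "medal": "Silver"},
    {"country": "USA", "year": 2016, "gender": "F", "medal": "Gold"},
]

by_country_year = medal_counts(silver_rows, "country", "year")
by_gender = medal_counts(silver_rows, "gender")
```

Each result is keyed by a tuple of the grouping values, so `by_country_year[("SWE", 2016)]` gives the medal count for that country and year.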

📊 Power BI Dashboards

🔵 Page 1 – Summary Overview

🥇 Total Medals: 27,497
👥 Total Athletes: 19,702
📊 Top Sports by Athlete Count
🌍 Medals by Country (Treemap)
🗺️ GDP by Country (Map)
🏅 Medal Counts by Athlete

🟢 Page 2 – Gender & Demographics

👩‍🦱 Gender Distribution of Medals
📈 Population by Country
📍 Olympic Host Cities (Map)
🔎 Filters by Year, Gender, Country, Season

🧠 Key Learnings

- Real-world Lakehouse architecture for data warehousing
- Data ingestion and transformation using Databricks + PySpark
- Seamless integration of Azure data services with Power BI
- Effective data storytelling through dashboards

📎 Files You Can Explore

File/Folder	Purpose
`O_Bronze.py`	Bronze Layer logic (mount raw data)
`O_Silver.py`	Cleaned and transformed dataset logic
`O_Gold.py`	Aggregated Gold Layer metrics
`olympicpbi.pdf`	View dashboard if you don’t have Power BI
`olympicp dashboard.pbix`	Open and interact with the dashboard
`images/`	Screenshots for preview in README

🔗 GitHub Repository

View This Project on GitHub

👤 Author

Kushmi Anuththara
Master's in Data Science | Data Engineer | Azure & Power BI Enthusiast
📍 Based in Sweden
📫 LinkedIn

📄 License

This project is licensed under the MIT License.
