Built in < 1 hour – no Bloomberg Terminal, no live data, just 6 real Kaggle commodity CSVs.
Shows a clean, maintainable ETL pipeline that:
- Auto-detects `Date` and numeric columns (`Price`, `High`, `Low`, etc.)
- Handles real-world messiness: commas in numbers, missing columns, mixed formats
- Combines all assets into one clean file
- Delivers an interactive dashboard for instant insight

Designed for junior onboarding – runs in 60 seconds.
| Tool | Purpose |
|---|---|
| Polars | Fast, memory-efficient data processing |
| Streamlit | One-click interactive dashboard |
| CSV → CSV | Simple, portable, no external storage |
```
bloomberg-commodities-de/
├── data/
│   └── raw/      # ← Your 6 Kaggle CSVs go here
├── pipeline.py   # ← Ingest + clean + combine
├── dashboard.py  # ← Interactive chart
└── README.md
```
## Run Instructions

```bash
# 1. Install (once)
pip3 install polars streamlit

# 2. Run ETL
python3 pipeline.py
```
# Bloomberg Commodities ETL – Day 1 Ready (ML + Anomaly Detection)
Built for Bloomberg interview prep: from mocked Kaggle CSVs to a production-grade pipeline in under 2 hours.
**Now with Data Quality (DQ) guards + Z-Score anomaly detection** — catches negative prices and 3σ outliers instantly.
## 🚀 What's New in Day 1
- **DQ Check**: Auto-removes invalid prices ≤ 0 (critical for finance).
- **Anomaly Detection**: Log-transform + Z-Score (3σ rule) on prices — industry standard for commodity volatility.
- **Robustness**: Column-name agnostic (works with "Price", "Close", etc.).
- **Zero Extra Deps**: Pure Polars + NumPy/SciPy — no Great Expectations bloat.
```bash
# 3. Launch dashboard
streamlit run dashboard.py
```
## 🎯 Goal
Scalable ETL for commodity data:
- Ingests messy CSVs (commas, % signs, mixed formats).
- Cleans, combines, and validates.
- Flags anomalies for trading desk alerts.
- Ready for Prefect orchestration (Day 2).
## 🛠 Tech Stack
| Tool | Purpose |
|------|---------|
| **Polars** | Lightning-fast data processing (10x Pandas). |
| **NumPy/SciPy** | Z-Score stats (log-normal for prices). |
| **Pandas** (minimal) | Temp conversions only. |
| **CSV Output** | Portable + human-readable (Parquet ready for prod). |
## 📁 Folder Structure
```
bloomberg-commodities-de/
├── data/
│   ├── raw/                             # Drop your 6 Kaggle CSVs here (e.g., crude_oil.csv)
│   ├── combined_commodities.csv         # Raw combined output
│   └── combined_commodities_clean.csv   # DQ + anomaly-flagged
├── pipeline.py                          # Full ETL + Day 1 upgrades
├── dashboard.py                         # Interactive Streamlit viz (streamlit run dashboard.py)
├── flow.py                              # Prefect orchestration: logs, retries, visual flow runs
└── README.md
```

## 🔑 Key Points
- B-Pipe → Kafka: exactly-once delivery, 100k+ msgs/sec
- Prefect: retries, observability, scheduling (`cron="@daily"`)
- Polars: 10–50x faster than Pandas on commodity ticks
- Z-Score anomalies → alert the trading desk instantly
- Final storage: columnar Parquet for BI tools (Tableau, etc.)