This repo contains the sample data, scripts, and command references for the live demos in the Data Engineering Bootcamp course.
- Python 3.9+
- DuckDB CLI or Python package (`pip install duckdb`)

Clone this repo and `cd` into it, then install the dependencies with `pip install -r requirements.txt`. All scripts assume you're running from the repo root.
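
Before the first demo it can help to confirm the prerequisites are in place. The following is a minimal sketch, not part of the repo: the function names `check_environment` and `report` are hypothetical, and it only checks the Python version and whether the `duckdb` package is importable (it does not look for the DuckDB CLI or for dbt).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version listed in the prerequisites


def check_environment(packages=("duckdb",)):
    """Return a dict mapping each requirement to True (met) or False (not met)."""
    results = {"python>=3.9": sys.version_info[:2] >= MIN_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported
        results[name] = importlib.util.find_spec(name) is not None
    return results


def report(results):
    """Format the check results as one printable line per requirement."""
    return [f"{'ok' if ok else 'MISSING'}: {name}" for name, ok in results.items()]


if __name__ == "__main__":
    print("\n".join(report(check_environment())))
```

Any line marked `MISSING` points at a prerequisite to install before running the scripts.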
| Session | Demo | File(s) |
|---|---|---|
| Day 1, Session 2 | Source Discovery with DuckDB | scripts/00_source_profiler.py, demo_commands/day1_session2_source_profiler.md |
| Day 1, Session 3 | Ethical Loading with DuckDB | scripts/01_ethical_loading.py, demo_commands/day1_session3_ethical_loading.md |
| Day 1, Session 4 | The Bronze Layer | scripts/02_bronze_layer.py, demo_commands/day1_session4_bronze_layer.md |
| Day 2, Session 1 | Data Modeling with dbt core | models/, demo_commands/day2_session1_dbt_modeling.md |
| Day 2, Session 2 | dbt Test Circuit Breakers | models/staging/schema.yml, tests/, demo_commands/day2_session2_dbt_tests.md |
| Day 2, Session 3 | Lineage | demo_commands/day2_session3_lineage.md |
All sample data is in the seeds/ directory:
- `loyalty_customers.csv` - 15 loyalty app customers with PII fields
- `supply_chain.csv` - 18 supply chain orders with deliberate data quality issues (mixed date formats, non-numeric values in numeric columns, inconsistent naming, missing IDs)
- `transactions.csv` - 40 transactions across 3 store locations
The data is small by design (15 to 40 rows per file) so it's easy to display on screen during demos. It includes deliberate data quality issues (nulls, inconsistent formats) for teaching purposes.
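
To make the kinds of issues concrete, here is a small sketch of how they might be counted with the Python standard library alone. It is not one of the course scripts: the inline sample, its column names (`order_id`, `order_date`, `quantity`), and the two date formats are assumptions chosen for illustration, not values copied from `seeds/supply_chain.csv`.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: column names and values are illustrative only,
# not copied from the seed files.
SAMPLE = """order_id,order_date,quantity
A-1,2024-01-05,10
,01/07/2024,twelve
A-3,2024-01-09,
"""

# Assumed candidate formats; a real file may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def date_format_of(value):
    """Return the first format that parses the value, or None if none do."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def is_number(value):
    """True when the string can be read as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile(rows, id_col, date_col, numeric_col):
    """Count missing IDs, empty and non-numeric values, and collect date formats."""
    report = {"missing_ids": 0, "empty_numeric": 0, "non_numeric": 0, "date_formats": set()}
    for row in rows:
        if not row[id_col]:
            report["missing_ids"] += 1
        value = row[numeric_col]
        if value == "":
            report["empty_numeric"] += 1
        elif not is_number(value):
            report["non_numeric"] += 1
        report["date_formats"].add(date_format_of(row[date_col]))
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = profile(rows, "order_id", "order_date", "quantity")
print(result)
```

More than one entry in `date_formats` signals mixed date formats in the same column, which is the sort of issue the demos are built around.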

Run each demo from the repo root:

    # Day 1 Session 2: Source Profiler
    python scripts/00_source_profiler.py

    # Day 1 Session 3: Ethical Loading
    python scripts/01_ethical_loading.py

    # Day 1 Session 4: Bronze Layer
    python scripts/02_bronze_layer.py

    # Day 2 Session 1: dbt Modeling (requires dbt-core, dbt-duckdb)
    dbt run --profiles-dir .

    # Day 2 Session 2: dbt Tests
    dbt test --profiles-dir .

    # Day 2 Session 3: Lineage
    dbt docs generate --profiles-dir .
    dbt ls --select +fct_daily_sales --output path --profiles-dir .

Each script is self-contained and creates or updates a local `course_demo.duckdb` file. Run them in order.
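
If you would rather run the three Day 1 scripts in one go, the ordering can be sketched as below. This helper is hypothetical and not part of the repo; it assumes the script paths listed above and that it is called from the repo root. It stops at the first script that fails, so a later script never runs against a database that an earlier one did not finish preparing.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as listed in this README, in the order they should run.
SCRIPTS = [
    "scripts/00_source_profiler.py",
    "scripts/01_ethical_loading.py",
    "scripts/02_bronze_layer.py",
]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Return one [interpreter, script] command per script, preserving order."""
    return [[python, script] for script in scripts]


def run_in_order(repo_root="."):
    """Run each script from repo_root, stopping at the first failure."""
    root = Path(repo_root)
    for command in build_commands():
        script = root / command[1]
        if not script.is_file():
            raise FileNotFoundError(f"{script} not found; run from the repo root")
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(command, cwd=root, check=True)
```

Calling `run_in_order()` from the repo root runs the profiler, the ethical loading script, and the bronze layer script in sequence.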