This repository was archived by the owner on Apr 15, 2026. It is now read-only.

christinaet/de_bootcamp

Data Engineering Bootcamp - Course Demo Repo

This repo contains the sample data, scripts, and command references for the live demos in the Data Engineering Bootcamp course.

Prerequisites

  • Python 3.9+
  • DuckDB CLI or Python package (pip install duckdb)
  • dbt-core and dbt-duckdb (needed for the Day 2 demos)
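A quick way to confirm the prerequisites before a session is a small check script. The sketch below is not part of the repo; the function names are hypothetical, and it only checks the interpreter version and whether the duckdb package can be imported.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 9)  # minimum version from the Prerequisites list


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def missing_packages(names=("duckdb",)):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def environment_report():
    """Build a short, human-readable summary of the checks."""
    lines = [f"Python >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}: {python_ok()}"]
    missing = missing_packages()
    lines.append("Missing packages: " + (", ".join(missing) or "none"))
    return "\n".join(lines)
```

Calling print(environment_report()) from the repo root prints both checks. For the Day 2 demos, you could pass ("duckdb", "dbt") to missing_packages, assuming dbt-core installs under the import name dbt.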

Setup

Clone this repo and cd into it, then install the dependencies:

pip install -r requirements.txt

All scripts assume you're running from the repo root.

Demo Map

| Session | Demo | File(s) |
|---|---|---|
| Day 1, Session 2 | Source Discovery with DuckDB | scripts/00_source_profiler.py, demo_commands/day1_session2_source_profiler.md |
| Day 1, Session 3 | Ethical Loading with DuckDB | scripts/01_ethical_loading.py, demo_commands/day1_session3_ethical_loading.md |
| Day 1, Session 4 | The Bronze Layer | scripts/02_bronze_layer.py, demo_commands/day1_session4_bronze_layer.md |
| Day 2, Session 1 | Data Modeling with dbt core | models/, demo_commands/day2_session1_dbt_modeling.md |
| Day 2, Session 2 | dbt Test Circuit Breakers | models/staging/schema.yml, tests/, demo_commands/day2_session2_dbt_tests.md |
| Day 2, Session 3 | Lineage | demo_commands/day2_session3_lineage.md |

Sample Data

All sample data is in the seeds/ directory:

  • loyalty_customers.csv - 15 loyalty app customers with PII fields
  • supply_chain.csv - 18 supply chain orders with deliberate data quality issues (mixed date formats, non-numeric values in numeric columns, inconsistent naming, missing IDs)
  • transactions.csv - 40 transactions across 3 store locations

The data is small by design (15-40 rows per file) so it's easy to display on screen during demos. The quality issues (nulls, inconsistent formats) are deliberate and exist for teaching purposes.
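To get a feel for the kinds of issues planted in the seed files, here is a minimal profiling sketch using only the standard library. It is not the repo's 00_source_profiler.py. The column names, the sample rows, and the two candidate date formats are assumptions made up for illustration, not the actual contents of supply_chain.csv.

```python
import csv
import io
from datetime import datetime

# Hypothetical rows for illustration only; not the real seed data.
SAMPLE = """order_id,order_date,quantity
A1,2024-01-05,10
,01/07/2024,twelve
A3,2024-01-09,7
"""

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed candidate formats


def detect_date_format(value):
    """Return the first format that parses the value, or None."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def is_number(value):
    """Return True if the string can be read as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile(rows, id_col, date_col, numeric_col):
    """Count missing IDs and non-numeric values, and collect date formats."""
    report = {"rows": 0, "missing_ids": 0, "non_numeric": 0, "date_formats": set()}
    for row in rows:
        report["rows"] += 1
        if not (row[id_col] or "").strip():
            report["missing_ids"] += 1
        if not is_number(row[numeric_col]):
            report["non_numeric"] += 1
        report["date_formats"].add(detect_date_format(row[date_col]))
    return report


report = profile(csv.DictReader(io.StringIO(SAMPLE)),
                 "order_id", "order_date", "quantity")
```

More than one entry in report["date_formats"] signals mixed date formats in the column. To try this on a real seed file, replace io.StringIO(SAMPLE) with an open file handle and pass that file's actual column names.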

Running All Demos

# Day 1 Session 2: Source Profiler
python scripts/00_source_profiler.py

# Day 1 Session 3: Ethical Loading
python scripts/01_ethical_loading.py

# Day 1 Session 4: Bronze Layer
python scripts/02_bronze_layer.py

# Day 2 Session 1: dbt Modeling (requires dbt-core, dbt-duckdb)
dbt run --profiles-dir .

# Day 2 Session 2: dbt Tests
dbt test --profiles-dir .

# Day 2 Session 3: Lineage
dbt docs generate --profiles-dir .
dbt ls --select +fct_daily_sales --output path --profiles-dir .

Each script runs on its own and creates or updates a local course_demo.duckdb file in the repo root. Run the steps in the order shown, since later steps may rely on tables created by earlier ones.
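If you prefer a single entry point for the Day 1 scripts, the ordering can be sketched as a small runner that stops at the first failure. This helper is not part of the repo; build_commands and run_in_order are hypothetical names, and the script paths are copied from the commands above.

```python
import subprocess
import sys

# Day 1 scripts, in the order listed above.
SCRIPTS = [
    "scripts/00_source_profiler.py",
    "scripts/01_ethical_loading.py",
    "scripts/02_bronze_layer.py",
]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per script, in order."""
    return [[python, script] for script in SCRIPTS]


def run_in_order(commands, runner=subprocess.run):
    """Run each command in turn; raise on the first non-zero exit code."""
    completed = []
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"step failed: {' '.join(cmd)}")
        completed.append(cmd)
    return completed
```

From the repo root, run_in_order(build_commands()) runs the three scripts with the current interpreter. The runner parameter exists so the ordering logic can be exercised without launching any processes.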
