lubobali/airflow-dq-pipeline

🔄 Airflow DQ Pipeline

Production-grade data pipeline using the Write-Audit-Publish pattern, the same approach used at Netflix, Airbnb, and Spotify.

🎯 The Problem

Most pipelines write directly to production. If bad data slips through, stakeholders see it before you can fix it.

✅ The Solution

Never write to production directly. Instead:

Sensor → Staging → DQ Checks → Production → Cleanup

📊 Pipeline Architecture

| Task | What It Does | Why It Matters |
| --- | --- | --- |
| `wait_for_polygon_tickers` | Waits for upstream data | Don't run on empty tables |
| `fetch_to_staging` | Loads to staging table | Isolate unvalidated data |
| `run_dq_checks` | 4 quality validations | Catch issues early |
| `exchange_to_production` | Atomic partition swap | Zero-downtime publish |
| `cleanup_staging` | Drops temp table | No orphan data |
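
The task sequence in the table can be sketched in plain Python, with in-memory dictionaries standing in for the Iceberg tables. This is a minimal illustration of the Write-Audit-Publish flow under stated assumptions, not the repository's DAG: `run_wap`, `upstream`, `production`, and the `audit` callable are hypothetical names, and the real pipeline runs each step as a separate Airflow task.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = Dict[str, List[Row]]  # partition key (a date string here) -> rows


def run_wap(
    upstream: Table,
    production: Table,
    ds: str,
    audit: Callable[[List[Row]], List[str]],
) -> None:
    """Run one Write-Audit-Publish cycle for partition `ds` (illustrative only)."""
    # 1. Sensor: refuse to run when the upstream partition has no data yet.
    if not upstream.get(ds):
        raise RuntimeError(f"upstream partition {ds} is not ready")

    # 2. Write: copy the unvalidated rows into an isolated staging area.
    staging: Table = {ds: list(upstream[ds])}

    # 3. Audit: any failure stops the run before production is touched.
    failures = audit(staging[ds])
    if failures:
        raise ValueError(f"DQ checks failed for {ds}: {failures}")

    # 4. Publish: replace only the target partition in production.
    production[ds] = staging[ds]

    # 5. Cleanup: drop the staging partition so nothing is orphaned.
    del staging[ds]
```

Because the audit step raises before the publish step runs, a failed check leaves `production` exactly as it was.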

🛡️ Data Quality Checks

✓ Row count > 0           # Table not empty
✓ No NULL close prices    # Required field present
✓ All prices > 0          # Business logic validation
✓ All volumes >= 0        # No negative volumes

If ANY check fails → pipeline stops → bad data never reaches production.
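
As a rough sketch, the four checks could look like this over a list of row dictionaries. The column names `close` and `volume`, and the helper names `run_dq_checks` and `enforce_dq`, are assumptions made for illustration; the repository's task may well run these checks as queries against the staging table instead.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def run_dq_checks(rows: List[Row]) -> List[str]:
    """Return failure messages; an empty list means every check passed."""
    failures: List[str] = []

    # Check 1: the table must not be empty.
    if len(rows) == 0:
        failures.append("row count is 0")

    closes = [row.get("close") for row in rows]

    # Check 2: close price is a required field.
    if any(close is None for close in closes):
        failures.append("NULL close price found")

    # Check 3: every present price must be strictly positive.
    if any(close is not None and close <= 0 for close in closes):
        failures.append("non-positive close price found")

    # Check 4: volumes must not be negative.
    # Assumption: a missing volume is not treated as a failure here.
    volumes = [row.get("volume") for row in rows]
    if any(volume is not None and volume < 0 for volume in volumes):
        failures.append("negative volume found")

    return failures


def enforce_dq(rows: List[Row]) -> None:
    """Raise so the pipeline stops before anything is published."""
    failures = run_dq_checks(rows)
    if failures:
        raise ValueError("DQ checks failed: " + "; ".join(failures))
```

Collecting every failure before raising means one run reports all broken checks at once, rather than one per retry.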

🔧 Tech Stack

  • Orchestration: Apache Airflow
  • Data Lake: PyIceberg + AWS Glue Catalog
  • Storage: S3 (Iceberg format)
  • Pattern: Write-Audit-Publish (idempotent, backfillable)

🧠 Key Concepts Demonstrated

  • ✅ Idempotency (same input = same output)
  • ✅ Partition-scoped overwrites (preserve history)
  • ✅ Sensor-based dependencies
  • ✅ Staging table isolation
  • ✅ Automated data quality gates
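
The first two concepts are closely linked: overwriting a single partition is what makes a rerun or backfill idempotent. The sketch below contrasts that with a plain append, using a dictionary keyed by partition date as a stand-in for the table. The function names `overwrite_partition` and `append_rows` are hypothetical and are not PyIceberg calls.

```python
from typing import Dict, List

Table = Dict[str, List[dict]]  # partition date -> rows


def overwrite_partition(table: Table, ds: str, rows: List[dict]) -> None:
    """Replace exactly one partition; every other partition is left untouched."""
    table[ds] = list(rows)


def append_rows(table: Table, ds: str, rows: List[dict]) -> None:
    """Non-idempotent counterpart, shown only for contrast."""
    table.setdefault(ds, []).extend(rows)


history = [{"ticker": "AAPL", "close": 184.0}]
new_rows = [{"ticker": "AAPL", "close": 185.6}]

# Running the overwrite twice (a rerun or backfill) leaves one copy of the data.
overwritten: Table = {"2024-01-01": list(history)}
overwrite_partition(overwritten, "2024-01-02", new_rows)
overwrite_partition(overwritten, "2024-01-02", new_rows)

# Running the append twice duplicates the rows for that day.
appended: Table = {"2024-01-01": list(history)}
append_rows(appended, "2024-01-02", new_rows)
append_rows(appended, "2024-01-02", new_rows)
```

After both runs, `overwritten["2024-01-02"]` holds one row while `appended["2024-01-02"]` holds two, and the `2024-01-01` partition is unchanged in both tables.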

📚 Credit

Pattern learned from Zach Wilson (ex-Netflix, ex-Airbnb Data Engineer) via the DataExpert.io bootcamp.


⭐ If this helped you, star the repo!
