A comprehensive collection of Apache Airflow DAG examples covering beginner to advanced use cases, including operators, sensors, XCom, webhooks, ETL pipelines, and complete monitoring setup with Prometheus and Grafana.
- Project Overview
- Repository Structure
- DAG Examples
- Monitoring Setup
- Getting Started
- Learning Path
- Prerequisites
- Contributing
This repository provides production-ready Airflow examples that demonstrate:
- Basic to Advanced Operators (Bash, Python, Sensors)
- Workflow Orchestration patterns and best practices
- Inter-task Communication with XCom
- Error Handling and retry mechanisms
- External Integrations (Databases, Webhooks, APIs)
- Real-world ETL Pipelines with modern tools
- Complete Monitoring with Prometheus & Grafana
Airflow-Dag-Scripts/
├── Example_01_Bash_Operator/                 # Basic bash operations
├── Example_02_Python_Operator/               # Python task execution
├── Example_03_Branching_and_Dummy/           # Conditional workflows
├── Example_04_XCom_and_Kwargs/               # Inter-task communication
├── Example_05_Master_and_Child_Xcom/         # Complex XCom patterns
├── Example_06_Capture_LongRunningDags/       # Performance monitoring
├── Example_07_Database_Connection/           # Database integrations
├── Example_08_Sensors/                       # File and time sensors
├── Example_09_Webhook_Notifications/         # Teams webhook integration
├── Example_10_NYC_Taxi_Pipeline/             # Real-world data pipeline
├── Example_11_Advanced_ETL_DuckDB_Pipeline/  # Modern ETL with DuckDB
├── Example_12_Dynamic_DAG_Factory/           # Dynamic DAG generation
├── monitoring/                               # Complete monitoring stack
└── README.md                                 # This file
| Example | Concept | Description | Features |
|---|---|---|---|
| 01_Bash_Operator | Basic Operations | Execute bash commands in Airflow | BashOperator, Task dependencies, Templating |
| 02_Python_Operator | Python Tasks | Run Python functions as tasks | PythonOperator, Function parameters, Return values |
| 03_Branching_and_Dummy | Conditional Logic | Implement conditional workflows | BranchPythonOperator, DummyOperator, Decision trees |
| Example | Concept | Description | Features |
|---|---|---|---|
| 04_XCom_and_Kwargs | Data Passing | Share data between tasks | XCom, Task context, **kwargs |
| 05_Master_and_Child_Xcom | Complex XCom | Advanced inter-task communication | Master-child patterns, Data serialization |
| 06_Capture_LongRunningDags | Performance | Monitor and analyze DAG performance | Long-running task detection, Performance metrics |
| 07_Database_Connection | Database Integration | Connect to SQL databases | SqlOperator, Connection management, Data extraction |
| 08_Sensors | Event Detection | Wait for files and time conditions | FileSensor, TimeSensor, Event-driven workflows |
| Example | Concept | Description | Features |
|---|---|---|---|
| 09_Webhook_Notifications | External Integration | Microsoft Teams notifications | Webhook integration, Alert systems, Error notifications |
| 10_NYC_Taxi_Pipeline | Real-world ETL | Complete data pipeline example | Data ingestion, Transformation, Loading patterns |
| 11_Advanced_ETL_DuckDB_Pipeline | Modern ETL | Advanced pipeline with DuckDB | Modern data stack, Analytics-ready data, Performance optimization |
| 12_Dynamic_DAG_Factory | Dynamic DAGs | Generate DAGs programmatically | DAG factory patterns, Configuration-driven workflows |
Location: monitoring/
Complete observability stack for Airflow with:
- Prometheus: Metrics collection and storage
- Grafana: Visualization and dashboards
- StatsD Exporter: Airflow metrics forwarding
- Docker Compose: Easy deployment
- Real-time Metrics: DAG runs, task duration, success rates
- Custom Dashboards: Pre-built Airflow dashboard
- Alerting: Configurable alerts for failures
- Historical Analysis: Long-term performance trends
cd monitoring
docker-compose up -d
Access Points:
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Metrics: http://localhost:9102/metrics
Create virtual environment and install Airflow:
# Create virtual environment
python -m venv airflow-env
source airflow-env/bin/activate # On Windows: airflow-env\Scripts\activate
# Upgrade pip
pip install --upgrade pip
# Install Airflow
pip install "apache-airflow==2.7.3"
# For database support (optional)
pip install "apache-airflow[postgres,mysql]==2.7.3"
# Initialize database
airflow db init
# Create admin user
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com \
--password admin
Quick Start (Development):
airflow standalone
Production Setup:
# Terminal 1: Start webserver
airflow webserver --port 8080
# Terminal 2: Start scheduler
airflow scheduler
Access: http://localhost:8080 (admin/admin)
# Copy DAG files to Airflow dags folder
cp Example_*/code.py ~/airflow/dags/
# Or set custom DAGs folder
export AIRFLOW__CORE__DAGS_FOLDER=/path/to/this/repo
# Start monitoring stack
cd monitoring
docker-compose up -d
# Configure Airflow metrics in ~/airflow/airflow.cfg
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
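The same `[metrics]` settings can also be supplied as environment variables, since Airflow maps `AIRFLOW__<SECTION>__<KEY>` onto `airflow.cfg` entries. This is often more convenient in containerized deployments:

```shell
# Environment-variable equivalents of the [metrics] block above
export AIRFLOW__METRICS__STATSD_ON=True
export AIRFLOW__METRICS__STATSD_HOST=localhost
export AIRFLOW__METRICS__STATSD_PORT=8125
export AIRFLOW__METRICS__STATSD_PREFIX=airflow
```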
# Restart Airflow
- Basic Operators: Examples 01-02
- Task Dependencies: Understanding DAG structure
- Airflow UI: Navigation and monitoring
- Conditional Logic: Example 03
- Data Passing: Examples 04-05
- External Systems: Examples 07-08
- Real-world Pipelines: Examples 10-11
- Dynamic DAGs: Example 12
- Monitoring: Complete setup
- Error Handling: Retry strategies and alerts
- Performance: Example 06
- Integration: Example 09
# List all DAGs
airflow dags list
# Trigger DAG manually
airflow dags trigger <dag_id>
# Pause/Unpause DAG
airflow dags pause <dag_id>
airflow dags unpause <dag_id>
# Show DAG structure
airflow dags show <dag_id>
# List tasks in DAG
airflow tasks list <dag_id>
# Test single task
airflow tasks test <dag_id> <task_id> <execution_date>
# Clear task instances
airflow tasks clear <dag_id> --start-date <YYYY-MM-DD>
# Re-serialize DAGs
airflow dags reserialize
# Check DAG parsing errors
airflow dags list-import-errors
# View logs (there is no dedicated CLI command; log files live under ~/airflow/logs/ by default)
- Python 3.8+ (3.9+ recommended)
- pip package manager
- Docker (for monitoring stack)
- Git for version control
# Database drivers
pip install psycopg2-binary # PostgreSQL
pip install mysqlclient # MySQL
# Additional operators
pip install apache-airflow-providers-postgres
pip install apache-airflow-providers-http
pip install apache-airflow-providers-ftp
- RAM: 4GB minimum, 8GB recommended
- Storage: 10GB for examples and logs
- Network: Internet access for package downloads
- ETL Pipelines: Extract, transform, load data workflows
- Data Quality: Automated data validation and cleansing
- Data Sync: Keeping systems in sync with scheduled updates
- CI/CD Integration: Automated deployment workflows
- System Monitoring: Health checks and automated responses
- File Processing: Automated file ingestion and processing
- Report Generation: Scheduled business report creation
- KPI Calculation: Automated metric computation
- Alert Systems: Automated business alerts and notifications
This is an educational repository focused on Airflow learning. Contributions welcome:
- New Examples: Add more DAG patterns or use cases
- Bug Fixes: Improve existing examples or documentation
- Documentation: Enhance explanations or add tutorials
- Optimizations: Performance improvements or best practices
For major changes, please open an issue first to discuss your ideas.
- Apache Airflow Documentation
- Airflow Best Practices
- Airflow Operators
- Airflow Providers
- Prometheus Metrics
Happy Orchestrating!
This repository provides a comprehensive journey through Apache Airflow workflow orchestration. Use these examples to build production-ready data pipelines and automation workflows!
We recommend using a virtual environment.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install --upgrade pip
pip install apache-airflow
Or, for a specific Airflow version:
pip install "apache-airflow==2.7.3"
For MySQL/Postgres support, add the relevant extras, e.g.:
pip install "apache-airflow[mysql]==2.7.3"
airflow db init
For quick testing, use:
airflow standalone
This will start the webserver and scheduler, and create an admin user (shown in the output).
- Web UI: http://localhost:8080
- Default user: admin (password shown in terminal output)
Copy or symlink the example DAG .py files into your Airflow dags/ folder (default: ~/airflow/dags/).
cp Example_*/code.py ~/airflow/dags/
Or set the AIRFLOW__CORE__DAGS_FOLDER environment variable to point to this repo.
To run components separately:
airflow webserver --port 8080
airflow scheduler
- List all DAGs:
airflow dags list
- List all tasks in a DAG:
airflow tasks list <dag_id>
- Trigger a DAG run manually:
airflow dags trigger <dag_id>
- Pause/Unpause a DAG:
airflow dags pause <dag_id>
airflow dags unpause <dag_id>
- Show DAG structure as ASCII:
airflow dags show <dag_id>
- Test a task (does not affect DB state):
airflow tasks test <dag_id> <task_id> <execution_date>
- Re-serialize all DAGs (useful for troubleshooting DAG parsing issues):
airflow dags reserialize
- Clear task instances:
airflow tasks clear <dag_id> --start-date <YYYY-MM-DD> --end-date <YYYY-MM-DD>
To enable metrics and dashboards:
1. Go to the monitoring/ directory:
   cd monitoring
2. Start Prometheus, Grafana, and StatsD exporter:
   docker-compose up -d
3. Enable Airflow metrics in airflow.cfg:
   [metrics]
   statsd_on = True
   statsd_host = localhost
   statsd_port = 8125
   statsd_prefix = airflow
4. Restart Airflow.
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
To stop Airflow:
# If using standalone
pkill -f airflow
# If the webserver/scheduler run in the foreground, stop them with Ctrl+C.
# If they were started with --daemon, kill them via their PID files
# (there is no "airflow webserver --stop" command):
kill $(cat ~/airflow/airflow-webserver.pid)
kill $(cat ~/airflow/airflow-scheduler.pid)
To stop monitoring stack:
cd monitoring
docker-compose down -v
For more details, see the comments in each DAG file.