🚑 PipeMedic

Intelligent Data Pipeline Recovery Agent

Python 3.11+ | License: MIT | Docker | Code Style: Black

PipeMedic is an open-source AI agent that acts as a "first responder" for your data infrastructure. It automatically detects, diagnoses, and proposes repairs for data pipeline failures caused by schema drift, with the goal of reducing Mean Time To Recovery (MTTR) from hours to minutes.


🚀 Features

  • 🔍 Auto-Detection: Real-time polling of Apache Airflow for failed DAGs and tasks.
  • 🧠 Intelligent Diagnosis: Automatically connects to your database (PostgreSQL) to compare actual vs. expected schemas.
  • 🛠️ AI-Powered Fixes: Leverages LLMs (GPT-4o) to generate precise SQL ALTER TABLE statements or Python code fixes.
  • 🤖 Automated Actions: Creates a Git branch, commits the fix, and opens a Pull Request with a descriptive summary.
  • 📢 Instant Notifications: Sends rich Slack alerts with a direct link to review and merge the PR.
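
To make the diagnosis step concrete, here is a minimal sketch of comparing an expected schema against the actual one and turning the drift into `ALTER TABLE` statements. It is not PipeMedic's implementation: the function names `diff_schema` and `alter_statements` are hypothetical, the schemas are plain `{column: type}` dictionaries (in PostgreSQL the actual side could be read from `information_schema.columns`), and the statements are built deterministically rather than by an LLM.

```python
from typing import Dict, List


def diff_schema(expected: Dict[str, str], actual: Dict[str, str]) -> Dict[str, dict]:
    """Compare two {column_name: data_type} mappings and report the drift."""
    # Columns the pipeline expects but the table no longer has.
    missing = {c: t for c, t in expected.items() if c not in actual}
    # Columns present in the table that the pipeline does not know about.
    extra = {c: t for c, t in actual.items() if c not in expected}
    # Columns present on both sides whose types differ: (expected, actual).
    changed = {
        c: (expected[c], actual[c])
        for c in expected
        if c in actual and expected[c] != actual[c]
    }
    return {"missing": missing, "extra": extra, "changed": changed}


def alter_statements(table: str, drift: Dict[str, dict]) -> List[str]:
    """Build candidate ALTER TABLE statements that restore the expected schema.

    Identifiers are interpolated directly, so this is only safe for trusted,
    already-validated table and column names.
    """
    stmts = []
    for col, typ in sorted(drift["missing"].items()):
        stmts.append(f'ALTER TABLE {table} ADD COLUMN "{col}" {typ};')
    for col, (want, _have) in sorted(drift["changed"].items()):
        stmts.append(f'ALTER TABLE {table} ALTER COLUMN "{col}" TYPE {want};')
    return stmts
```

Extra columns are reported but deliberately not dropped, since removing data is a decision best left to the reviewer of the Pull Request.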

🏗️ Architecture

graph LR
    A[Airflow] -->|Failure Event| B(PipeMedic Core)
    B -->|Query Schema| C[PostgreSQL]
    B -->|Generate Fix| D[OpenAI / LLM]
    B -->|Create PR| E[GitHub]
    B -->|Notify| F[Slack]
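
The flow in the diagram can be sketched as a single function that wires the stages together. This is an illustrative outline only: `Failure`, `handle_failures`, and the five callables are hypothetical names, and each stage is injected so the sketch runs without Airflow, PostgreSQL, an LLM, GitHub, or Slack.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Failure:
    """A failed task as reported by the orchestrator (hypothetical shape)."""
    dag_id: str
    task_id: str


def handle_failures(
    detect: Callable[[], Iterable[Failure]],
    diagnose: Callable[[Failure], str],
    generate_fix: Callable[[str], str],
    create_pr: Callable[[Failure, str], str],
    notify: Callable[[str], None],
) -> List[str]:
    """Run detect -> diagnose -> generate fix -> open PR -> notify per failure."""
    pr_urls = []
    for failure in detect():
        diagnosis = diagnose(failure)        # e.g. a description of the schema drift
        fix = generate_fix(diagnosis)        # e.g. SQL or Python proposed by the LLM
        pr_url = create_pr(failure, fix)     # returns the URL of the new Pull Request
        notify(pr_url)                       # e.g. post the link to Slack
        pr_urls.append(pr_url)
    return pr_urls
```

Passing the stages in as callables keeps each integration replaceable and makes the flow easy to exercise with stubs.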

⚡ Getting Started

Prerequisites

| Component | Requirement |
| --- | --- |
| Language | Python 3.11+ |
| Container | Docker & Docker Compose (optional) |
| Orchestrator | Apache Airflow (API access) |
| Database | PostgreSQL |

🔧 Configuration

  1. Clone the repository:

    git clone https://github.com/azmatsiddique/pipemedic.git
    cd pipemedic

  2. Create configuration file:

    cp .pipemedic.yaml config.yaml

    Edit config.yaml to match your environment.

  3. Set Environment Variables:

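    # Airflow API credentials (username:password in this example)
    export AIRFLOW_TOKEN="admin:admin"
    # Password for the PostgreSQL database being diagnosed
    export DB_PASSWORD="your_db_password"
    # API key for the LLM backend
    export GOOGLE_API_KEY="AIza..."
    # GitHub token used to push branches and open Pull Requests
    export GITHUB_TOKEN="ghp_..."
    # Slack incoming webhook URL for notifications
    export SLACK_WEBHOOK="https://hooks.slack.com/..."

  4. Fix Permissions (Linux/macOS): Ensure that Airflow, which runs as user ID 50000, can access the dags, logs, and plugins directories: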

    mkdir -p dags logs plugins
    sudo chown -R 50000:0 dags logs plugins
    sudo chmod -R 775 dags logs plugins
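
Because the agent depends on every variable from step 3, it can help to fail fast when one is unset. The following is a minimal sketch of such a check, not the project's actual startup code: `REQUIRED_VARS` and `load_secrets` are hypothetical names, and only the variable names themselves come from the step above.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the configuration step above.
REQUIRED_VARS = (
    "AIRFLOW_TOKEN",
    "DB_PASSWORD",
    "GOOGLE_API_KEY",
    "GITHUB_TOKEN",
    "SLACK_WEBHOOK",
)


def load_secrets(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_secrets()` with no arguments reads the real process environment; passing a dictionary makes the check easy to test.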

🏃‍♂️ Running Locally

Install dependencies and start the agent:

pip install .
python -m pipemedic.main --interval 60
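
For readers curious how an `--interval` flag typically drives an agent like this, here is a small polling-loop sketch. It is an assumption about the general pattern, not a copy of `pipemedic.main`: `parse_args`, `run_loop`, the default of 60 seconds, and the `max_cycles` and `sleep` parameters (added so the loop can be tested) are all hypothetical.

```python
import argparse
import time
from typing import Callable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the command line; --interval is the pause between polls in seconds."""
    parser = argparse.ArgumentParser(description="Polling loop sketch")
    parser.add_argument("--interval", type=int, default=60,
                        help="seconds to wait between checks")
    return parser.parse_args(argv)


def run_loop(
    check_once: Callable[[], None],
    interval: int,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check_once repeatedly, sleeping `interval` seconds between calls.

    Runs forever when max_cycles is None. Returns the number of cycles run.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            check_once()
        except Exception as exc:
            # One failed check should not stop the agent; report and carry on.
            print(f"check failed: {exc}")
        cycles += 1
        # Skip the final sleep when a cycle limit has been reached.
        if max_cycles is None or cycles < max_cycles:
            sleep(interval)
    return cycles
```

A caller would combine the two as `run_loop(check, parse_args().interval)`, where `check` performs one round of detection and repair.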

🐳 Running with Docker

Deploy instantly using Docker Compose:

docker-compose up -d --build

📂 Project Structure

  • pipemedic/core/detector.py - The Watcher: Monitors Airflow for red flags.
  • pipemedic/core/diagnoser.py - The Doctor: Analyzes DB schemas to find the root cause.
  • pipemedic/core/generator.py - The Brain: Crafts the code fix using GenAI.
  • pipemedic/core/pr_creator.py - The Builder: Packages the fix into a Pull Request.
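
As an illustration of the kind of work the watcher does, the sketch below builds a request for failed DAG runs and parses the reply. It assumes Airflow 2's stable REST API (`/api/v1`, where `~` as the DAG ID selects all DAGs) and HTTP Basic authentication with a `user:password` token like the example `AIRFLOW_TOKEN`. The function names are hypothetical and the code is not taken from `detector.py`.

```python
import base64
import json
import urllib.request
from typing import List, Tuple


def build_failed_runs_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request listing failed DAG runs across all DAGs.

    Assumes `token` has the form "user:password" (HTTP Basic auth).
    """
    url = base_url.rstrip("/") + "/api/v1/dags/~/dagRuns?state=failed"
    auth = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {auth}", "Accept": "application/json"},
    )


def parse_failed_runs(body: str) -> List[Tuple[str, str]]:
    """Extract (dag_id, dag_run_id) pairs for failed runs from the JSON reply."""
    payload = json.loads(body)
    return [
        (run["dag_id"], run["dag_run_id"])
        for run in payload.get("dag_runs", [])
        if run.get("state") == "failed"
    ]
```

The request can be sent with `urllib.request.urlopen(req)` and the decoded response body handed to `parse_failed_runs`.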

🤝 Contributing

Contributions are welcome! Please open an issue or submit a PR for any improvements.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
