PipeMedic is an open-source AI agent acting as a "first responder" for your data infrastructure. It automatically detects, diagnoses, and repairs data pipeline failures caused by schema drift, reducing Mean Time To Recovery (MTTR) from hours to minutes.
- 🔍 Auto-Detection: Real-time polling of Apache Airflow for failed DAGs and tasks.
- 🧠 Intelligent Diagnosis: Automatically connects to your database (PostgreSQL) to compare actual vs. expected schemas.
- 🛠️ AI-Powered Fixes: Leverages LLMs (GPT-4o) to generate precise SQL `ALTER TABLE` statements or Python code fixes.
- 🤖 Automated Actions: Creates a Git branch, commits the fix, and opens a Pull Request with a descriptive summary.
- 📢 Instant Notifications: Sends rich Slack alerts with a direct link to review and merge the PR.
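
To make the diagnosis step concrete, here is a minimal sketch of comparing an expected schema against the actual one. It is not PipeMedic's implementation: `SchemaDiff`, `diff_schemas`, and `to_alter_statements` are hypothetical names, schemas are assumed to be plain column-to-type mappings, and the deterministic `ALTER TABLE` rendering only stands in for the simplest case of what the LLM-generated fix might contain.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Differences between an expected and an actual table schema (hypothetical type)."""
    missing: dict[str, str] = field(default_factory=dict)     # expected, absent in DB
    unexpected: dict[str, str] = field(default_factory=dict)  # in DB, not expected
    type_changed: dict[str, tuple[str, str]] = field(default_factory=dict)  # col -> (expected, actual)


def diff_schemas(expected: dict[str, str], actual: dict[str, str]) -> SchemaDiff:
    """Compare two column -> type mappings and report the drift."""
    diff = SchemaDiff()
    for col, typ in expected.items():
        if col not in actual:
            diff.missing[col] = typ
        elif actual[col] != typ:
            diff.type_changed[col] = (typ, actual[col])
    for col, typ in actual.items():
        if col not in expected:
            diff.unexpected[col] = typ
    return diff


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def to_alter_statements(table: str, diff: SchemaDiff) -> list[str]:
    """Render ADD COLUMN statements for missing columns only.

    Assumption: column types come from a trusted expected schema,
    so they are emitted unquoted.
    """
    return [
        f"ALTER TABLE {quote_ident(table)} ADD COLUMN {quote_ident(col)} {typ};"
        for col, typ in diff.missing.items()
    ]


expected = {"id": "integer", "email": "text", "created_at": "timestamp"}
actual = {"id": "bigint", "email": "text", "legacy_flag": "boolean"}
drift = diff_schemas(expected, actual)
statements = to_alter_statements("users", drift)
```

Type changes and unexpected columns are reported but deliberately not turned into SQL here, since those cases usually need human or LLM judgment.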
graph LR
A[Airflow] -->|Failure Event| B(PipeMedic Core)
B -->|Query Schema| C[PostgreSQL]
B -->|Generate Fix| D[OpenAI / LLM]
B -->|Create PR| E[GitHub]
B -->|Notify| F[Slack]
| Component | Requirement |
|---|---|
| Language | Python 3.11+ |
| Container | Docker & Docker Compose (Optional) |
| Orchestrator | Apache Airflow (API Access) |
| Database | PostgreSQL |
1. Clone the repository:

       git clone https://github.com/azmatsiddique/pipemedic.git
       cd pipemedic

2. Create configuration file:

       cp .pipemedic.yaml config.yaml

   Edit `config.yaml` to match your environment.

3. Set Environment Variables:

       # Configuration
       export AIRFLOW_TOKEN="admin:admin"
       export DB_PASSWORD="your_db_password"
       export GOOGLE_API_KEY="AIza..."
       export GITHUB_TOKEN="ghp_..."
       export SLACK_WEBHOOK="https://hooks.slack.com/..."

4. Fix Permissions (Linux/Mac): Ensure Airflow (running as user 50000) can access directories:

       mkdir -p dags logs plugins
       sudo chown -R 50000:0 dags logs plugins
       sudo chmod -R 775 dags logs plugins

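
Before starting the agent, it can help to confirm that the environment variables from the setup steps above are actually set. This is a minimal sketch, assuming all five variables are required; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names, not part of PipeMedic.

```python
import os
from typing import Mapping

# Variable names taken from the setup steps; treating all of them
# as mandatory is an assumption.
REQUIRED_ENV_VARS = (
    "AIRFLOW_TOKEN",
    "DB_PASSWORD",
    "GOOGLE_API_KEY",
    "GITHUB_TOKEN",
    "SLACK_WEBHOOK",
)


def missing_env_vars(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name, "").strip()]
```

Calling `missing_env_vars()` with no arguments checks the current process environment; an empty list means every variable is present.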
Install dependencies and start the agent:

    pip install .
    python -m pipemedic.main --interval 60

Deploy instantly using Docker Compose:

    docker-compose up -d --build

- `pipemedic/core/detector.py` - The Watcher: Monitors Airflow for red flags.
- `pipemedic/core/diagnoser.py` - The Doctor: Analyzes DB schemas to find the root cause.
- `pipemedic/core/generator.py` - The Brain: Crafts the code fix using GenAI.
- `pipemedic/core/pr_creator.py` - The Builder: Packages the fix into a Pull Request.

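
As an illustration of what the Watcher's polling might involve, the sketch below filters a DAG-run listing for failed runs and skips ones already reported on an earlier poll. It assumes the response shape of Airflow 2's stable REST API for listing DAG runs (a `dag_runs` array whose items carry `dag_id`, `dag_run_id`, and `state`); `failed_runs` and `new_failures` are hypothetical helpers, not functions from `detector.py`.

```python
from typing import Any

RunKey = tuple[str, str]  # (dag_id, dag_run_id)


def failed_runs(payload: dict[str, Any]) -> list[RunKey]:
    """Extract failed runs from a DAG-run listing (assumed response shape)."""
    return [
        (run["dag_id"], run["dag_run_id"])
        for run in payload.get("dag_runs", [])
        if run.get("state") == "failed"
    ]


def new_failures(payload: dict[str, Any], seen: set[RunKey]) -> list[RunKey]:
    """Return failed runs not reported before, recording them in `seen`."""
    fresh: list[RunKey] = []
    for key in failed_runs(payload):
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


# Two consecutive polls returning the same listing.
payload = {
    "dag_runs": [
        {"dag_id": "orders_etl", "dag_run_id": "run_1", "state": "failed"},
        {"dag_id": "orders_etl", "dag_run_id": "run_2", "state": "success"},
    ],
    "total_entries": 2,
}
seen: set[RunKey] = set()
first_poll = new_failures(payload, seen)
second_poll = new_failures(payload, seen)
```

Tracking seen runs matters with a fixed polling interval such as `--interval 60`, because the same failure would otherwise be picked up on every poll.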
Contributions are welcome! Please open an issue or submit a PR for any improvements.
This project is licensed under the MIT License - see the LICENSE file for details.