andtay/DataSentinel


🛡️ DataSentinel: Agentic API Validation Framework

Python 3.9+ | License: MIT | Code style: black

DataSentinel is an agentic framework that automatically generates validation, testing, and documentation suites from API specifications. Transform any API spec into production-ready validation services with zero manual coding.


🚀 The Problem

Data Scientists and Engineers spend up to 80% of their time on "data plumbing":

  • 📖 Reading API documentation
  • ✍️ Writing manual data models
  • 🐛 Debugging schema drift
  • 🧪 Creating test suites
  • 📝 Maintaining documentation

DataSentinel aims to automate these tasks so they no longer have to be done by hand.


🧠 The Agentic Solution

DataSentinel uses IBM Bob as a proactive architect that:

  1. 🔍 Scans & Maps - Analyzes any endpoint (REST/GraphQL/JSON) and auto-generates robust data contracts
  2. ✅ Guarantees Quality - Implements real-time validation with Pydantic V2
  3. ⚡ Accelerates Development - Creates complete test suites with mock factories
  4. 📚 Self-Documents - Generates technical documentation automatically
  5. 🐳 Deploys Instantly - Produces Docker-ready FastAPI services

✨ Key Features

Three Input Formats

  • OpenAPI/Swagger - Deterministic parsing with full $ref resolution
  • GraphQL - Introspection-based schema extraction
  • JSON Samples - Intelligent type inference engine
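
As a rough illustration of what type inference from a JSON sample can look like, the sketch below maps each value in a sample object to a Python type annotation. The function names `infer_type` and `infer_fields` are hypothetical and are not taken from DataSentinel's parser, which may use different rules (for example pattern detection or multiple samples).

```python
from typing import Any, Dict


def infer_type(value: Any) -> str:
    """Return a Python type annotation (as a string) for one JSON value."""
    if value is None:
        return "Optional[Any]"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        # Use the first element as a representative; empty lists stay generic.
        inner = infer_type(value[0]) if value else "Any"
        return f"List[{inner}]"
    if isinstance(value, dict):
        return "Dict[str, Any]"
    return "Any"


def infer_fields(sample: Dict[str, Any]) -> Dict[str, str]:
    """Map each top-level key of a JSON object to an inferred annotation."""
    return {key: infer_type(val) for key, val in sample.items()}


# Hypothetical sample record, used only for illustration.
sample = {"id": 1, "name": "Leanne", "active": True, "tags": ["a"], "score": 9.5, "phone": None}
fields = infer_fields(sample)
for name, annotation in fields.items():
    print(f"{name}: {annotation}")
```

A single sample cannot tell whether a `null` field is normally a string or a number, which is why the sketch falls back to `Optional[Any]` in that case.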

Generated Artifacts

  • Pydantic V2 Models - Type-safe data models with validation
  • Validators - Retry logic and schema drift detection
  • Pytest Suite - Comprehensive tests with factories
  • FastAPI App - Production-ready validation API
  • Documentation - Markdown data dictionaries
  • Docker - Multi-stage optimized containers
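
Schema drift detection can be pictured as comparing the fields a contract expects with the fields a live payload actually carries. The following is a minimal sketch under that assumption; `detect_drift` is a hypothetical helper, not the generated validator's real interface.

```python
from typing import Any, Dict, Set


def detect_drift(expected_fields: Set[str], payload: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Compare a payload's keys against the expected field names."""
    actual = set(payload)
    return {
        "missing": expected_fields - actual,     # in the contract, absent from the payload
        "unexpected": actual - expected_fields,  # in the payload, absent from the contract
    }


# Hypothetical contract and payload, used only for illustration.
expected = {"id", "name", "email"}
payload = {"id": 1, "name": "Leanne", "username": "Bret"}
report = detect_drift(expected, payload)
print(sorted(report["missing"]))     # ['email']
print(sorted(report["unexpected"]))  # ['username']
```

This only checks field names; a fuller check would also compare types and nesting.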

Production Features

  • ✅ Exponential backoff retry logic
  • ✅ Multiple authentication strategies (API Key, Bearer, OAuth2, Basic)
  • ✅ Schema drift detection and alerting
  • ✅ Batch validation support
  • ✅ Async/await throughout
  • ✅ Comprehensive error handling
  • ✅ Structured logging with Loguru
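
For readers unfamiliar with exponential backoff, here is a minimal async sketch using only the standard library. The names `backoff_delay` and `retry_async`, the default delays, and the `flaky` demo function are all assumptions for illustration; the project's own `retry_handler.py` may be structured differently.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


async def retry_async(
    func: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base: float = 0.5,
    jitter: float = 0.0,
) -> T:
    """Call `func` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(max_attempts):
        try:
            return await func()
        except Exception:
            # A real handler would likely retry only transient errors.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = backoff_delay(attempt, base) + random.uniform(0, jitter)
            await asyncio.sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


calls = {"count": 0}


async def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = asyncio.run(retry_async(flaky, base=0.01))
print(result, calls["count"])
```

Each failed attempt doubles the wait, and the cap keeps the delay bounded; adding a small random jitter helps avoid many clients retrying at the same instant.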

🛠️ Tech Stack

| Component | Technology |
| --- | --- |
| Agentic Engine | IBM Bob |
| Validation | Pydantic V2 (Strict Mode) |
| Networking | HTTPX (Async) |
| API Framework | FastAPI |
| Testing | Pytest + Polyfactory |
| Logging | Loguru |
| Parsing | Prance (OpenAPI), GraphQL Introspection |
| Templating | Jinja2 |
| Code Formatting | Black |

📦 Installation

Prerequisites

  • Python 3.9 or higher
  • pip (Python package manager)

Quick Install

# Clone the repository
git clone https://github.com/andtay/DataSentinel.git
cd DataSentinel

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Verify Installation

python auto_sentinel.py --version
# Output: DataSentinel 1.0.0

🚀 Quick Start

Generate from JSON Endpoint

python auto_sentinel.py \
  --api https://jsonplaceholder.typicode.com/users \
  --output ./my-validator

Generate from OpenAPI Spec

python auto_sentinel.py \
  --api ./specs/openapi.yaml \
  --output ./api-validator

Generate from GraphQL

python auto_sentinel.py \
  --api https://api.example.com/graphql \
  --format graphql \
  --output ./graphql-validator

Run Generated Service

cd my-validator
pip install -r requirements.txt
pytest test_api.py -v
uvicorn app:app --reload

Visit http://localhost:8000/docs for interactive API documentation!


📖 Documentation

Comprehensive documentation is available in the docs/ directory:

| Guide | Description |
| --- | --- |
| Getting Started | Installation, quick start, first project |
| Input Formats | JSON, OpenAPI, GraphQL format guides |
| Generated Artifacts | Understanding generated code |
| Deployment | Local, Docker, and cloud deployment |
| API Reference | Complete API documentation |

🏗️ Architecture

DataSentinel follows a modular, provider-based architecture:

DataSentinel/
├── auto_sentinel.py       # CLI entry point & orchestrator
├── config/                # Configuration management
│   ├── settings.py        # Pydantic settings
│   └── logging_config.py  # Loguru configuration
├── core/                  # Core infrastructure
│   ├── base_provider.py   # Abstract base for providers
│   ├── retry_handler.py   # Exponential backoff
│   ├── auth_manager.py    # Authentication strategies
│   └── exceptions.py      # Custom exceptions
├── parsers/               # API specification parsers
│   ├── json_inference_parser.py
│   ├── openapi_parser.py
│   └── graphql_parser.py
├── generators/            # Code generation engines
│   ├── models_generator.py
│   ├── validators_generator.py
│   ├── tests_generator.py
│   ├── app_generator.py
│   ├── docs_generator.py
│   └── dockerfile_generator.py
├── schemas/               # Internal data structures
│   ├── api_schema.py
│   ├── field_schema.py
│   └── config_schema.py
└── templates/             # Jinja2 templates
    ├── models.py.jinja2
    ├── validators.py.jinja2
    ├── test_api.py.jinja2
    └── app.py.jinja2
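
To make the provider-based layout above more concrete, the sketch below shows one common way to structure such a design: an abstract base class with one subclass per input format, plus a small registry keyed by format name. All class names, the `format_name` attribute, and the `parse_with` helper are hypothetical; they are not copied from `core/base_provider.py` or the parsers.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseProvider(ABC):
    """Hypothetical base class: one subclass per input format."""

    format_name: str

    @abstractmethod
    def parse(self, raw: Dict[str, Any]) -> Dict[str, str]:
        """Return a mapping of field name to type name."""


class JsonSampleProvider(BaseProvider):
    format_name = "json"

    def parse(self, raw: Dict[str, Any]) -> Dict[str, str]:
        # Infer each field's type from the sample value itself.
        return {key: type(value).__name__ for key, value in raw.items()}


class OpenApiProvider(BaseProvider):
    format_name = "openapi"

    def parse(self, raw: Dict[str, Any]) -> Dict[str, str]:
        # Reads only the properties of one already-resolved object schema.
        properties = raw.get("properties", {})
        return {name: spec.get("type", "any") for name, spec in properties.items()}


# Registry keyed by format name, so new formats only need a new subclass.
PROVIDERS: Dict[str, BaseProvider] = {
    provider.format_name: provider
    for provider in (JsonSampleProvider(), OpenApiProvider())
}


def parse_with(format_name: str, raw: Dict[str, Any]) -> Dict[str, str]:
    """Dispatch to the provider registered for `format_name`."""
    try:
        provider = PROVIDERS[format_name]
    except KeyError:
        raise ValueError(f"unsupported format: {format_name}") from None
    return provider.parse(raw)


print(parse_with("json", {"id": 1, "name": "x"}))
print(parse_with("openapi", {"properties": {"id": {"type": "integer"}}}))
```

The benefit of this shape is that the orchestrator never needs to know which parser it is calling; it only depends on the shared `parse` contract.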

💡 Usage Examples

With Authentication

# API Key
python auto_sentinel.py \
  --api https://api.example.com/data \
  --auth-type api-key \
  --auth-token YOUR_API_KEY

# Bearer Token
python auto_sentinel.py \
  --api https://api.example.com/data \
  --auth-type bearer \
  --auth-token YOUR_TOKEN

# OAuth2
python auto_sentinel.py \
  --api https://api.example.com/data \
  --auth-type oauth2 \
  --auth-token YOUR_TOKEN
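
The sketch below shows how the authentication strategies listed in this README typically translate into HTTP request headers. It is an assumption-laden illustration: `build_auth_headers` is a hypothetical helper, the `X-API-Key` header name varies between APIs, and the `basic` value (with the token given as `username:password`) is assumed here because the README lists Basic auth without showing its CLI form.

```python
import base64
from typing import Dict


def build_auth_headers(auth_type: str, token: str) -> Dict[str, str]:
    """Translate an auth strategy and credential into request headers."""
    if auth_type == "api-key":
        # Header name is an assumption; many APIs use X-API-Key, others differ.
        return {"X-API-Key": token}
    if auth_type in ("bearer", "oauth2"):
        # An OAuth2 access token is normally sent as a Bearer token.
        return {"Authorization": f"Bearer {token}"}
    if auth_type == "basic":
        # Assumes `token` has the form "username:password".
        encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    raise ValueError(f"unsupported auth type: {auth_type}")


print(build_auth_headers("bearer", "YOUR_TOKEN"))
print(build_auth_headers("basic", "user:pass"))
```

Headers built this way can be passed to an HTTP client such as HTTPX through its `headers` argument.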

Selective Generation

# Skip tests and Docker
python auto_sentinel.py \
  --api ./spec.yaml \
  --skip-tests \
  --skip-docker

# Only generate models and validators
python auto_sentinel.py \
  --api ./spec.yaml \
  --skip-tests \
  --skip-app \
  --skip-docs \
  --skip-docker

Dry Run Mode

# Preview what would be generated
python auto_sentinel.py \
  --api ./spec.yaml \
  --dry-run
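
One way to picture how the skip flags and dry-run mode interact is the `argparse` sketch below. It mirrors only flags shown in this README; the default output directory, the `planned_artifacts` helper, and the assumption that models and validators are always generated are illustrative guesses, not the actual behavior of `auto_sentinel.py`.

```python
import argparse
from typing import List

# Artifacts that can be skipped, matching the --skip-* flags shown above.
OPTIONAL_ARTIFACTS = ("tests", "app", "docs", "docker")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="auto_sentinel.py")
    parser.add_argument("--api", required=True, help="URL or file path of the API spec")
    # Default directory is an assumption for this sketch.
    parser.add_argument("--output", default="./generated", help="output directory")
    parser.add_argument("--dry-run", action="store_true", help="preview without writing files")
    for artifact in OPTIONAL_ARTIFACTS:
        parser.add_argument(f"--skip-{artifact}", action="store_true")
    return parser


def planned_artifacts(args: argparse.Namespace) -> List[str]:
    """List the artifacts that would be generated for the given flags."""
    plan = ["models", "validators"]  # assumed to be always generated
    for artifact in OPTIONAL_ARTIFACTS:
        if not getattr(args, f"skip_{artifact}"):
            plan.append(artifact)
    return plan


args = build_parser().parse_args(
    ["--api", "./spec.yaml", "--skip-tests", "--skip-docker", "--dry-run"]
)
print(planned_artifacts(args))  # ['models', 'validators', 'app', 'docs']
```

In a dry run, a tool structured like this would print the plan and exit before writing anything to the output directory.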

🧪 Testing

DataSentinel includes comprehensive test coverage:

# Run all tests
pytest tests/ -v

# Run integration tests
pytest tests/integration/ -v

# Run with coverage
pytest tests/ --cov=. --cov-report=html

# Run specific test file
pytest tests/integration/test_json_flow.py -v

Test Coverage

  • ✅ Unit tests for all modules
  • ✅ Integration tests for complete pipelines
  • ✅ Performance benchmarks
  • ✅ Error handling tests
  • ✅ 90%+ code coverage

🐳 Docker Deployment

Build and Run

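# Remove any previous output, then regenerate from the GraphQL endpoint
rm -rf generated
python auto_sentinel.py --api https://rickandmortyapi.com/graphql --format graphql
# Build the image from the generated Dockerfile
cd generated
docker build -t rickmorty-validator:1.0.0 .
# Run detached, mapping host port 8765 to container port 8000
docker run -d -p 8765:8000 rickmorty-validator:1.0.0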

☁️ Cloud Deployment

AWS ECS

# Push to ECR (replace <account> with your AWS account ID)
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account>.dkr.ecr.us-east-1.amazonaws.com
docker tag my-validator:latest <account>.dkr.ecr.us-east-1.amazonaws.com/my-validator:latest
docker push <account>.dkr.ecr.us-east-1.amazonaws.com/my-validator:latest

Google Cloud Run

gcloud run deploy my-validator \
  --image gcr.io/<project>/my-validator:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

Azure Container Instances

az container create \
  --resource-group myResourceGroup \
  --name my-validator \
  --image <registry>.azurecr.io/my-validator:latest \
  --ports 8000

🗺️ Project Status

✅ Completed Features

Phase 1: Foundation

  • ✅ Core infrastructure with async support
  • ✅ Configuration management (Pydantic Settings)
  • ✅ Logging system (Loguru)
  • ✅ Exception hierarchy
  • ✅ Retry handler with exponential backoff
  • ✅ Authentication manager (API Key, Bearer, OAuth2, Basic)
  • ✅ Base provider pattern

Phase 2: Parsers

  • ✅ JSON inference parser with pattern detection
  • ✅ OpenAPI parser (3.x and Swagger 2.0)
  • ✅ GraphQL introspection parser
  • ✅ Schema normalizer

Phase 3: Generators

  • ✅ Pydantic V2 models generator
  • ✅ Validators generator with retry and drift detection
  • ✅ Pytest test suite generator with factories
  • ✅ FastAPI app generator
  • ✅ Documentation generator
  • ✅ Dockerfile generator

Phase 4: Integration & Documentation

  • ✅ CLI orchestrator
  • ✅ Comprehensive integration tests (45+ tests)
  • ✅ Complete documentation suite
  • ✅ Example projects
  • ✅ Production deployment guides

🚧 Future Enhancements

Phase 5: Advanced Features

  • Schema versioning and migration
  • Multi-sample JSON inference
  • GraphQL subscription support
  • Webhook listener generation
  • Data profiling and statistics

Phase 6: Enterprise Features

  • Web UI for management
  • CI/CD integration
  • Monitoring and alerting
  • Multi-tenant support
  • Enterprise authentication (SAML, LDAP)

📊 Performance

DataSentinel is designed for speed:

| Operation | Time |
| --- | --- |
| Parse OpenAPI spec | < 5 seconds |
| Generate all artifacts | < 10 seconds |
| Complete pipeline | < 15 seconds |
| Validation (single) | < 10 ms |
| Validation (batch of 100) | < 100 ms |

🤝 Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/ -v

# Format code
black .

# Type checking
mypy .

# Linting
flake8 .

📝 Examples

Check out the examples/ directory for:

  • OpenAPI/Swagger examples
  • GraphQL examples
  • JSON inference examples
  • Real-world API integrations

🐛 Troubleshooting

Common Issues

Import Errors

pip install -r requirements.txt

Port Already in Use

lsof -i :8000
kill -9 <PID>

Docker Build Fails

docker system prune -a
docker build --no-cache -t my-validator .

See Deployment Guide for more troubleshooting tips.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👥 Team

  • Andrew Rober Taylor - Lead Data Architect & AI Automation
    Focus: Designing agentic data ingestion pipelines, schema validation with Pydantic, and bridging the gap between raw API data and production-ready Data Science environments.

  • Vicente García Sánchez - Reliability & Integration Engineer
    Focus: Robust API consumption, error handling strategies (retries/backoff), and automated testing suites.


🙏 Acknowledgments

  • IBM Bob - Agentic reasoning engine
  • Pydantic - Data validation framework
  • FastAPI - Modern web framework
  • HTTPX - Async HTTP client
  • Pytest - Testing framework

📞 Support


⭐ Star History

If you find DataSentinel useful, please consider giving it a star! ⭐


Made with ❤️ by the DataSentinel Team

Powered by IBM Bob - Transforming API specifications into production-ready validation services

About

Agentic framework powered by IBM Bob to automate API ingestion, Pydantic validation, and Mock generation. Designed for Data Scientists to eliminate manual boilerplate, handle schema drift, and bridge the gap between software engineering and data pipelines. Protocol-agnostic (REST/GraphQL) with built-in FastAPI documentation.
