DataSentinel is an agentic framework that automatically generates validation, testing, and documentation suites from API specifications. Transform any API spec into production-ready validation services with zero manual coding.
Data scientists and engineers spend up to 80% of their time on "data plumbing":
- Reading API documentation
- Writing data models by hand
- Debugging schema drift
- Creating test suites
- Maintaining documentation
DataSentinel eliminates this overhead entirely.
DataSentinel uses IBM Bob as a proactive architect that:
- Scans & Maps - Analyzes any endpoint (REST/GraphQL/JSON) and auto-generates robust data contracts
- Guarantees Quality - Implements real-time validation with Pydantic V2
- Accelerates Development - Creates complete test suites with mock factories
- Self-Documents - Generates technical documentation automatically
- Deploys Instantly - Produces Docker-ready FastAPI services
Supported input formats:
- OpenAPI/Swagger - Deterministic parsing with full $ref resolution
- GraphQL - Introspection-based schema extraction
- JSON Samples - Intelligent type inference engine
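For OpenAPI input, deterministic parsing depends on resolving `$ref` pointers so that every schema is fully expanded. The stack table below names Prance for this job. The stdlib-only sketch that follows illustrates the idea under simplifying assumptions: only local `#/...` refs are inlined, remote refs are left untouched, and a circular ref raises an error instead of being modelled lazily. `resolve_local_refs` is a hypothetical helper, not part of DataSentinel or Prance.

```python
from typing import Any


def resolve_local_refs(spec: dict) -> dict:
    """Return a new spec in which every local "$ref" (one starting with "#/") is inlined.

    Remote refs (other files or URLs) are left untouched, and a circular ref
    raises ValueError instead of being expanded forever.
    """

    def lookup(ref: str) -> Any:
        # "#/components/schemas/User" -> spec["components"]["schemas"]["User"]
        node: Any = spec
        for part in ref[2:].split("/"):
            # Undo JSON Pointer escaping (RFC 6901): "~1" is "/", "~0" is "~".
            node = node[part.replace("~1", "/").replace("~0", "~")]
        return node

    def walk(node: Any, active: tuple) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/"):
                if ref in active:
                    raise ValueError(f"circular $ref: {ref}")
                # Siblings of "$ref" are ignored, as in OpenAPI 3.0 schema objects.
                return walk(lookup(ref), active + (ref,))
            return {key: walk(value, active) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item, active) for item in node]
        return node  # scalars are immutable, so they can be shared safely

    return walk(spec, ())
```

The original spec is never mutated, because every dict and list on the way down is rebuilt. Recursive schemas (a `Node` whose `child` property refers back to `Node`) hit the circular-ref error here, which is one reason a dedicated resolver such as Prance is preferable in practice.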
Generated artifacts:
- Pydantic V2 Models - Type-safe data models with validation
- Validators - Retry logic and schema drift detection
- Pytest Suite - Comprehensive tests with factories
- FastAPI App - Production-ready validation API
- Documentation - Markdown data dictionaries
- Docker - Multi-stage optimized containers
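Schema drift means a live payload no longer matches the contract the models were generated from: fields disappear, new ones appear, or a type changes. A minimal sketch of top-level drift detection follows. It assumes the contract is a plain `{field: JSON type}` mapping (the generated validators presumably work from the Pydantic models instead), and `json_type`, `compatible`, and `detect_drift` are hypothetical names.

```python
from typing import Any


def json_type(value: Any) -> str:
    """Name the JSON Schema type of a decoded JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


def compatible(expected: str, actual: str) -> bool:
    # An integer is still a valid "number" in JSON Schema.
    return expected == actual or (expected == "number" and actual == "integer")


def detect_drift(expected: dict[str, str], payload: dict[str, Any]) -> dict[str, list]:
    """Compare a payload's top-level fields with an expected {field: type} contract."""
    shared = set(expected) & set(payload)
    return {
        "missing": sorted(set(expected) - set(payload)),
        "unexpected": sorted(set(payload) - set(expected)),
        "type_changed": sorted(
            (name, expected[name], json_type(payload[name]))
            for name in shared
            if not compatible(expected[name], json_type(payload[name]))
        ),
    }
```

Nested objects and arrays would need a recursive comparison, and an alerting step (for example, logging through Loguru) would act on the returned report.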
Key features:
- Exponential backoff retry logic
- Multiple authentication strategies (API Key, Bearer, OAuth2, Basic)
- Schema drift detection and alerting
- Batch validation support
- Async/await throughout
- Comprehensive error handling
- Structured logging with Loguru
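Most of the authentication strategies above come down to which HTTP header is attached to each request. The sketch below maps the --auth-type values used in the CLI examples (api-key, bearer, oauth2), plus a hypothetical basic value, to request headers. The X-API-Key header name is an assumption because APIs differ. OAuth2 is simplified to sending an already-issued access token as a Bearer token, and the token exchange itself is not shown. `build_auth_headers` is illustrative and is not DataSentinel's auth_manager API.

```python
import base64


def build_auth_headers(auth_type: str, token: str) -> dict[str, str]:
    """Return HTTP headers for an auth strategy (hypothetical helper).

    Assumptions, not DataSentinel's documented behaviour:
    - "api-key" is sent in an "X-API-Key" header (real APIs vary: header
      name, query parameter, or cookie).
    - "oauth2" means an access token obtained beforehand, sent as Bearer.
    - "basic" expects the token in "username:password" form.
    """
    if auth_type == "api-key":
        return {"X-API-Key": token}
    if auth_type in ("bearer", "oauth2"):
        return {"Authorization": f"Bearer {token}"}
    if auth_type == "basic":
        if ":" not in token:
            raise ValueError("basic auth expects 'username:password'")
        # RFC 7617: base64-encode "user:password" after the "Basic " prefix.
        encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    raise ValueError(f"unsupported auth type: {auth_type!r}")
```

Keeping this as a pure function makes it easy to test. The resulting dict can be passed as the headers argument of an HTTPX client.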
| Component | Technology |
|---|---|
| Agentic Engine | IBM Bob |
| Validation | Pydantic V2 (Strict Mode) |
| Networking | HTTPX (Async) |
| API Framework | FastAPI |
| Testing | Pytest + Polyfactory |
| Logging | Loguru |
| Parsing | Prance (OpenAPI), GraphQL Introspection |
| Templating | Jinja2 |
| Code Formatting | Black |
- Python 3.9 or higher
- pip (Python package manager)
# Clone the repository
git clone https://github.com/yourusername/datasentinel.git
cd datasentinel
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Verify the installation
python auto_sentinel.py --version
# Output: DataSentinel 1.0.0
# Generate a validator from a live JSON endpoint
python auto_sentinel.py \
--api https://jsonplaceholder.typicode.com/users \
--output ./my-validator
# Generate a validator from a local OpenAPI spec
python auto_sentinel.py \
--api ./specs/openapi.yaml \
--output ./api-validator
# Generate a validator from a GraphQL endpoint
python auto_sentinel.py \
--api https://api.example.com/graphql \
--format graphql \
--output ./graphql-validator
# Install, test, and run the generated service
cd my-validator
pip install -r requirements.txt
pytest test_api.py -v
uvicorn app:app --reload
Visit http://localhost:8000/docs for interactive API documentation!
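With --format graphql, schema extraction is introspection-based: the tool POSTs an introspection query and reads the type graph from the response. The sketch below shows a trimmed query and how nested NON_NULL/LIST wrappers can be rendered as SDL types such as [Episode]!. It runs offline against a hand-written sample response. The query shape, the helper names, and the nesting depth of four are assumptions, not DataSentinel's graphql_parser.

```python
import json

# Trimmed introspection query: every type with its fields. Type references are
# unwrapped four levels deep, which covers shapes like [Episode!]!.
INTROSPECTION_QUERY = """
query {
  __schema {
    types {
      kind
      name
      fields {
        name
        type { kind name ofType { kind name ofType { kind name ofType { kind name } } } }
      }
    }
  }
}
"""


def build_request_body() -> bytes:
    # GraphQL over HTTP: POST a JSON object with a "query" member.
    return json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")


def render_type(type_ref: dict) -> str:
    """Turn a nested introspection type reference into SDL notation."""
    kind = type_ref["kind"]
    if kind == "NON_NULL":
        return render_type(type_ref["ofType"]) + "!"
    if kind == "LIST":
        return "[" + render_type(type_ref["ofType"]) + "]"
    return type_ref["name"]


def object_fields(introspection: dict) -> dict[str, dict[str, str]]:
    """Map each user-defined OBJECT type to {field name: SDL type}."""
    result = {}
    for t in introspection["data"]["__schema"]["types"]:
        # Skip scalars, enums, etc., and built-in types such as __Type and __Schema.
        if t["kind"] != "OBJECT" or t["name"].startswith("__"):
            continue
        result[t["name"]] = {f["name"]: render_type(f["type"]) for f in t["fields"] or []}
    return result
```

To try it against a live endpoint such as https://rickandmortyapi.com/graphql, send build_request_body() as a POST with Content-Type: application/json (for example with urllib.request or HTTPX) and pass the decoded JSON response to object_fields. Some servers disable introspection in production, in which case this approach cannot work.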
Comprehensive documentation is available in the docs/ directory:
| Guide | Description |
|---|---|
| Getting Started | Installation, quick start, first project |
| Input Formats | JSON, OpenAPI, GraphQL format guides |
| Generated Artifacts | Understanding generated code |
| Deployment | Local, Docker, and cloud deployment |
| API Reference | Complete API documentation |
DataSentinel follows a modular, provider-based architecture:
DataSentinel/
├── auto_sentinel.py # CLI entry point & orchestrator
├── config/ # Configuration management
│   ├── settings.py # Pydantic settings
│   └── logging_config.py # Loguru configuration
├── core/ # Core infrastructure
│   ├── base_provider.py # Abstract base for providers
│   ├── retry_handler.py # Exponential backoff
│   ├── auth_manager.py # Authentication strategies
│   └── exceptions.py # Custom exceptions
├── parsers/ # API specification parsers
│   ├── json_inference_parser.py
│   ├── openapi_parser.py
│   └── graphql_parser.py
├── generators/ # Code generation engines
│   ├── models_generator.py
│   ├── validators_generator.py
│   ├── tests_generator.py
│   ├── app_generator.py
│   ├── docs_generator.py
│   └── dockerfile_generator.py
├── schemas/ # Internal data structures
│   ├── api_schema.py
│   ├── field_schema.py
│   └── config_schema.py
└── templates/ # Jinja2 templates
    ├── models.py.jinja2
    ├── validators.py.jinja2
    ├── test_api.py.jinja2
    └── app.py.jinja2
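The tree pairs an abstract provider base (core/base_provider.py) with a retry handler that uses exponential backoff (core/retry_handler.py). The following is a hedged sketch of how those two ideas can fit together with async/await. All class names are hypothetical, and the network call is replaced by a test double. The jitter strategy is "full jitter" (a random delay between zero and the exponential cap), which is a swapped-in choice the document does not specify.

```python
import asyncio
import random
from abc import ABC, abstractmethod
from typing import Any


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 429, 5xx)."""


class BaseProvider(ABC):
    """Hypothetical provider base class; not the real core/base_provider.py."""

    def __init__(self, max_attempts: int = 4, base_delay: float = 0.5, max_delay: float = 8.0):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay

    @abstractmethod
    async def fetch(self) -> Any:
        """Fetch one raw payload from the source (HTTP, file, GraphQL, ...)."""

    def backoff_delay(self, attempt: int) -> float:
        # Exponential cap base * 2**attempt, bounded by max_delay, with full
        # jitter so that many clients do not retry in lockstep.
        cap = min(self.max_delay, self.base_delay * (2 ** attempt))
        return random.uniform(0, cap)

    async def fetch_with_retry(self) -> Any:
        for attempt in range(self.max_attempts):
            try:
                return await self.fetch()
            except TransientError:
                if attempt == self.max_attempts - 1:
                    raise  # out of attempts: surface the last error
                await asyncio.sleep(self.backoff_delay(attempt))


class FlakyProvider(BaseProvider):
    """Test double that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int, **kwargs: Any):
        super().__init__(**kwargs)
        self.failures = failures
        self.calls = 0

    async def fetch(self) -> Any:
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError(f"attempt {self.calls} failed")
        return {"id": 1, "name": "Leanne Graham"}
```

A real provider would typically wrap an HTTPX request in fetch, map timeouts, 429, and 5xx responses to the retryable error type, and let other 4xx errors fail immediately.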
# API Key
python auto_sentinel.py \
--api https://api.example.com/data \
--auth-type api-key \
--auth-token YOUR_API_KEY
# Bearer Token
python auto_sentinel.py \
--api https://api.example.com/data \
--auth-type bearer \
--auth-token YOUR_TOKEN
# OAuth2
python auto_sentinel.py \
--api https://api.example.com/data \
--auth-type oauth2 \
--auth-token YOUR_TOKEN
# Skip tests and Docker
python auto_sentinel.py \
--api ./spec.yaml \
--skip-tests \
--skip-docker
# Only generate models and validators
python auto_sentinel.py \
--api ./spec.yaml \
--skip-tests \
--skip-app \
--skip-docs \
--skip-docker
# Preview what would be generated
python auto_sentinel.py \
--api ./spec.yaml \
--dry-run
DataSentinel includes comprehensive test coverage:
# Run all tests
pytest tests/ -v
# Run integration tests
pytest tests/integration/ -v
# Run with coverage
pytest tests/ --cov=. --cov-report=html
# Run specific test file
pytest tests/integration/test_json_flow.py -v
- Unit tests for all modules
- Integration tests for complete pipelines
- Performance benchmarks
- Error handling tests
- 90%+ code coverage
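# Regenerate the Rick and Morty GraphQL validator (output goes to ./generated when --output is omitted)
rm -rf generated
python auto_sentinel.py --api https://rickandmortyapi.com/graphql --format graphql
cd generated
docker build -t rickmorty-validator:1.0.0 .
# Map the container's port 8000 to host port 8765 (docs at http://localhost:8765/docs)
docker run -d -p 8765:8000 rickmorty-validator:1.0.0
# Push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account>.dkr.ecr.us-east-1.amazonaws.com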
docker tag my-validator:latest <account>.dkr.ecr.us-east-1.amazonaws.com/my-validator:latest
docker push <account>.dkr.ecr.us-east-1.amazonaws.com/my-validator:latest
# Deploy to Google Cloud Run (the service listens on port 8000, while Cloud Run defaults to 8080)
gcloud run deploy my-validator \
--image gcr.io/<project>/my-validator:latest \
--platform managed \
--region us-central1 \
--port 8000 \
--allow-unauthenticated
# Deploy to Azure Container Instances
az container create \
--resource-group myResourceGroup \
--name my-validator \
--image <registry>.azurecr.io/my-validator:latest \
--ports 8000
Roadmap (completed):
- Core infrastructure with async support
- Configuration management (Pydantic Settings)
- Logging system (Loguru)
- Exception hierarchy
- Retry handler with exponential backoff
- Authentication manager (API Key, Bearer, OAuth2, Basic)
- Base provider pattern
- JSON inference parser with pattern detection
- OpenAPI parser (3.x and Swagger 2.0)
- GraphQL introspection parser
- Schema normalizer
- Pydantic V2 models generator
- Validators generator with retry and drift detection
- Pytest test suite generator with factories
- FastAPI app generator
- Documentation generator
- Dockerfile generator
- CLI orchestrator
- Comprehensive integration tests (45+ tests)
- Complete documentation suite
- Example projects
- Production deployment guides
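The "JSON inference parser with pattern detection" item can be pictured as a recursive walk over a sample payload: map each value to a JSON type, and tag strings that look like emails, URLs, or ISO dates with a format. This is a minimal single-sample sketch. The regexes are deliberately loose, and `infer_schema` and `detect_format` are hypothetical names, not the real json_inference_parser API.

```python
import re
from datetime import datetime
from typing import Any, Optional

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
URI_RE = re.compile(r"^https?://\S+$")
ISO_DATE_PREFIX_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")


def detect_format(text: str) -> Optional[str]:
    """Guess a JSON Schema "format" for a string, or return None."""
    if EMAIL_RE.match(text):
        return "email"
    if URI_RE.match(text):
        return "uri"
    if ISO_DATE_PREFIX_RE.match(text):
        # fromisoformat rejects a trailing "Z" before Python 3.11, so swap it for +00:00.
        candidate = text[:-1] + "+00:00" if text.endswith("Z") else text
        try:
            datetime.fromisoformat(candidate)
        except ValueError:
            return None  # looks like a date but is not a valid one
        return "date" if len(text) == 10 else "date-time"
    return None


def infer_schema(value: Any) -> dict:
    """Infer a JSON-Schema-like description from one sample value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        fmt = detect_format(value)
        return {"type": "string", "format": fmt} if fmt else {"type": "string"}
    if isinstance(value, list):
        # Single-sample limitation: only the first element is inspected.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")
```

Because only one sample is inspected, a field that is sometimes null, or an array with mixed element types, will be inferred too narrowly. This is the gap the planned "Multi-sample JSON inference" item appears to target.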
Planned:
- Schema versioning and migration
- Multi-sample JSON inference
- GraphQL subscription support
- Webhook listener generation
- Data profiling and statistics
- Web UI for management
- CI/CD integration
- Monitoring and alerting
- Multi-tenant support
- Enterprise authentication (SAML, LDAP)
DataSentinel is designed for speed:
| Operation | Time |
|---|---|
| Parse OpenAPI spec | < 5 seconds |
| Generate all artifacts | < 10 seconds |
| Complete pipeline | < 15 seconds |
| Validation (single) | < 10ms |
| Validation (batch 100) | < 100ms |
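The batch row assumes records are validated as a group, with failures collected per record instead of aborting on the first bad one. The stdlib sketch below shows that reporting shape. Simple predicate rules stand in for the generated Pydantic models, and `USER_RULES`, `validate_record`, and `validate_batch` are hypothetical.

```python
from typing import Any, Callable

Rule = Callable[[Any], bool]

# Hypothetical field rules for a "user" record; generated Pydantic models
# would play this role in a real validator.
USER_RULES: dict[str, Rule] = {
    "id": lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
    "name": lambda v: isinstance(v, str) and v.strip() != "",
    "email": lambda v: isinstance(v, str) and "@" in v,
}


def validate_record(record: dict[str, Any], rules: dict[str, Rule]) -> list[str]:
    """Return every problem found in one record (an empty list means valid)."""
    errors = []
    for field, rule in rules.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not rule(record[field]):
            errors.append(f"{field}: invalid value {record[field]!r}")
    return errors


def validate_batch(records: list[dict[str, Any]], rules: dict[str, Rule]) -> dict[str, Any]:
    """Validate all records, collecting errors by index instead of stopping early."""
    failures = {}
    for index, record in enumerate(records):
        errors = validate_record(record, rules)
        if errors:
            failures[index] = errors
    return {"total": len(records), "valid": len(records) - len(failures), "failures": failures}
```

You can time validate_batch over 100 records with time.perf_counter() to compare the table's figures against your own hardware and schemas.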
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/ -v
# Format code
black .
# Type checking
mypy .
# Linting
flake8 .
Check out the examples/ directory for:
- OpenAPI/Swagger examples
- GraphQL examples
- JSON inference examples
- Real-world API integrations
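Import Errors
# Reinstall dependencies inside the activated virtual environment
pip install -r requirements.txt
Port Already in Use
# Find the process holding port 8000, then stop it
lsof -i :8000
kill -9 <PID>
Docker Build Fails
# Removes all stopped containers, unused networks, unused images, and build cache
docker system prune -a
docker build --no-cache -t my-validator .
See Deployment Guide for more troubleshooting tips.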
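The lsof/kill recipe for "Port Already in Use" is Unix-specific and kills whatever process owns the port. A cross-platform alternative is to check the port from Python and start the service on a free port instead (for example, uvicorn app:app --port 8001). Below is a minimal sketch. `port_in_use` and `first_free_port` are hypothetical helpers, and the check is inherently racy because another process can take the port between the check and the server start.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the TCP connection succeeds, i.e. a server is listening.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")
```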
This project is licensed under the MIT License - see the LICENSE file for details.
- Andrew Rober Taylor - Lead Data Architect & AI Automation
Focus: Designing agentic data ingestion pipelines, schema validation with Pydantic, and bridging the gap between raw API data and production-ready Data Science environments.
- Vicente García Sánchez - Reliability & Integration Engineer
Focus: Robust API consumption, error handling strategies (retries/backoff), and automated testing suites.
- IBM Bob - Agentic reasoning engine
- Pydantic - Data validation framework
- FastAPI - Modern web framework
- HTTPX - Async HTTP client
- Pytest - Testing framework
- GitHub Issues
- Discussions
- Email: support@datasentinel.dev
- Documentation
If you find DataSentinel useful, please consider giving it a star!
Made with ❤️ by the DataSentinel Team
Powered by IBM Bob - Transforming API specifications into production-ready validation services