Generated Artifacts Guide

This guide explains all the files generated by DataSentinel and how to use them.

Table of Contents

  1. Overview
  2. models.py
  3. validators.py
  4. test_api.py
  5. app.py
  6. data_dict.md
  7. Dockerfile
  8. Customization

Overview

DataSentinel generates a complete, production-ready validation service with the following artifacts:

generated/
├── models.py          # Pydantic v2 data models
├── validators.py      # Validation logic with retry
├── test_api.py        # Pytest test suite
├── app.py             # FastAPI application
├── data_dict.md       # Documentation
├── Dockerfile         # Docker configuration
└── .dockerignore      # Docker ignore file

Each file is:

  • Production-ready - Runs as generated; review hard-coded defaults such as the API base URL and CORS origins before deploying
  • Well-documented - Comprehensive docstrings
  • Type-safe - Full type hints
  • Tested - Includes test suite
  • Customizable - Easy to extend

models.py

Purpose

Defines Pydantic v2 models for data validation and serialization.

Structure

"""
Generated Pydantic models for API validation.

This module contains data models with comprehensive validation rules.
"""

from datetime import datetime, date
from typing import List, Optional, Dict, Any
from pydantic import BaseModel, ConfigDict, Field, EmailStr, HttpUrl, UUID4, field_validator
from decimal import Decimal


class User(BaseModel):
    """
    User model.

    Represents a user in the system.
    """

    # Pydantic v2 configuration (replaces the v1-style inner `class Config`)
    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "id": 1,
                "name": "John Doe",
                "email": "john@example.com",
                "age": 30,
                "is_active": True,
                "created_at": "2024-01-15T10:30:00Z"
            }
        }
    )

    id: int = Field(..., description="User ID", ge=1)
    name: str = Field(..., description="User's full name", min_length=1, max_length=100)
    # EmailStr requires the optional `email-validator` package
    email: EmailStr = Field(..., description="User's email address")
    age: Optional[int] = Field(None, description="User's age", ge=0, le=150)
    is_active: bool = Field(True, description="Whether the user is active")
    created_at: datetime = Field(..., description="Account creation timestamp")

    # Pydantic v2 uses `field_validator` (v1's `validator` is deprecated)
    @field_validator('name')
    @classmethod
    def validate_name(cls, v: str) -> str:
        """Strip surrounding whitespace and reject names that are then empty."""
        v = v.strip()
        if not v:
            raise ValueError('Name cannot be empty')
        return v

Features

Field Validation

  • Type checking - Automatic type validation
  • Constraints - Min/max length, range validation
  • Formats - Email, URL, UUID, datetime validation
  • Custom validators - Business logic validation
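
As a minimal sketch of how these checks surface at runtime, the snippet below defines a small stand-in model and reports which fields Pydantic rejects. `Item` and `failing_fields` are hypothetical names used only for illustration, and the sketch assumes Pydantic 2.x is installed.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, field_validator


class Item(BaseModel):
    """Hypothetical stand-in model; not one of the generated models."""

    id: int = Field(..., ge=1)                            # range constraint
    name: str = Field(..., min_length=1, max_length=100)  # length constraint
    age: Optional[int] = Field(None, ge=0, le=150)        # optional, bounded

    @field_validator("name")
    @classmethod
    def name_not_blank(cls, v: str) -> str:
        # Custom rule: whitespace-only names are rejected
        v = v.strip()
        if not v:
            raise ValueError("Name cannot be empty")
        return v


def failing_fields(payload: dict) -> list:
    """Return the sorted names of the fields that fail validation."""
    try:
        Item(**payload)
    except ValidationError as exc:
        # Each error dict carries a `loc` tuple whose first entry is the field
        return sorted({str(err["loc"][0]) for err in exc.errors()})
    return []
```

Calling `failing_fields({"id": 0, "name": "   ", "age": 200})` reports all three fields at once, because Pydantic collects every error before raising rather than stopping at the first.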

Nested Models

class Profile(BaseModel):
    bio: str
    location: str
    website: Optional[HttpUrl] = None

class User(BaseModel):
    id: int
    name: str
    profile: Profile  # Nested model

Arrays

class User(BaseModel):
    id: int
    tags: List[str] = Field(default_factory=list)
    friends: List['User'] = []  # Self-referencing

Optional Fields

class User(BaseModel):
    id: int
    name: str
    age: Optional[int] = None  # Optional field
    bio: str = "No bio"  # Default value

Usage

from pydantic import ValidationError

from models import User

# Create instance
user = User(
    id=1,
    name="John Doe",
    email="john@example.com",
    age=30,
    is_active=True,
    created_at="2024-01-15T10:30:00Z"
)

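# Validate data (`data` stands for any untrusted dict, e.g. a parsed JSON body)
data = {"id": 0, "name": "", "email": "not-an-email"}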
try:
    user = User(**data)
except ValidationError as e:
    print(e.errors())

# Serialize to JSON
json_str = user.model_dump_json()

# Deserialize from JSON
user = User.model_validate_json(json_str)

validators.py

Purpose

Provides validation logic with retry mechanisms and schema drift detection.

Structure

"""
Generated validators for API validation.

This module provides validation logic with retry and drift detection.
"""

import asyncio
from typing import Dict, List, Any, Optional
from datetime import datetime
import httpx
from pydantic import BaseModel, Field, ValidationError

from models import User
from core.retry_handler import retry_with_backoff
from core.exceptions import ValidationFailedError


class ValidationResult(BaseModel):
    """Result of a validation operation."""
    
    success: bool
    data: Optional[Dict[str, Any]] = None
    errors: List[str] = Field(default_factory=list)
    warnings: List[str] = Field(default_factory=list)
    timestamp: datetime = Field(default_factory=datetime.now)
    drift_detected: bool = False


class UserValidator:
    """
    Validator for User model.
    
    Provides validation with retry logic and drift detection.
    """
    
    def __init__(self, base_url: str, auth_token: Optional[str] = None):
        """
        Initialize validator.
        
        Args:
            base_url: Base URL for API
            auth_token: Optional authentication token
        """
        self.base_url = base_url
        self.auth_token = auth_token
        self.client = httpx.AsyncClient()
    
    @retry_with_backoff(max_retries=3)
    async def validate_user(self, data: Dict[str, Any]) -> ValidationResult:
        """
        Validate user data.
        
        Args:
            data: User data to validate
            
        Returns:
            ValidationResult with success status and errors
        """
        # `success` has no default, so it must be passed explicitly
        result = ValidationResult(success=False)
        
        try:
            # Validate against model
            user = User(**data)
            result.success = True
            result.data = user.model_dump()
            
        except ValidationError as e:
            result.success = False
            result.errors = [str(err) for err in e.errors()]
        
        return result
    
    async def validate_batch(
        self, 
        data_list: List[Dict[str, Any]]
    ) -> List[ValidationResult]:
        """
        Validate multiple records.
        
        Args:
            data_list: List of user data to validate
            
        Returns:
            List of ValidationResult objects
        """
        tasks = [self.validate_user(data) for data in data_list]
        return await asyncio.gather(*tasks)
    
    async def detect_drift(
        self, 
        endpoint: str
    ) -> ValidationResult:
        """
        Detect schema drift by comparing live API response.
        
        Args:
            endpoint: API endpoint to check
            
        Returns:
            ValidationResult with drift detection info
        """
        # `success` has no default, so it must be passed explicitly
        result = ValidationResult(success=False)
        
        try:
            # Fetch live data
            headers = {}
            if self.auth_token:
                headers["Authorization"] = f"Bearer {self.auth_token}"
            
            response = await self.client.get(
                f"{self.base_url}{endpoint}",
                headers=headers
            )
            response.raise_for_status()
            
            live_data = response.json()
            
            # Validate against current model
            try:
                User(**live_data)
                result.success = True
            except ValidationError as e:
                result.success = False
                result.drift_detected = True
                result.warnings.append("Schema drift detected!")
                result.errors = [str(err) for err in e.errors()]
        
        except Exception as e:
            result.success = False
            result.errors.append(f"Drift detection failed: {str(e)}")
        
        return result

Features

Retry Logic

  • Automatic retry on transient failures
  • Exponential backoff with jitter
  • Configurable max retries
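
The implementation of `core.retry_handler.retry_with_backoff` is not shown in this guide. The following is only a rough sketch of what a decorator with these properties could look like; every parameter other than `max_retries` (`base_delay`, `max_delay`, `retry_on`) is an assumption made for illustration.

```python
import asyncio
import functools
import random


def retry_with_backoff(max_retries=3, base_delay=0.1, max_delay=5.0,
                       retry_on=(ConnectionError, TimeoutError)):
    """Retry an async function with exponential backoff and full jitter.

    Hypothetical sketch, not the generated implementation.
    """
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the last error
                    # Delay ceiling doubles per attempt and is capped;
                    # the actual sleep is drawn uniformly below it (jitter)
                    ceiling = min(max_delay, base_delay * (2 ** attempt))
                    await asyncio.sleep(random.uniform(0, ceiling))
        return wrapper
    return decorator


calls = {"n": 0}


@retry_with_backoff(max_retries=3, base_delay=0.001)
async def flaky():
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"
```

Note that `validate_user` above catches `ValidationError` itself, so a decorator like this would only ever retry on other exceptions, which is usually what you want: invalid data does not become valid on a second attempt.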

Batch Validation

  • Validate multiple records efficiently
  • Concurrent processing with asyncio.gather (coroutines interleave on one event loop rather than running on separate cores)
  • Aggregated results
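
The generated `validate_batch` returns one result per record. A sketch of how those results might be aggregated, and how concurrency could be bounded for large batches, is shown below. `Result`, `validate_one`, and `max_concurrency` are hypothetical stand-ins, not part of the generated code.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Result:
    """Hypothetical stand-in for ValidationResult."""
    success: bool
    errors: List[str] = field(default_factory=list)


async def validate_one(record: Dict[str, Any]) -> Result:
    """Toy validator: a record needs an integer id >= 1."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    rid = record.get("id")
    if isinstance(rid, int) and not isinstance(rid, bool) and rid >= 1:
        return Result(success=True)
    return Result(success=False, errors=[f"invalid id: {rid!r}"])


async def validate_batch(records: List[Dict[str, Any]],
                         max_concurrency: int = 10) -> Dict[str, Any]:
    """Validate records concurrently and summarize the outcome."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(record: Dict[str, Any]) -> Result:
        async with semaphore:  # at most max_concurrency validations in flight
            return await validate_one(record)

    # gather preserves input order, so results[i] belongs to records[i]
    results = await asyncio.gather(*(guarded(r) for r in records))
    failed = [i for i, r in enumerate(results) if not r.success]
    return {
        "total": len(results),
        "passed": len(results) - len(failed),
        "failed_indexes": failed,
    }
```

Because `asyncio.gather` keeps results in input order, the failed indexes can be mapped straight back to the submitted records.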

Schema Drift Detection

  • Compare live API responses
  • Detect schema changes
  • Alert on mismatches
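
The `detect_drift` method above flags drift only when the live payload fails model validation. Pydantic ignores unknown fields by default, so a field added upstream would pass unnoticed. One possible complement, sketched here with hypothetical names (`diff_fields`, `EXPECTED`, `REQUIRED`), is to compare the payload's keys with the fields the model declares.

```python
from typing import Any, Dict, List, Set


def diff_fields(expected: Set[str], required: Set[str],
                payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Compare the keys of a live payload with the fields a model expects."""
    live = set(payload)
    return {
        "missing_required": sorted(required - live),  # removed or renamed upstream
        "unexpected": sorted(live - expected),        # added upstream
    }


# Field names taken from the User model shown earlier in this guide
EXPECTED = {"id", "name", "email", "age", "is_active", "created_at"}
REQUIRED = {"id", "name", "email", "created_at"}

# Illustrative payload in which `created_at` was renamed to `signup_ts`
report = diff_fields(EXPECTED, REQUIRED, {
    "id": 1,
    "name": "John Doe",
    "email": "john@example.com",
    "signup_ts": "2024-01-15T10:30:00Z",
})
```

Here `report` lists `created_at` as missing and `signup_ts` as unexpected, which points at a rename rather than a plain removal.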

Usage

from validators import UserValidator

# Initialize validator (the awaits below must run inside an async function, e.g. via asyncio.run)
validator = UserValidator(
    base_url="https://api.example.com",
    auth_token="your-token"
)

# Validate single record
result = await validator.validate_user({
    "id": 1,
    "name": "John Doe",
    "email": "john@example.com",
    "created_at": "2024-01-15T10:30:00Z"  # required by the User model
})

if result.success:
    print("Validation passed!")
else:
    print(f"Errors: {result.errors}")

# Batch validation
results = await validator.validate_batch([
    {"id": 1, "name": "John", "email": "john@example.com", "created_at": "2024-01-15T10:30:00Z"},
    {"id": 2, "name": "Jane", "email": "jane@example.com", "created_at": "2024-01-16T09:00:00Z"}
])

# Drift detection
drift_result = await validator.detect_drift("/users/1")
if drift_result.drift_detected:
    print("Warning: API schema has changed!")

test_api.py

Purpose

Comprehensive pytest test suite for generated models and validators.

Structure

"""
Generated test suite for API validation.

This module contains comprehensive tests for models and validators.
"""

import pytest
from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import ValidationError

from models import User
from validators import UserValidator


class UserFactory(ModelFactory[User]):
    """Factory for generating test User instances."""
    __model__ = User


class TestUserModel:
    """Tests for User model."""
    
    def test_user_creation_valid(self):
        """Test creating a valid user."""
        user = User(
            id=1,
            name="John Doe",
            email="john@example.com",
            age=30,
            is_active=True,
            created_at="2024-01-15T10:30:00Z"
        )
        assert user.id == 1
        assert user.name == "John Doe"
    
    def test_user_creation_invalid_email(self):
        """Test validation fails with invalid email."""
        with pytest.raises(ValidationError):
            User(
                id=1,
                name="John Doe",
                email="invalid-email",
                created_at="2024-01-15T10:30:00Z"
            )
    
    def test_user_factory(self):
        """Test factory generates valid users."""
        user = UserFactory.build()
        assert isinstance(user, User)
        assert user.id > 0


class TestUserValidator:
    """Tests for User validator."""
    
    @pytest.mark.asyncio
    async def test_validate_user_success(self):
        """Test successful validation."""
        validator = UserValidator("https://api.example.com")
        result = await validator.validate_user({
            "id": 1,
            "name": "John Doe",
            "email": "john@example.com",
            "created_at": "2024-01-15T10:30:00Z"
        })
        assert result.success is True
    
    @pytest.mark.asyncio
    async def test_validate_user_failure(self):
        """Test validation failure."""
        validator = UserValidator("https://api.example.com")
        result = await validator.validate_user({
            "id": "invalid",  # Should be int
            "name": "John Doe"
        })
        assert result.success is False
        assert len(result.errors) > 0

Features

Test Coverage

  • ✅ Model creation tests
  • ✅ Validation tests (success and failure)
  • ✅ Edge case tests
  • ✅ Factory-based test data generation

Test Fixtures

  • Polyfactory integration for data generation
  • Reusable fixtures
  • Mock HTTP clients
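
The structure shown above does not include a mocked client. As a sketch of how one could be built with the standard library alone, the snippet below uses `unittest.mock.AsyncMock` in place of `httpx.AsyncClient`. `fetch_json` and `make_mock_client` are hypothetical helpers that mirror the fetch step in `detect_drift`.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_json(client, url: str) -> dict:
    """Hypothetical helper mirroring the fetch step in detect_drift."""
    response = await client.get(url)
    response.raise_for_status()
    return response.json()


def make_mock_client(payload: dict) -> AsyncMock:
    """Build a stand-in for httpx.AsyncClient whose get() returns payload."""
    response = MagicMock()  # response methods (json, raise_for_status) are sync
    response.json.return_value = payload
    client = AsyncMock()    # attributes of an AsyncMock are awaitable mocks
    client.get.return_value = response
    return client


client = make_mock_client({"id": 1, "name": "John Doe"})
data = asyncio.run(fetch_json(client, "https://api.example.com/users/1"))

# Verify the request that was made, without touching the network
client.get.assert_awaited_once_with("https://api.example.com/users/1")
```

In a real test, the same mock could be assigned to `validator.client` before calling `detect_drift`, so drift handling is exercised against a controlled payload.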

Usage

# Run all tests
pytest test_api.py -v

# Run specific test class
pytest test_api.py::TestUserModel -v

# Run with coverage
pytest test_api.py --cov=. --cov-report=html

# Run in parallel
pytest test_api.py -n auto

app.py

Purpose

FastAPI application providing validation endpoints.

Structure

"""
Generated FastAPI application for API validation.

This module provides REST endpoints for data validation.
"""

from fastapi import FastAPI, HTTPException, Depends
from fastapi.middleware.cors import CORSMiddleware
from typing import List, Dict, Any

from models import User
from validators import UserValidator, ValidationResult


app = FastAPI(
    title="User Validation API",
    description="Automated validation service for User data",
    version="1.0.0"
)

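# CORS configuration (wildcard origins combined with credentials is very permissive;
# list explicit origins before exposing this service publicly)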
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy", "service": "user-validator"}


@app.post("/validate", response_model=ValidationResult)
async def validate_user(data: Dict[str, Any]):
    """
    Validate user data.
    
    Args:
        data: User data to validate
        
    Returns:
        ValidationResult with success status
    """
    validator = UserValidator("https://api.example.com")
    return await validator.validate_user(data)


@app.post("/validate/batch", response_model=List[ValidationResult])
async def validate_batch(data_list: List[Dict[str, Any]]):
    """
    Validate multiple user records.
    
    Args:
        data_list: List of user data to validate
        
    Returns:
        List of ValidationResult objects
    """
    validator = UserValidator("https://api.example.com")
    return await validator.validate_batch(data_list)


@app.get("/drift/{endpoint:path}")
async def check_drift(endpoint: str):
    """
    Check for schema drift.
    
    Args:
        endpoint: API endpoint to check
        
    Returns:
        Drift detection result
    """
    validator = UserValidator("https://api.example.com")
    return await validator.detect_drift(f"/{endpoint}")


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Health check |
| GET | /docs | Interactive API docs (Swagger UI) |
| GET | /redoc | Alternative API docs (ReDoc) |
| POST | /validate | Validate single record |
| POST | /validate/batch | Validate multiple records |
| GET | /drift/{endpoint} | Check for schema drift |

Usage

# Start the server
uvicorn app:app --reload

# Or use Python directly
python app.py

# Access interactive docs
open http://localhost:8000/docs

# Make requests
curl -X POST http://localhost:8000/validate \
  -H "Content-Type: application/json" \
  -d '{"id": 1, "name": "John", "email": "john@example.com", "created_at": "2024-01-15T10:30:00Z"}'

data_dict.md

Purpose

Comprehensive data dictionary documenting all models and fields.

Structure

# Data Dictionary

## User

**Description:** User model representing a user in the system.

### Fields

| Field | Type | Required | Constraints | Description |
|-------|------|----------|-------------|-------------|
| id | integer | Yes | >= 1 | User ID |
| name | string | Yes | 1-100 chars | User's full name |
| email | string (email) | Yes | Valid email | User's email address |
| age | integer | No | 0-150 | User's age |
| is_active | boolean | No | Default: true | Whether the user is active |
| created_at | datetime | Yes | ISO 8601 | Account creation timestamp |

### Example

\`\`\`json
{
  "id": 1,
  "name": "John Doe",
  "email": "john@example.com",
  "age": 30,
  "is_active": true,
  "created_at": "2024-01-15T10:30:00Z"
}
\`\`\`

### Validation Rules

- Name must not be empty after stripping whitespace
- Email must be a valid email address
- Age must be between 0 and 150 if provided
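
How DataSentinel renders this table is not documented here. As an illustrative sketch only, rows like the ones above could be derived from a JSON Schema dictionary such as the one Pydantic v2's `User.model_json_schema()` returns. `field_rows` is a hypothetical helper, and the schema below is hand-written and simplified (real Pydantic output describes optional fields with `anyOf` rather than a plain `type`).

```python
from typing import Any, Dict, List


def field_rows(schema: Dict[str, Any]) -> List[str]:
    """Render one Markdown table row per property of a JSON Schema object."""
    required = set(schema.get("required", []))
    rows = []
    for name, spec in schema.get("properties", {}).items():
        rows.append("| {} | {} | {} | {} |".format(
            name,
            spec.get("type", "any"),             # fall back when no plain type
            "Yes" if name in required else "No",
            spec.get("description", ""),
        ))
    return rows


# Simplified, hand-written schema used only for this illustration
schema = {
    "properties": {
        "id": {"type": "integer", "description": "User ID"},
        "age": {"type": "integer", "description": "User's age"},
    },
    "required": ["id"],
}
rows = field_rows(schema)
```

Deriving the table from the schema, rather than maintaining it by hand, keeps the documentation in step with the model whenever a field changes.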

Usage

  • Reference for developers
  • API documentation
  • Integration guide
  • Training material

Dockerfile

Purpose

Multi-stage Docker configuration for containerized deployment.

Structure

# Multi-stage build for optimized image size
FROM python:3.11-slim AS builder

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Final stage
FROM python:3.11-slim

WORKDIR /app

# Copy dependencies from builder
COPY --from=builder /root/.local /root/.local

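# Copy application code (validators.py also imports from `core`; copy that
# package too if it is not installed via requirements.txt)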
COPY models.py validators.py app.py ./

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

# Expose port
EXPOSE 8000

# Health check
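# raise_for_status() makes the check fail on 4xx/5xx responses, not only on connection errors
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD python -c "import httpx; httpx.get('http://localhost:8000/health').raise_for_status()"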

# Run application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

Features

  • ✅ Multi-stage build for smaller images
  • ✅ Health check configuration
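  • Non-root user (security): not configured in the Dockerfile shown above, which installs to /root/.local and runs as root; add a USER instruction if you need it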
  • ✅ Optimized layer caching
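
The health check in the Dockerfile depends on httpx being installed in the image. If you would rather not rely on a third-party package for the probe, a standard-library version is possible. The function below is a hypothetical sketch (`is_healthy` is not part of the generated files).

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP 4xx/5xx (HTTPError)
        return False
```

A container health check needs a non-zero exit code on failure, so a wrapper script would end with something like `sys.exit(0 if is_healthy("http://localhost:8000/health") else 1)`.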

Usage

# Build image
docker build -t user-validator .

# Run container
docker run -p 8000:8000 user-validator

# Run with environment variables
docker run -p 8000:8000 \
  -e API_TOKEN=your-token \
  user-validator

# Docker Compose (requires a docker-compose.yml, which is not among the generated files listed above)
docker-compose up

Customization

Extending Models

Add custom fields or validators:

# In models.py
from pydantic import BaseModel, field_validator

class User(BaseModel):
    # ... existing fields ...

    # Add custom field
    custom_field: str = "default"

    # Add custom validator (Pydantic v2 style)
    @field_validator('custom_field')
    @classmethod
    def validate_custom(cls, v: str) -> str:
        # Your validation logic
        return v

Extending Validators

Add custom validation logic:

# In validators.py
class UserValidator:
    # ... existing methods ...
    
    async def custom_validation(self, data: Dict[str, Any]) -> bool:
        """Your custom validation logic."""
        # Implement custom checks
        return True
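
As one illustration of what such a custom check might contain, the sketch below implements a hypothetical business rule: `created_at` must not lie in the future. The function name, the rule itself, and the choice to treat naive timestamps as UTC are all assumptions. It returns a list of error strings so it can feed `ValidationResult.errors` directly.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def check_created_at(data: Dict[str, Any]) -> List[str]:
    """Hypothetical business rule: created_at must not lie in the future."""
    raw = data.get("created_at")
    if raw is None:
        return ["created_at is missing"]
    try:
        # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z"
        created = datetime.fromisoformat(str(raw).replace("Z", "+00:00"))
    except ValueError:
        return [f"created_at is not ISO 8601: {raw!r}"]
    if created.tzinfo is None:
        # Assumption: timestamps without an offset are UTC
        created = created.replace(tzinfo=timezone.utc)
    if created > datetime.now(timezone.utc):
        return ["created_at is in the future"]
    return []
```

Inside `custom_validation`, the method could return `not check_created_at(data)` to keep the boolean signature shown above, or append the messages to a `ValidationResult` for richer reporting.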

Extending API

Add custom endpoints:

# In app.py
@app.post("/custom-endpoint")
async def custom_endpoint(data: Dict[str, Any]):
    """Your custom endpoint."""
    # Implement custom logic
    return {"status": "success"}
