This file provides context for Claude Code when working in this repository.
This is the Source Manager — a FastAPI application for the Police Data Accessibility Project (PDAP). It collects, enriches, and manages URLs that point to police data sources, then synchronizes validated results to the Data Sources App.
- Python 3.11+ with uv for package management
- FastAPI + uvicorn for the web framework
- SQLAlchemy (async) + asyncpg for database access
- PostgreSQL 15 as the database
- Alembic for database migrations
- Pydantic v2 for data validation
- Docker for local database and testing
- pytest + pytest-asyncio for testing
```bash
# Install dependencies
uv sync
# Start local database
cd local_database && docker compose up -d && cd ..
# Run the app locally
fastapi dev main.py
# Run automated tests (requires the local database to be running)
uv run pytest tests/automated
# Run alembic migration tests
uv run pytest tests/alembic
# Generate a new migration
alembic revision --autogenerate -m "Description"
# Apply migrations manually
python apply_migrations.py
```

All application code lives under `src/`:
- `src/api/`: FastAPI routers and endpoints (15 route groups, 65 endpoints)
- `src/core/`: integration layer (`AsyncCore`, task system, logger, env var manager)
- `src/db/`: database layer (async client, SQLAlchemy models, queries, DTOs)
- `src/collectors/`: pluggable URL collectors (Common Crawler, Auto-Googler, CKAN, MuckRock)
- `src/external/`: external service clients (HuggingFace, PDAP API, Internet Archive)
- `src/security/`: JWT auth via tokens from the Data Sources App
- `src/util/`: shared helper functions
Each endpoint group follows this layout:
```text
src/api/endpoints/<group>/
├── routes.py        # APIRouter with all routes
├── get/ post/ put/ delete/
│   ├── __init__.py  # Endpoint handler
│   ├── query.py     # Database query logic
│   └── dto.py       # Request/response Pydantic models
└── _shared/         # Shared logic across methods
```
The app uses FastAPI's `app.state` to share core dependencies:

- `app.state.async_core`: `AsyncCore` instance (main facade)
- `app.state.async_scheduled_task_manager`: scheduled task manager
- `app.state.logger`: `AsyncCoreLogger` instance
All collectors inherit from `AsyncCollectorBase` and are registered in `src/collectors/mapping.py`. Each must implement `run_implementation()` and specify a preprocessor class.
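The contract described above can be sketched with stand-in classes. This is not the real code: `AsyncCollectorBaseSketch` only mimics the shape of `AsyncCollectorBase`, and `ExampleCollector`, `ExamplePreprocessor`, the `preprocessor` attribute name, and the `COLLECTOR_MAPPING` dict are all hypothetical. Check the actual base class and `src/collectors/mapping.py` for the real signatures.

```python
import asyncio
from abc import ABC, abstractmethod


class ExamplePreprocessor:
    """Hypothetical preprocessor: turns raw collector output into a URL list."""

    def preprocess(self, raw: dict) -> list[str]:
        # Drop empty entries and de-duplicate while preserving order
        return list(dict.fromkeys(u for u in raw.get("urls", []) if u))


class AsyncCollectorBaseSketch(ABC):
    """Stand-in for AsyncCollectorBase; the real class does more."""

    preprocessor: type  # assumption: each subclass names its preprocessor class

    @abstractmethod
    async def run_implementation(self) -> dict:
        """Collect raw data; every concrete collector must override this."""

    async def run(self) -> list[str]:
        raw = await self.run_implementation()
        return self.preprocessor().preprocess(raw)


class ExampleCollector(AsyncCollectorBaseSketch):
    preprocessor = ExamplePreprocessor

    async def run_implementation(self) -> dict:
        await asyncio.sleep(0)  # a real collector would query an external source
        return {"urls": ["https://example.com/a", "", "https://example.com/a"]}


# Hypothetical registry, similar in spirit to src/collectors/mapping.py
COLLECTOR_MAPPING: dict[str, type[AsyncCollectorBaseSketch]] = {
    "example": ExampleCollector,
}

if __name__ == "__main__":
    collector = COLLECTOR_MAPPING["example"]()
    print(asyncio.run(collector.run()))  # ['https://example.com/a']
```

Keeping the registry as a plain mapping from a collector key to a class means a new collector only needs a subclass and one mapping entry.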
- URL tasks enrich individual URLs (HTML scraping, agency ID, record type classification, etc.). Operators live in `src/core/tasks/url/operators/`.
- Scheduled tasks handle system-wide operations (sync to DS App, cleanup, HuggingFace upload, etc.). Implementations live in `src/core/tasks/scheduled/impl/`.
- Automated tests (`tests/automated/`) run in CI and make no third-party API calls.
- Alembic tests (`tests/alembic/`) validate migration scripts.
- Manual tests (`tests/manual/`) involve third-party APIs and are run individually. The directory intentionally lacks the `test` prefix.
- Async mode is `auto`, so async test functions are detected automatically.
- The test timeout is 300 seconds.
- Fixtures in `tests/conftest.py` provide `adb_client` and `db_client`.
- Test data helpers are in `tests/helpers/data_creator/`.
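In pytest-asyncio's `auto` mode, an `async def` test is collected and awaited without a `@pytest.mark.asyncio` marker. Here is a minimal sketch. The file name and `fetch_placeholder` are hypothetical, and a real test would usually take the `adb_client` fixture as a parameter rather than await a placeholder.

```python
# tests/automated/test_example.py (hypothetical file name)
import asyncio


async def fetch_placeholder(value: str) -> str:
    # Stand-in for an awaited database call, e.g. one made through adb_client
    await asyncio.sleep(0)
    return value


async def test_async_function_runs_without_marker():
    # In auto mode, pytest-asyncio runs this coroutine with no explicit marker
    assert await fetch_placeholder("ok") == "ok"
```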
See `ENV.md` for the full reference. Key categories:

- `POSTGRES_*`: database connection
- `DS_APP_SECRET_KEY`: JWT validation (must match the Data Sources App)
- Various API keys (Google, HuggingFace, PDAP, Discord, etc.)
- Feature flags: all tasks can be individually toggled (set to `0` to disable)
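The following sketch shows how a set-to-`0`-to-disable flag can be interpreted. The flag name `EXAMPLE_TASK_FLAG` is hypothetical, and treating an unset flag as enabled is an assumption. `ENV.md` lists the real flag names and their defaults.

```python
import os
from collections.abc import Mapping


def task_enabled(flag: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Return False only when the flag is explicitly set to "0".

    Assumption: an unset flag means the task is enabled.
    """
    return environ.get(flag, "1").strip() != "0"


if __name__ == "__main__":
    env = {"EXAMPLE_TASK_FLAG": "0"}  # hypothetical flag name
    print(task_enabled("EXAMPLE_TASK_FLAG", env))  # False
    print(task_enabled("OTHER_TASK_FLAG", env))  # True
```

Passing the environment in as a mapping keeps the check easy to test without mutating `os.environ`.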
- Managed with Alembic. Migrations live in `alembic/versions/`.
- Models are in `src/db/models/impl/`, organized by entity.
- The primary interface is `AsyncDatabaseClient` in `src/db/client/async_.py`.
- The local database uses Docker: `local_database/docker-compose.yml`.
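SQLAlchemy's async engine with asyncpg takes a `postgresql+asyncpg://` URL, which would then be passed to `create_async_engine`. The sketch below assembles such a URL from `POSTGRES_*` variables. The specific names (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`) and the host/port defaults are assumptions, so check `ENV.md` and `AsyncDatabaseClient` for what the app actually reads.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote


def build_async_db_url(environ: Mapping[str, str] = os.environ) -> str:
    """Assemble an asyncpg URL suitable for SQLAlchemy's create_async_engine.

    Variable names and defaults are assumptions; see ENV.md for the real ones.
    """
    user = quote(environ["POSTGRES_USER"], safe="")
    password = quote(environ["POSTGRES_PASSWORD"], safe="")  # escape @, /, : etc.
    host = environ.get("POSTGRES_HOST", "localhost")
    port = environ.get("POSTGRES_PORT", "5432")
    db = environ["POSTGRES_DB"]
    return f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{db}"


if __name__ == "__main__":
    env = {
        "POSTGRES_USER": "example_user",
        "POSTGRES_PASSWORD": "p@ss/word",
        "POSTGRES_DB": "example_db",
    }
    # postgresql+asyncpg://example_user:p%40ss%2Fword@localhost:5432/example_db
    print(build_async_db_url(env))
```

Percent-encoding the credentials matters because an unescaped `@` or `/` in a password would otherwise break URL parsing.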
- The app exposes its API docs at `/api` (not the default `/docs`; `/docs` redirects to `/api`).
- CORS is configured for `localhost:8888`, `pdap.io`, and `pdap.dev`.
- Two permission levels: `access_source_collector` (general) and `source_collector_final_review` (final review).
- The app synchronizes agencies, data sources, and meta URLs to the Data Sources App via nine scheduled sync tasks.
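The sketch below gates an action on one of the two permission levels, given a JWT payload that has already been verified against `DS_APP_SECRET_KEY`. The `permissions` claim name, the `Permission` enum, and `has_permission` are hypothetical, and the real auth code lives in `src/security/`. The sketch does not assume that final review implies general access.

```python
from enum import Enum
from typing import Any


class Permission(str, Enum):
    # Values match the permission names listed above
    ACCESS_SOURCE_COLLECTOR = "access_source_collector"
    SOURCE_COLLECTOR_FINAL_REVIEW = "source_collector_final_review"


def has_permission(payload: dict[str, Any], required: Permission) -> bool:
    """Check an already-verified token payload for one permission.

    Assumption: permissions arrive as a list of strings under "permissions".
    """
    granted = payload.get("permissions") or []
    return required.value in granted


if __name__ == "__main__":
    token_payload = {"sub": "user-1", "permissions": ["access_source_collector"]}
    print(has_permission(token_payload, Permission.ACCESS_SOURCE_COLLECTOR))  # True
    print(has_permission(token_payload, Permission.SOURCE_COLLECTOR_FINAL_REVIEW))  # False
```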