Testing

Test Philosophy

This repo favors high-signal, deterministic tests over broad but shallow coverage.

The most important things to protect are:

Two regression layers exist on purpose:

the bundled synthetic evaluation suite checks coarse complete-versus-incomplete expectations for each fixture
acceptance snapshots lock exact outputs from representative cases for the evaluation and governance surfaces
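
The difference between the two layers can be sketched roughly as follows. This is an illustration only, not the repo's actual test code: `evaluate`, `check_coarse`, `check_snapshot`, and the `required`/`provided` case shape are all hypothetical names chosen for the example.

```python
import json
from pathlib import Path


def evaluate(case: dict) -> dict:
    """Hypothetical stand-in for the real evaluation entry point."""
    missing = [name for name in case["required"] if name not in case["provided"]]
    return {"status": "incomplete" if missing else "complete", "missing": missing}


def check_coarse(case: dict, expected_status: str) -> bool:
    """Layer 1: compare only the coarse complete/incomplete verdict."""
    return evaluate(case)["status"] == expected_status


def check_snapshot(case: dict, snapshot_path: Path) -> bool:
    """Layer 2: the full output must equal the stored snapshot exactly."""
    expected = json.loads(snapshot_path.read_text(encoding="utf-8"))
    return evaluate(case) == expected
```

Under these assumptions, a change to the wording or ordering of the output would leave the coarse check green while failing the snapshot check, which is why both layers are worth keeping.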

After setup, prefer the Make targets. They use .venv/bin/python when the local virtualenv exists:

make reviewer-demo
make verify
make acceptance
make smoke-ui
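
The interpreter choice described above could look roughly like this if written in Python. It is a sketch, not the Makefile's actual logic: `pick_python` is a hypothetical helper, and falling back to the current interpreter when `.venv` is absent is an assumption, since the fallback is not stated here.

```python
import sys
from pathlib import Path


def pick_python(repo_root: Path) -> str:
    """Return .venv/bin/python if the local virtualenv exists."""
    venv_python = repo_root / ".venv" / "bin" / "python"
    if venv_python.exists():
        return str(venv_python)
    # Assumption: fall back to whichever interpreter is running this code.
    return sys.executable
```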

Run the reviewer-facing docs and artifact-path regressions:

.venv/bin/python -m pytest -q test/test_reviewer_docs.py test/test_artifact_generation.py

Run the full suite:

.venv/bin/python -m pytest -q

Run the acceptance snapshots only:

.venv/bin/python -m pytest -q test/test_acceptance_snapshots.py

Run the Streamlit sanity tests only:

.venv/bin/python -m pytest -q test/test_streamlit_app.py

Run lint:

.venv/bin/python -m ruff check .

Regenerate stable artifacts:

.venv/bin/python -m scripts.generate_artifacts

Regenerate golden snapshots only deliberately, after a reviewed product change:

.venv/bin/python -m scripts.generate_golden_outputs
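
For snapshot regeneration to stay reviewable, the written files need to be byte-stable across runs. The sketch below shows one way to get that with JSON. It is not the contents of `scripts.generate_golden_outputs`: the JSON format and the `render_snapshot` and `write_snapshot` helpers are assumptions made for illustration.

```python
import json
from pathlib import Path


def render_snapshot(output: dict) -> str:
    """Serialize deterministically: sorted keys, fixed indent, trailing newline."""
    return json.dumps(output, indent=2, sort_keys=True) + "\n"


def write_snapshot(output: dict, path: Path) -> bool:
    """Write the snapshot and report whether the file content changed."""
    new_text = render_snapshot(output)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False  # nothing to review
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_text, encoding="utf-8")
    return True
```

With this shape, rerunning the generator without a product change produces no diff, so any diff that does appear in review reflects a real change in output.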

The bundled synthetic case set intentionally includes:

That is more useful here than adding a large quantity of low-value tests.

Those are out of scope for this repo.