test: add same-evidence reader quality eval case #341

Merged
Stahl-G merged 1 commit into main from codex/same-evidence-regression-pack
Jul 2, 2026

Stahl-G (Owner) commented Jul 2, 2026

Summary

  • add packaged public-safe eval case same_evidence_reader_quality_regression
  • add eval runner action quality.summarize to write Quality Panel JSON, summary, and HTML for fixture validation
  • document the same-evidence regression boundary in evaluation-case docs and changelog
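The PR describes the quality.summarize action only as writing Quality Panel JSON, a summary, and HTML for fixture validation. The sketch below shows one way such a writer could keep all three outputs deterministic. It is a hypothetical illustration, not the repository's implementation: the function name write_quality_outputs, the output file names, and the panel shape (a flat mapping of check name to status string) are assumptions.

```python
from __future__ import annotations

import html
import json
from pathlib import Path


def write_quality_outputs(panel: dict[str, str], out_dir: Path) -> dict[str, Path]:
    """Write a hypothetical Quality Panel as JSON, a one-line summary, and an HTML table.

    `panel` maps a check name to a status string such as "pass" or "fail";
    the real Quality Panel schema is not shown in this PR and may differ.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    checks = sorted(panel.items())  # stable order so every run writes identical bytes
    paths = {
        "json": out_dir / "quality_panel.json",       # hypothetical file name
        "summary": out_dir / "quality_summary.txt",   # hypothetical file name
        "html": out_dir / "quality_panel.html",       # hypothetical file name
    }
    # JSON: sorted keys and fixed indentation keep the file diff-friendly.
    paths["json"].write_text(
        json.dumps(dict(checks), indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    # Summary: count of checks whose status is exactly "pass".
    passed = sum(1 for _, status in checks if status == "pass")
    paths["summary"].write_text(f"{passed}/{len(checks)} checks passed\n", encoding="utf-8")
    # HTML: escape every value so fixture text cannot inject markup.
    rows = "\n".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(status)}</td></tr>"
        for name, status in checks
    )
    paths["html"].write_text(f"<table>\n{rows}\n</table>\n", encoding="utf-8")
    return paths
```

Sorting the checks before serializing is what makes repeated runs byte-identical, which matters if the outputs are compared against a stored fixture.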

Boundary

  • deterministic regression guard only
  • no subagent execution, source fetching, model-quality score, semantic proof, delivery approval, or release authority
  • fixture uses synthetic ExampleCo evidence and reader-safe output
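Because the case is a deterministic regression guard rather than a model-quality score, its pass/fail decision can be expressed as an exact comparison. A minimal sketch, assuming the guard compares generated JSON-like output with a stored expected fixture; canonical and regression_diff are hypothetical names, not functions from multi_agent_brief, and ExampleCo appears only as sample data.

```python
from __future__ import annotations

import json
from typing import Any


def canonical(value: Any) -> str:
    """Serialize with sorted keys and fixed separators so equal data yields equal text."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def regression_diff(actual: dict[str, Any], expected: dict[str, Any]) -> list[str]:
    """Return the top-level keys that differ; an empty list means the guard passes.

    Hypothetical helper: the check is exact, with no scoring, tolerance, or model judgment.
    """
    differing = []
    for key in sorted(set(actual) | set(expected)):
        if key not in actual or key not in expected:
            differing.append(key)  # key added or dropped
        elif canonical(actual[key]) != canonical(expected[key]):
            differing.append(key)  # same key, different content
    return differing
```

Key order inside nested objects does not matter because canonical sorts keys, but any change in values, list contents, or the set of keys fails the guard.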

Validation

  • python3 -m pytest -q tests/test_evaluation_cases.py
  • PYTHONPATH=src python3 -m multi_agent_brief.cli.main eval-cases validate --json
  • PYTHONPATH=src python3 -m multi_agent_brief.cli.main eval-cases run --case-id same_evidence_reader_quality_regression --repo-workdir . --json
  • python3 -m pytest -q tests/test_evaluation_cases.py tests/test_quality_panel.py tests/test_materiality_selection.py tests/test_support_wording.py tests/test_status.py
  • python3 scripts/check_release_consistency.py --no-tag
  • python3 scripts/check_product_baseline.py
  • python3 scripts/check_briefloop_skill_freshness.py
  • python3 scripts/check_skill_contract.py
  • python3 scripts/check_version_consistency.py
  • PYTHONPATH=src python3 scripts/check_capabilities.py
  • python3 scripts/check_runtime_asset_parity.py
  • python3 scripts/generate_agent_configs.py --check
  • python3 scripts/sync_hermes_plugin_skills.py --check
  • git diff --check
  • python3 -m compileall -q src tests
  • python3 -m pytest -q
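The Validation list is meant to be run top to bottom from the repository root. A minimal fail-fast wrapper over those exact commands is sketched below, assuming a POSIX-style environment; run_validation, split_env, and default_runner are hypothetical helpers, and the repository's CI (13 checks) may run these steps differently.

```python
from __future__ import annotations

import os
import shlex
import subprocess
from typing import Callable, Sequence

# Commands copied verbatim from the Validation list, in order.
VALIDATION_COMMANDS = [
    "python3 -m pytest -q tests/test_evaluation_cases.py",
    "PYTHONPATH=src python3 -m multi_agent_brief.cli.main eval-cases validate --json",
    "PYTHONPATH=src python3 -m multi_agent_brief.cli.main eval-cases run --case-id same_evidence_reader_quality_regression --repo-workdir . --json",
    "python3 -m pytest -q tests/test_evaluation_cases.py tests/test_quality_panel.py tests/test_materiality_selection.py tests/test_support_wording.py tests/test_status.py",
    "python3 scripts/check_release_consistency.py --no-tag",
    "python3 scripts/check_product_baseline.py",
    "python3 scripts/check_briefloop_skill_freshness.py",
    "python3 scripts/check_skill_contract.py",
    "python3 scripts/check_version_consistency.py",
    "PYTHONPATH=src python3 scripts/check_capabilities.py",
    "python3 scripts/check_runtime_asset_parity.py",
    "python3 scripts/generate_agent_configs.py --check",
    "python3 scripts/sync_hermes_plugin_skills.py --check",
    "git diff --check",
    "python3 -m compileall -q src tests",
    "python3 -m pytest -q",
]

# A runner takes argv plus extra environment variables and returns an exit code.
Runner = Callable[[list[str], dict[str, str]], int]


def split_env(command: str) -> tuple[dict[str, str], list[str]]:
    """Move leading NAME=value tokens into an env dict, as a POSIX shell would."""
    tokens = shlex.split(command)
    env: dict[str, str] = {}
    while tokens and "=" in tokens[0] and tokens[0].split("=", 1)[0].isidentifier():
        name, value = tokens.pop(0).split("=", 1)
        env[name] = value
    return env, tokens


def default_runner(argv: list[str], extra_env: dict[str, str]) -> int:
    """Run argv without a shell, layering extra_env over the current environment."""
    return subprocess.run(argv, env={**os.environ, **extra_env}, check=False).returncode


def run_validation(commands: Sequence[str], runner: Runner = default_runner) -> str | None:
    """Run commands in order; return the first failing command, or None if all pass."""
    for command in commands:
        extra_env, argv = split_env(command)
        if runner(argv, extra_env) != 0:
            return command
    return None
```

From the repository root, run_validation(VALIDATION_COMMANDS) returns None when every command exits 0, otherwise the first failing command string. Passing a different runner makes the ordering and fail-fast logic testable without executing the real checks.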

Stahl-G force-pushed the codex/same-evidence-regression-pack branch from 01bb1d2 to 19ca4a0 July 2, 2026 04:51
Stahl-G marked this pull request as ready for review July 2, 2026 04:59
Stahl-G merged commit 8b4002b into main Jul 2, 2026
13 checks passed
Stahl-G deleted the codex/same-evidence-regression-pack branch July 2, 2026 04:59