Canonicalize review-verdict.json to the loose shape (F-2)#34
Canonicalize review-verdict.json to the loose shape (F-2)#34alanshurafa wants to merge 1 commit into
Conversation
Three copies of review-verdict.json had drifted into two shapes: schemas/ and runners/codex-ps/schemas/ carried a strict variant (per-issue file/line_range/suggestion required, additionalProperties:false, two extra top-level required keys), while skills/dev-review/schemas/ kept the original loose shape. The loose shape is the real contract. validate_review_verdict (lib/ co-evolution.sh) reads none of the files and enforces only verdict/confidence/summary/issues with severity+description per issue, and the Bash runtime hands codex --output-schema the loose skill copy. The PowerShell runner never validates against the schema (no Test-Json/-Schema anywhere) and reads only loose fields behind truthiness guards; one of its own fixtures already violates the strict schema. Going strict would have tightened the validator and started rejecting reasonable verdicts. Relax the two strict copies to the loose skill shape so all three are structurally identical. Add tests/review-verdict-schema-simulation.sh: a jq -S drift guard across the three copies plus positive/negative validator fixtures (none existed for the Bash validator before). Verified 5/5; scorer-verification still 14/14. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f586231534
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "description": "Structured code review verdict for the dev-review loop", | ||
| "type": "object", | ||
| "required": ["verdict", "confidence", "summary", "issues", "scope_creep_detected", "iteration_notes"], | ||
| "required": ["verdict", "confidence", "summary", "issues"], |
There was a problem hiding this comment.
Keep the Codex-facing schema strict
When the PowerShell runner reaches verify, it builds $reviewSchemaPath from schemas/review-verdict.json and passes it to codex exec --output-schema (runners/codex-ps/scripts/run-co-evolution.ps1). OpenAI Structured Outputs require every object property to be listed in required and each object to set additionalProperties: false; relaxing these top-level fields (and the issue fields below) makes this schema invalid for the Codex-backed verify path, so runs can fail before a verdict is produced rather than merely accepting a looser verdict.
Useful? React with 👍 / 👎.
What this changes
Three copies of
review-verdict.jsonhad drifted into two shapes.schemas/andrunners/codex-ps/schemas/carried a strict variant (per-issuefile/line_range/suggestionrequired,additionalProperties: false, two extra top-level required keys);skills/dev-review/schemas/kept the original loose shape. This relaxes the two strict copies to the loose shape so all three are structurally identical.Why loose, not strict
The loose shape is the contract that is actually enforced and consumed:
validate_review_verdict(lib/co-evolution.sh) reads none of the files and enforces onlyverdict/confidence/summary/issues, withseverity+descriptionper issue.codex --output-schemathe loose skill copy.Test-Json/-Schemaanywhere) and reads only loose fields behind truthiness guards. One of its own fixtures already violates the strict schema ("line"instead of"line_range", lowercase severity), so strict was unenforced.Going strict would have meant tightening the validator and rejecting reasonable verdicts (a missing line range, or a global issue with no single file).
Verification
New
tests/review-verdict-schema-simulation.sh: ajq -Sdrift guard across the three copies plus positive/negative validator fixtures (none existed for the Bash validator before). 5/5, andscorer-verificationstill 14/14.Audit finding F-2, split out of Phase 1 (#32) because it was a design call with cross-runner risk, not a mechanical dedup.
🤖 Generated with Claude Code