feat: add structured category and severity fields to review findings #29

Open
mvanhorn wants to merge 1 commit into alibaba:main from mvanhorn:feat/16-structured-category-severity

Conversation

@mvanhorn mvanhorn commented Jun 3, 2026

Summary

Adds two optional structured fields, category and severity, to every review
finding. They flow through the model, tool-call parsing, JSON output, agent
output, and the human-readable text renderer, and are populated by the review
LLM via the code_comment tool schema and a short prompt-template instruction.

Allowed values match the issue's tables:

  • severity: critical, high, medium, low, info
  • category: bug, security, performance, maintainability, test, style, documentation, other
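As a rough illustration of the two value sets above, the following Go sketch checks a finding's fields against them. The type and function names (`Severity`, `Category`, `ValidSeverity`, `ValidCategory`) are hypothetical and not taken from the PR, which may simply use plain strings. An empty value is treated as valid here because both fields are optional.

```go
package main

import "fmt"

// Severity and Category are hypothetical named types used only for this sketch.
type Severity string
type Category string

// Allowed values, copied from the lists in the PR description.
var validSeverities = map[Severity]bool{
	"critical": true, "high": true, "medium": true, "low": true, "info": true,
}

var validCategories = map[Category]bool{
	"bug": true, "security": true, "performance": true, "maintainability": true,
	"test": true, "style": true, "documentation": true, "other": true,
}

// ValidSeverity reports whether s is empty (field not provided) or an allowed value.
func ValidSeverity(s Severity) bool { return s == "" || validSeverities[s] }

// ValidCategory reports whether c is empty (field not provided) or an allowed value.
func ValidCategory(c Category) bool { return c == "" || validCategories[c] }

func main() {
	fmt.Println(ValidSeverity("high"), ValidSeverity("urgent"), ValidCategory(""))
	// prints: true false true
}
```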

Why this matters

Per #16, the machine-readable output of ocr review exposes finding text,
location, and suggestion, but no structured category/severity per finding. CI
integrations (GitHub Actions, GitLab CI) currently have to re-parse
natural-language comment text to sort, group, filter, or gate builds by
importance. The maintainers asked the reporter to open this dedicated issue and
laid out the enum tables plus acceptance criteria this PR implements:

  • JSON and agent output now include category and severity per finding when
    the model provides them.
  • README and README.zh-CN document the allowed values and semantics.
  • Existing consumers that ignore the fields are unaffected: the struct fields use
    omitempty, and the tool schema does not mark them required, so the keys are
    omitted entirely when empty and older/less-capable models still emit valid tool
    calls.

The change is backward-compatible by construction (optional + omitempty + not
required).
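The effect of `omitempty` on existing consumers can be seen in a minimal sketch. The `Finding` struct below is a hypothetical stand-in, not the project's actual model; only the `category` and `severity` tags mirror what the description states, and the other field names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding is a hypothetical stand-in for the project's finding model.
// Path, Line, and Comment are assumed field names for illustration only.
type Finding struct {
	Path     string `json:"path"`
	Line     int    `json:"line"`
	Comment  string `json:"comment"`
	Category string `json:"category,omitempty"` // key omitted when empty
	Severity string `json:"severity,omitempty"` // key omitted when empty
}

// encode marshals a finding to a JSON string, or returns the error text.
func encode(f Finding) string {
	b, err := json.Marshal(f)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(Finding{Path: "a.go", Line: 3, Comment: "nil deref", Category: "bug", Severity: "high"}))
	// {"path":"a.go","line":3,"comment":"nil deref","category":"bug","severity":"high"}
	fmt.Println(encode(Finding{Path: "a.go", Line: 9, Comment: "rename x"}))
	// {"path":"a.go","line":9,"comment":"rename x"}
}
```

A consumer written before this change sees the same keys as before whenever the model does not supply the new fields.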

Out of scope by design (the issue frames these as follow-ups, design questions
#3/#4): no --severity CLI filtering flags and no confidence field. This PR
lands the data first; filtering/gating can be a separate change now that the
fields exist.
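Until such a flag exists, a CI job could gate on the new field itself. The sketch below is one possible consumer-side approach, not part of this PR: it assumes the JSON output is an array of finding objects with an optional `severity` key, and the function name `countAtOrAbove` and the numeric ranking are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rank orders the allowed severities; the numeric values are an assumption
// made for this sketch. Missing or unknown severities rank 0.
var rank = map[string]int{"info": 1, "low": 2, "medium": 3, "high": 4, "critical": 5}

// finding decodes only the field this gate cares about.
type finding struct {
	Severity string `json:"severity,omitempty"`
}

// countAtOrAbove returns how many findings in raw (assumed to be a JSON array)
// have a severity at or above threshold. Findings without a severity never match.
func countAtOrAbove(raw []byte, threshold string) (int, error) {
	floor, ok := rank[threshold]
	if !ok {
		return 0, fmt.Errorf("unknown threshold %q", threshold)
	}
	var findings []finding
	if err := json.Unmarshal(raw, &findings); err != nil {
		return 0, err
	}
	n := 0
	for _, f := range findings {
		if rank[f.Severity] >= floor {
			n++
		}
	}
	return n, nil
}

func main() {
	raw := []byte(`[{"severity":"critical"},{"severity":"low"},{},{"severity":"high"}]`)
	n, err := countAtOrAbove(raw, "high")
	fmt.Println(n, err)
	// prints: 2 <nil>
}
```

A pipeline step could exit non-zero when the count is greater than zero, which gates the build without re-parsing comment text.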

Testing

  • go build ./... — success
  • go vet ./... — clean
  • go test ./... — all packages pass (198 tests)
  • New unit tests in internal/tool/code_comment_test.go:
    • category/severity are parsed when present
    • a finding without them is still accepted and leaves the fields zero-valued
    • JSON serialization includes the fields when set and omits the keys entirely
      when empty (no "category":"")
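The first two test cases above can be pictured with a small parsing sketch. It is not the code in internal/tool/code_comment_test.go: `codeCommentArgs` and `parseCodeComment` are hypothetical names, and the argument shape (a required `comment` plus the two optional fields) is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// codeCommentArgs is a hypothetical shape for the code_comment tool-call arguments.
type codeCommentArgs struct {
	Comment  string `json:"comment"`
	Category string `json:"category,omitempty"`
	Severity string `json:"severity,omitempty"`
}

// parseCodeComment decodes tool-call arguments. Only comment is treated as
// required here; category and severity stay zero-valued when absent.
func parseCodeComment(raw string) (codeCommentArgs, error) {
	var args codeCommentArgs
	if err := json.Unmarshal([]byte(raw), &args); err != nil {
		return args, err
	}
	if args.Comment == "" {
		return args, errors.New("comment is required")
	}
	return args, nil
}

func main() {
	full, _ := parseCodeComment(`{"comment":"possible SQL injection","category":"security","severity":"critical"}`)
	bare, _ := parseCodeComment(`{"comment":"consider renaming"}`)
	fmt.Printf("%q %q\n", full.Category, full.Severity)
	// prints: "security" "critical"
	fmt.Printf("%q %q\n", bare.Category, bare.Severity)
	// prints: "" ""
}
```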

Fixes #16

AI was used for assistance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Superskyyy

Please merge this; it is an important feature for end users.

@stay-foolish-forever
Contributor

Thanks for the great work, @mvanhorn! 🙏

This is a really solid contribution — adding structured category and severity fields to the review findings is an important enhancement that lays the groundwork for downstream filtering, CI gate policies, and better machine-readable output.

One thing I'd like to evaluate before merging: this change introduces modifications to the review prompts. I want to carefully assess whether the additional prompt instructions for category/severity classification could have any impact on the quality or focus of the review output itself. I'll run some comparative reviews today and get back to you with my findings.

Appreciate your patience — expect an update later today!

@stay-foolish-forever
Contributor

Thanks @mvanhorn for this well-implemented PR! The code quality is solid, and the structured category and severity fields will be valuable for downstream CI integrations.

However, after conducting careful evaluations on our benchmark suite, I've observed that introducing these changes results in a noticeable degradation in the overall review quality of the tool. The additional prompt instructions for category/severity classification appear to be affecting the focus and accuracy of the review output itself.

We're currently investigating the root cause of this regression in depth. Once we identify the underlying issue, we'll provide specific improvement suggestions — potentially around prompt engineering, model behavior, or field population strategies.

Please keep this PR open — we believe this feature is important and want to work through the quality concerns rather than close it. We'll follow up soon with concrete next steps and any necessary adjustments.

Appreciate your patience and contribution!

Development

Successfully merging this pull request may close these issues.

feat(agent): add structured category and severity fields to review output

3 participants