feat: Unified PR review engine (ckb review)#137
Conversation
Implements comprehensive PR review with parallel quality gates: - Engine core (review.go): orchestrates breaking, secrets, tests, complexity, coupling, hotspots, risk, and critical-path checks - CLI command (cmd/ckb/review.go): human, markdown, github-actions formats - MCP tool (reviewPR): full InputSchema, added to PresetReview - HTTP API (POST /review/pr): GET/POST with policy overrides - Config section (ReviewConfig): repo-level policy defaults - Complexity delta (review_complexity.go): tree-sitter before/after comparison - Coupling gaps (review_coupling.go): co-change analysis for missing files - 15 tests covering integration (real git repos) and unit tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR split suggestion via connected component analysis on module affinity + coupling graph. Change classification (new/refactor/ moved/churn/config/test/generated) with review priority. Review effort estimation based on LOC, file switches, module context switches, and critical file overhead. Per-cluster reviewer assignment from ownership data. New files: - review_split.go: BFS-based clustering, coupling edge enrichment - review_classify.go: 8 categories with confidence + priority - review_effort.go: time estimation with complexity tiers - review_reviewers.go: per-cluster reviewer scoping Wired into ReviewPR response (SplitSuggestion, ChangeBreakdown, ReviewEffort, ClusterReviewers). CLI formatters updated for human and markdown output. 16 new tests, 31 total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… 4-7 Batch 4 — Code Health & Baseline: - 8-factor weighted health score (cyclomatic, cognitive, LOC, churn, coupling, bus factor, age, coverage) - Per-file health deltas with A-F grading, wired as parallel check - Finding baselines: save/load/list/compare with SHA256 fingerprinting - CLI: ckb review baseline save/list/diff Batch 5 — Industrial/Compliance: - Traceability check: configurable regex patterns for ticket IDs in commits/branches - Reviewer independence enforcement: author exclusion, critical-path escalation - Compliance evidence export format (--format=compliance) - Git adapter: GetCommitRange() for commit-range queries Batch 6 — CI/CD & Output Formats: - SARIF v2.1.0 output with partialFingerprints, fixes, rules - CodeClimate JSON output for GitLab Code Quality - GitHub Action (action/ckb-review/action.yml) with PR comments and SARIF upload - GitLab CI template (ci/gitlab-ckb-review.yml) with code quality job Batch 7 — Tests & Golden Files: - 6 golden-file tests for all output formats (human, markdown, sarif, codeclimate, github-actions, json) - 19 format unit tests (SARIF, CodeClimate, GitHub Actions, human, markdown, compliance) - 16 health/baseline tests, 10 traceability/independence tests - Fixed map iteration order in formatters for deterministic output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds dedicated review-tests job that runs: - Review engine unit/integration tests (82 tests across batches 1-7) - Format output tests (SARIF, CodeClimate, GitHub Actions, compliance) - Golden-file tests with staleness check for testdata/review/ Build job now gates on review-tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| // getBaseComplexity gets complexity of a file at a given git ref. | ||
| func getBaseComplexity(ctx context.Context, analyzer *complexity.Analyzer, repoRoot, file, ref string) *complexity.FileComplexity { | ||
| // Use git show to get the base version content | ||
| cmd := exec.CommandContext(ctx, "git", "show", ref+":"+file) |
Check failure
Code scanning / gosec
Subprocess launched with variable Error
🔐 Security Audit Results
🛡️ SAST AnalysisFound 17 issue(s) across 1 scanner(s) DetailsGosec (17 findings)
📦 Dependency VulnerabilitiesFound 16 vulnerability(ies) across 2 scanner(s) DetailsTrivy (10 findings)
OSV-Scanner (6 findings)
📜 License IssuesFound 69 non-permissive license(s) Details
Generated by CKB Security Audit | View Details | Security Tab |
🟡 Change Impact Analysis
Blast Radius: 0 modules, 1 files, 328 unique callers 📝 Changed Symbols (956)
🎯 Affected Downstream (20)
Recommendations
Generated by CKB |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #137 +/- ##
=========================================
+ Coverage 45.6% 47.8% +2.1%
=========================================
Files 367 389 +22
Lines 61934 65334 +3400
=========================================
+ Hits 28274 31236 +2962
- Misses 31744 31945 +201
- Partials 1916 2153 +237
Flags with carried forward coverage won't be shown. Click here to find out more. 📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
CKB Analysis
Risk factors: Large PR with 72 files • High churn: 11997 lines changed • Touches 50 hotspot(s) 👥 Suggested: @lisa.welsch1985@gmail.com (14%), @talantyyr@gmail.com (14%)
🎯 Change Impact Analysis · 🟡 MEDIUM · 948 changed → 20 affected
Symbols changed in this PR:
Downstream symbols affected:
Recommendations:
💣 Blast radius · 0 symbols · 10 tests · 0 consumersTests that may break:
🔥 Hotspots · 50 volatile files📦 Modules · 4 at risk
📊 Complexity · 11 violations
💡 Quick wins · 10 suggestions
📚 Stale docs · 148 broken references
Generated by CKB · Run details |
- Serialize complexity/health/hotspots/risk checks into single goroutine to prevent go-tree-sitter cgo SIGABRT from concurrent parser use - Fix SARIF v2.1.0: use RelatedLocations for suggestions instead of non-compliant empty Fixes (requires artifactChanges) - Add path traversal prevention on baseline tags (regex validation) - Fix matchGlob silent truncation for patterns with 3+ ** wildcards - Add GHA annotation escaping (%, \r, \n) and markdown pipe escaping - Fix double file close in calculateBaseFileHealth - Fix err.Error() != "EOF" to err != io.EOF in HTTP handler - Fix errcheck violations across format tests and batch tests - Update MCP preset/budget test counts for new reviewPR tool - Reformat all files with gofmt - Add compliance golden file Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- action.yml: Pass all inputs via env vars to prevent script injection - action.yml: Generate JSON/GHA/markdown in single pass (was 3 runs) - action.yml: Use env vars for github.repository/PR number in comment step - Score: Cap per-check deductions at 20 points so noisy checks (coupling with 100+ co-change warnings) don't floor the score at 0 - Human format: Fix grade+filename concatenation (missing space) - Effort: Fix comment claiming 400 LOC/hr (code uses 300/500) - Classify: Remove dead code path (Additions==0 && Deletions==0 already caught by total==0 above), remove unreachable .github map entry - Baseline: Fix misleading "symlink" comment (it's a copy) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Health check was the main bottleneck — for each file it computed churn, coupling, bus factor, and age scores TWICE (before + after) despite these being branch-independent (identical values, zero delta). Changes: - Compute repo-level metrics once per file via repoMetrics struct, pass to both calculateFileHealth and calculateBaseFileHealth - Cap health check at 30 files (was unbounded) - Reduce coupling gap file limit from 30 to 20 - Reduce split coupling lookup limit from 30 to 20 - Add ctx.Err() checks in all per-file loops (health, complexity, coupling, split) so cancellation is respected between iterations For a 39-file PR this cuts ~156 git subprocess calls (4 metrics × 39 files that were duplicated) and caps the total file processing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ckb review CLI examples and reviewPR MCP tool to CLAUDE.md - Fix reviewPR description: list all 14 checks, say "concurrently where safe" - Reuse single complexity.Analyzer in health check (avoids 60+ cgo allocs) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New pr-review job in CI: runs on PRs after build, posts comment, emits GHA annotations, writes job summary - New examples/github-actions/pr-review.yml documenting full usage - Update examples README: add pr-review, mark pr-analysis as legacy - Fix action.yml misleading comment, route exit code through env var Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move Key Risks section after the checks table so the markdown flows as: checks → narrative → findings. Enable git-blame fallback in reviewer suggestions so repos without CODEOWNERS still get suggested reviewers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ci.yml: Move pull-requests:write from workflow-level to pr-review job only (other jobs no longer get unnecessary PR write access) - build-matrix.yml: Set cancel-in-progress:false (runs on main push only, cancelling artifact builds on rapid merges loses artifacts) - action/ckb-review: Pin upload-sarif to SHA @b1bff81...dcd061c8 (v4), was floating @V3 tag — inconsistent with all other pinned actions - Update golden for Top Risks section reorder Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- go.mod: Require Go 1.26.1 to resolve GO-2026-4599 through GO-2026-4602 (crypto/x509 cert validation, net/url IPv6 parsing, os.Root escape) - ci.yml: Align download-artifact SHA to 018cc2cf... matching nfr.yml and security-gate.yml (caught by cicheck consistency test) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The "Fail on review verdict" step referenced ${SCORE} without declaring
it in the env block. Reviewers field now omits from JSON when empty
instead of emitting "reviewers": null.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four fixes for large-PR noise: 1. New files no longer count as "degraded" — baseline is 0 (not 100), so the delta reflects actual health, not a fake drop from perfect. 2. Total score deduction capped at 80 (floor of 20/100) — prevents 5+ checks from each hitting their per-check cap and zeroing the score. 3. Cluster output capped at 10 in both human and markdown formatters, with "... and N more" overflow. 4. Health output filters unchanged files, separates degraded/improved/new in markdown, and caps displayed entries at 10. Also bumps trivy-action from 0.33.1 to 0.35.0 (install was failing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
getFileHotspotScore called GetHotspots (git + tree-sitter) once per changed file inside SummarizePR — replaced with getHotspotScoreMap that fetches once and returns a lookup map. getSuggestedReviewers called GetOwnership with IncludeBlame per file — capped to 30 lookups (blame only first 10) so large PRs don't trigger hundreds of git-blame subprocesses. Also includes: narrative/PRTier fields, finding tiers, adaptive output for large PRs, BlockBreaking/BlockSecrets config rename, golden test updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Accepts a SARIF v2.1.0 file (e.g., from golangci-lint) and suppresses CKB findings that share the same file:line with the lint report. This prevents CKB from flagging what the linter already catches — an instant credibility loss per the code review research. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cope into review Add three new review checks backed by existing analyzers: - dead-code: SCIP-based unused code detection in changed files - test-gaps: tree-sitter-based untested function detection (serialized) - blast-radius: fan-out analysis via AnalyzeImpact (opt-in via --max-fanout) Add invocation modes: --staged for index diff, --scope/positional arg for path-prefix or symbol-name filtering. Add explain hints on findings.
…cores Three targeted optimizations to reduce ckb review wall-clock time: 1. Cache hotspot scores: pre-compute once via SkipComplexity option (avoids tree-sitter on 50+ files), share between hotspot and risk checks. Replace SummarizePR call in risk check with direct calculatePRRisk using data already available in ReviewPR. 2. Batch git in health check: replace 4 × N per-file git calls (120+ subprocesses for 30 files) with one git log --name-only for churn/age/coupling and a 5-worker parallel git blame pool. 3. Break serialized block: add tsMu on Engine, run all 5 former serialized checks as independent goroutines that lock only around tree-sitter calls. Git subprocess work in one check overlaps with tree-sitter in another.
… unclamped risk
- Secrets: detect Go struct field declarations (Token string) and
config key→variable assignments ("token": rawToken) as false
positives in isLikelyFalsePositive().
- Coupling: skip CI/config paths (.github/, ci/, *.yml, *.lock) on
both source and target side of co-change analysis — they always
co-change and produce noise, not actionable review signal.
- Risk: clamp score to [0, 1] in calculatePRRisk. Previously factors
could sum above 1.0 on large PRs (e.g. 0.3+0.3+0.3+0.2 = 1.1).
- sortFindings: sort by tier (1→2→3) first, then severity, then path. Previously sorted by severity only, so a coupling warning could push a breaking-change error out of the top-10 budget cap. - Reviewer routing: add ExpertiseArea (top directory per reviewer), IsAuthor conflict detection (author sorted last), and richer Reason text. Add GetHeadAuthorEmail to git adapter for author lookup.
Formatter fixes: - Drop score from header, show file/line counts instead - Collapse passing checks into single line (✓ a · b · c) - Filter summary-restatement findings (Large PR, High churn, etc.) - Group co-change findings per file (Usually changed with: a, b, c) - Cap absurd effort estimates (>480min → "not feasible as single PR") - Collapse health section for large PRs (one-liner summary) - Clean reviewer emails (strip domain, no @ prefix for emails) - Wrap narrative text at 72 chars with consistent indent - Suppress SCIP stale warnings in human format (errors only) - Priority-sort findings by tier+severity before budget cap - Fix co-change false positives from basename vs full path mismatch CI/action updates: - Add dead-code, test-gaps, blast-radius to available checks list - Add max-fanout, dead-code-confidence, test-gap-lines action inputs - Drop score from GitHub step summary (verdict + findings suffice)
- Move architecture SVG to docs/plans/, update Responsibility→Complexity to reflect what's actually wired - Add image reference in review-cicd.md spec - Update CLAUDE.md check count and list (14→17)
The pr-review job was skipped when any upstream job (lint, test, security, build) failed, preventing the review comment from being posted on the PR. This is exactly when the review comment is most needed. Use always() so the job runs regardless of upstream status, with a fallback build step when the artifact isn't available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
CKB review failed to generate output. |
…tests - Fix coupling threshold comment (said 70%, code uses 30%) - Remove phantom coverage weight (never computed, inflated other factors) - Redistribute weights: cyclomatic 20→25%, age 10→15%, total = 1.0 - Apply neutral-pessimistic penalty when tree-sitter can't parse (binary files no longer get artificially high health scores) - Add warning log when git is unavailable for health metrics - Add format constants (FormatMarkdown, FormatGitHubActions, etc.) and use them consistently in review.go switch dispatch - Unify display caps across human/markdown formatters (10 findings, 10 clusters) via shared constants - Add API handler tests (9 tests covering GET, POST, policy overrides, method validation, edge cases) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Self-Review:
|
| Metric | Value |
|---|---|
| Verdict | WARN |
| Score | 33/100 |
| PR Tier | large |
| Files | 74 (74 reviewable, 0 generated) |
| Changes | +12,191 lines |
| Modules | 15 |
| Clusters | 13 (all independent) |
| Review effort | "not feasible as single PR" (62.5h estimate) |
Checks
| Status | Check | Detail |
|---|---|---|
| warn | risk | Risk score: 1.00 (high) |
| warn | coupling | 4-5 commonly co-changed files missing |
| warn | split | 74 files across 13 independent clusters |
| info | hotspots | 50 hotspot files touched |
| info | test-gaps | 17 untested functions |
| pass | breaking | No breaking API changes |
| pass | dead-code | No dead code |
| pass | secrets | No secrets detected |
| pass | tests | 9 tests cover the changes |
| pass | health | 13 new files (avg grade B), 0 degraded |
| pass | complexity | +27 cyclomatic across 6 files |
What it caught vs what manual review found
A separate deep review (6 parallel agents reading critical files) found 9 actionable issues. Here's the comparison:
| Finding | ckb review |
Manual review |
|---|---|---|
| Coupling gaps (adapter.go) | Found | Found |
| Complexity spikes (diff.go +7, pr.go +13) | Found | Found |
| Test coverage gaps (17 functions) | Found | Found |
| Split into 13 clusters | Found | N/A |
| Coupling comment says 70%, code uses 30% | Missed | Found |
| Phantom coverage weight (10%, never computed) | Missed | Found |
| Binary files get inflated health scores | Missed | Found |
| Format constant inconsistency | Missed | Found |
| API handler 0% test coverage | Missed | Found |
| Silent git metric defaults (75) | Missed | Found |
| Display cap inconsistency across formats | Missed | Found |
Assessment
ckb review operates at the structural/statistical level — file counts, complexity deltas, coupling rates, risk scores. It correctly identified the shape of the problem (too large, 13 clusters, high risk, coupling gaps) and pointed at the right files.
It cannot catch semantic bugs (comment contradicts code), architectural issues (phantom weights), or design inconsistencies (display caps differ across formats). Those require reading the actual code.
The three-pass strategy works:
- Pass 1 (
ckb review, seconds) → structural triage, risk ranking, cluster map - Pass 2 (targeted file reads on critical files) → found all 9 real bugs
- Pass 3 (cluster review) → verified independent clusters are clean
ckb review saved the deep review from wasting time on 43 clean new files and 15 mechanical config files. It's a pre-filter, not a replacement — and that's exactly the design intent.
Fixes applied
All 9 issues from manual review have been addressed in commits ecc1e49 and 0e9fcde:
- Health scoring: removed phantom weight, fixed coupling comment, added binary file penalty, added git fallback warning
- Format dispatch: added constants, unified display caps
- Tests: added 9 API handler tests (was 0% coverage)
- CI: pr-review job now runs even when upstream jobs fail
|
CKB review failed to generate output. |
Summary
Unified PR review engine that runs 17 quality checks in a single
ckb reviewcommand. Designed for large PRs (600+ files) where naive AI review burns $5-15 in tokens — CKB precomputes what matters so the AI reads 10-15 files instead of 600.New analyzers wired into review flow
tsMu)AnalyzeImpact(opt-in via--max-fanout)New invocation modes
--staged— review staged changes instead of branch diff (pre-commit use)--scope/ positional arg — filter to path prefix or symbol nameckb review internal/query/— scope by directoryckb review --scope=AuthMiddleware— scope by symbolPerformance (3 optimizations)
SkipComplexityoption onGetHotspotsavoids tree-sitter on 50+ files.checkRiskScoreFastuses pre-computed data instead of callingSummarizePR(which re-did the entire diff + hotspot analysis)git log --name-onlyreplaces 120+ per-file calls; 5-worker parallel blame pool replaces 30 sequential callstsMuon Engine lets all checks run as independent goroutines, locking only around tree-sitter calls. Git subprocess work in one check overlaps with tree-sitter in anotherFormatter overhaul (7 fixes)
⚠ WARN · 111 files · 12,879 lines✓ breaking · secrets · tests · health · complexityUsually changed with: a, b, cnot feasible as a single PRlisa.welschinstead of@lisa.welsch1985@gmail.comNoise reduction
Token string) and config key→variable assignments ("token": rawToken) detected as false positives.github/,ci/,*.yml,*.lock) skipped on both source and target sideReviewer routing
ExpertiseArea— top directory per reviewerIsAuthor— conflict detection (author sorted last)GetHeadAuthorEmailadded to git adapterCI/CD
max-fanout,dead-code-confidence,test-gap-linesinputspr-reviewjob made resilient to upstream build failures (fallback build step)Documentation
docs/plans/(Responsibility→Complexity fix)/features/code-reviewpage with full feature overviewFiles changed
113 files, +12,754 / -495 lines across:
internal/query/— review engine, 3 new check files, health batching, perfinternal/secrets/— false positive detectioninternal/backends/git/—GetHeadAuthorEmail,GetStagedDiffcmd/ckb/— CLI flags, formatter overhaul, golden test updates.github/workflows/— CI resilience, action inputsdocs/— architecture SVG, spec updatestestdata/— golden files for all output formatsTest plan
go build ./...passesgo test ./internal/query/... -run TestReview— 25 tests pass (7 new)go test ./cmd/ckb/...— all formatter + golden tests passgo test ./internal/secrets/...— scanner tests passgo test ./internal/backends/git/...— adapter tests passgo vet ./...cleanckb review --base=mainproduces clean outputnpx next buildpasses🤖 Generated with Claude Code