perf: bound review context hydration#53
Conversation
ds4psb-ai
left a comment
There was a problem hiding this comment.
Independent code review
Posted on behalf of @ds4psb-ai (read-only contributor). Maintainer judgment owns merge.
Summary verdict
LGTM after addressing one P1 (orchestration test gap) — compactMappedWindow, githubContextWindowPlan, and the counts struct extension all check out at every boundary I traced; the new behavior is correctly opt-out (null total → falls back to full pagination) and the compactMappedSlice aliasing is semantically identical to the pre-PR implementation.
Findings
P0 / blocker
- (none)
P1 / should fix before merge
src/clawsweeper.ts:2143-2178—ghPagedContextWindowis the load-bearing orchestration in this PR, and it has no direct unit test. The 4 added tests covercompactMappedWindow(2) andgithubContextWindowPlan(2), but the tail-merge logic wheretailFirstPageNumber === 1reuses the already-fetchedfirstPage(line 167) is the trickiest part — a future refactor could easily double-fetch page 1 or, worse, drop the head whenkeepStart > 0andtailFirstPageNumber === 1. Suggest one test that mocksghPage(e.g. via dependency injection or by exportingghPagedContextWindowfor test) and asserts (a) page 1 is fetched exactly once when total=101 limit=80, and (b) the returned items are the correct[1..40, 62..101]shape. Without this, the boundary the PR specifically fixes (page 1 + page 2 share the tail) regresses silently.
P2 / nice to have / followup
src/clawsweeper.ts:1717-1769(collectRelatedMentions) — the PR body discloses that "Related mention scanning follows the hydrated comment/review-comment window". Confirmed:options.commentsandoptions.pullReviewCommentsnow contain only the head/tail slice, so#NNNreferences buried in the middle of a 3000-comment thread are silently dropped from the related-items sidebar of the review prompt. The trade-off is correct (the model couldn't see them anyway), but worth a short// Note: middle-window mentions are intentionally dropped — see ghPagedContextWindowcomment atsrc/clawsweeper.ts:1748so a future contributor doesn't "fix" this by re-paginating.src/clawsweeper.ts:1518-1556(compactMappedWindow) — theitems.length > boundedLimit && boundedTotal === items.lengthbranch on line 1543 is a defensive case for callers who pass already-full hydration withtotal === items.length. It works, but the branch is non-obvious; one inline comment explaining "keep full retained set when caller already paginated everything" would help. Not blocking.src/clawsweeper.ts:2148-2150(ghPagedContextWindow,total === nullfallback) — theitems.lengthself-reporttotal = items.lengthcollapses the distinction between "GitHub didn't tell us a total" and "there are exactly N items", which meanscommentsTruncated: falsewill be reported when the count is genuinely missing. This is fine for the prompt, but the renderedGitHub Snapshotblock (src/clawsweeper.ts:6079-6082) won't ever say "truncated" in this branch — possibly worth surfacing thenull-totalcase in the snapshot for diagnostic purposes.
Test coverage
4 added tests covering compactMappedWindow (head/tail-marker behavior at hydrated and unhydrated extremes) and githubContextWindowPlan (page-boundary tail and large-page-skip case). Cover the algorithmic core, but leave gaps:
ghPagedContextWindoworchestration (P1 above).compactMappedWindowwithkeepStart === 0 && keepEnd > 0— exercised by my static trace (it correctly returns[..., omitted, lastTail]with no head), but unreachable from production callers (smallest caller limit is 24). Worth a unit-test pin for future safety.- The
total === nullfallback path ofghPagedContextWindow— locks in that when GitHub doesn't carry a total, full pagination still works. - A
pullRecord.review_comments === undefinedcase (older GitHub API responses or partial cache) —githubCountreturns null, falls back to full pagination. Worth one test.
Risks I considered and dismissed
compactMappedSlice↔compactMappedWindowaliasing: ran a full equivalence trace across 7 inputs (includinglimit=0,items=[],items.length === limit); the newcompactMappedSlice = compactMappedWindow(items, items.length, limit, mapper)is a strict-output preserving rename. No callers of the old function need migration.tailFirstPageNumbermath at boundaries: tracedtotal ∈ {80, 81, 100, 101, 200, 3000}andlimit ∈ {1, 2, 3, 80, 100}. Every case matches the PR's two unit-test assertions for tail offset and last page. TheMath.floor((total - keepEnd) / perPage) + 1formula is correct because GitHub pages are 1-indexed and the items-since-page-1-start is zero-indexed.countsstruct extension breaking downstream:grep -n 'context\.counts' src/clawsweeper.tsshows onlyGitHub Snapshotmarkdown rendering uses these fields, and the PR updates that block (lines 6279-6306 in the diff) to consume the new optional fields viacontextCountTextwith safeundefinedhandling. TherelatedItemsContextpassthrough block (lines 3401-3416) explicitly checks each new field forundefinedbefore assigning. Nothing else reads these fields.- PR#53 making existing notifier or repair-test failures worse:
git diff --stat origin/main..HEADconfirms onlysrc/clawsweeper.tsandtest/clawsweeper.test.tschange;test/repair/is untouched. The Node-22cancelledByParentfinding is preexisting and orthogonal. - Mechanical preservation of full pagination for safety evidence: the four
ghPagedContextWindowcallers (limits 24, 40, 80, 40) cover comments, files, commits, review comments. They all correctly fall back to fullghPagedwhen GitHub doesn't expose a count. Reviews (ghPageddirectly atsrc/clawsweeper.ts:3320) and timelines (ghPagedatsrc/clawsweeper.ts:3300) stay on full pagination — same pattern as the PR#49 split.
The math is right and the change is well-scoped. The orchestration test gap is the only thing I'd want closed before merge.
PR openclaw#53's pagination math was covered, but the load-bearing fetch orchestration was not. Export the window hydrator with injectable fetchers and cover the page-one tail-overlap boundary plus missing-total fallback so future refactors cannot double-fetch or drop retained context silently. Constraint: This is a launch-readiness fixup for a P1 review finding on PR openclaw#53. Rejected: Only test githubContextWindowPlan | the bug class is in how planned pages are fetched and stitched, not the pure plan math alone. Confidence: high Scope-risk: narrow Directive: Keep total-less GitHub responses on the full-pagination fallback unless GitHub exposes reliable count metadata. Tested: pnpm run build:all; node --test test/clawsweeper.test.ts
|
Addressed the orchestration test gap. What changed:
Local proof:
Local note: this machine is on Node 22 while the repo declares Node >=24, so CI remains the authoritative full-suite check. |
steipete
left a comment
There was a problem hiding this comment.
Reviewed bounded review context hydration fix; CI green and targeted local unit tests passed.
Summary
Why
OpenClaw has thousands of open PRs and issues. For large PRs, the prompt keeps only a first/last window of files, commits, and review comments, but ClawSweeper was still paginating every middle page before compaction. That spends API calls, JSON parsing, memory, and wall-clock time on data that cannot be sent to the model.
Example: a 3,000-file PR keeps the same 80 prompt file entries, but avoids hydrating the middle file pages.
Behavior
Validation
pnpm --ignore-workspace run build:allpnpm --ignore-workspace run test:unitpnpm --ignore-workspace run format:checkgit diff --checkNotes:
pnpm --ignore-workspace run checkcurrently reaches lint and fails on existing repo-wide oxlint findings unrelated to this PR.pnpm --ignore-workspace run test:repairalso hits two existing notifier tests with Node 22cancelledByParent; changed files are not in repair.