diff --git a/docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md b/docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md new file mode 100644 index 0000000..24ea84b --- /dev/null +++ b/docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md @@ -0,0 +1,180 @@ +# Pretable wrapped-text filter perf diagnostic — 2026-05-13 + +## Summary + +The captured Playwright trace is an SDK action trace (API events + snapshots), +not a Chrome DevTools performance recording, so a function-level flame-graph +breakdown of the trigger-to-first-changed-frame window is not available from +this artifact. The diagnostic falls back to a code-reading hypothesis: pretable's +post-filter render pipeline calls `estimateRowHeight` over **every row that +survives the filter** (~500 rows for S2/hypothesis filter-text), and each such +call may perform wrap-aware text layout for wrap-enabled columns when the row +isn't in the layout cache. This work is plausibly the dominant chunk of the +~17 ms p95 budget-miss on filter-text / sort / filter-metadata; it generalizes +naturally to sort (whole-list re-ordering invalidates the rendered-rows index) +and to filter-metadata (same code path, smaller result set). Verdict: +memo-only; no Phase D fix shipped because the hypothesis needs a real +performance trace to confirm before any production code change. + +## Context + +PR #134 (n=20, Chromium S2/hypothesis): + +| Script | mean (ms) | σ (ms) | Budget | +| --------------- | --------- | ------ | ------ | +| sort | 17.10 | 1.83 | ≤ 16 | +| filter-metadata | 17.51 | 2.44 | ≤ 16 | +| filter-text | 16.79 | 0.31 | ≤ 16 | + +All three reliably over the 16 ms single-frame budget. PR #141 reframed the +homepage prose to acknowledge over-budget honestly while emphasizing the +2-3.5× comparator wedge (pretable remains the fastest measured grid). This +memo identifies what consumes the budget. + +filter-text picked for tracing because it has the tightest σ (0.31 ms) at +n=20 — cleanest signal. + +## Method + +- Single bench run via `apps/bench/tests/bench.spec.ts` with + `PRETABLE_BENCH_ADAPTER=pretable / SCENARIO=S2 / SCALE=hypothesis / +SCRIPT=filter-text`. +- Trace captured locally; **not committed** (per repo convention — `status/traces/*.zip` is `.gitignore`d). The trace turned out to be Playwright's default action-trace format (API call frames + snapshots + screencast), which has no per-function timeline. To re-capture for flame-graph analysis, the bench harness needs a CDP-level tracing path (see Verdict + the bench-harness gap follow-up). +- Summary metrics (single sample, 3,000 rows, filter narrows to 500): + - `interaction_latency_ms`: 8.20 ms (single-sample noise vs the n=20 mean + of 16.79 ms; matches the documented ±0.31 σ width — n=1 lands wherever + the GC / paint phase falls) + - `settle_duration_ms`: 16.70 ms + - `post_interaction_blank_gap_frames`: 0 + - `post_interaction_anchor_shift_px`: 0 + - `post_interaction_row_height_error_p95_px`: 1 + - `rendered_rows_peak`: 6 + - `result_row_count`: 500 + +## Trace breakdown + +**The captured `.trace.zip` is a Playwright SDK action trace, not a Chrome +DevTools performance recording.** Only API calls (`goto`, `expect`, +`evaluate`, `waitForFunction`), frame-snapshots, and screencast frames are +present — there is no JavaScript-task timeline, no Chrome flame-graph data, +and no per-function duration breakdown for the trigger-to-first-frame +window. Inspection of `unzip -p ...trace.zip trace.trace | jq` confirms +only 4 `before`/`after` API events, 8 `frame-snapshot` entries, 1 +`screencast-frame`, and 3 `log` entries. The trace viewer can replay these +visually, but a subagent without an interactive browser cannot extract a +scripting-task breakdown from this artifact. + +**Status:** Trace-based flame-graph analysis is BLOCKED. Hypothesis below is +derived from source-code reading of the post-filter render pipeline, not +from measured task durations. A follow-up diagnostic with +`page.context().tracing.start({ ...screenshots, snapshots, sources: true })` +plus an explicit `chromium.startTracing(page, { categories: [...] })` call +(or a manual Chrome DevTools recording captured during a `--headed` run) +would be required to confirm the hypothesis. + +## Hypothesis for the gap + +The likely dominant cost is `estimateRowHeight` running across **every row +that survives the filter**, not just rows in the rendered (overscan) window. + +Trace path through the code (verified by reading, not measured): + +1. `setInteractionPlanOverride` (the bench's trigger) updates the + adapter's `interactionPlan` prop. The adapter passes + `planToState(plan)` into `` + (`apps/bench/src/pretable-adapter.tsx:258`). +2. `usePretable` (`packages/react/src/use-pretable.ts:175-236`) sees the + new `state.filters` and calls `grid.replaceFilters(...)` synchronously + during render. This invalidates the core's + `cachedSnapshot`/`cachedVisibleRows` + (`packages/grid-core/src/create-grid-core.ts:150-167, 819-825`). +3. `useSyncExternalStore` re-reads `getSnapshot()`, which calls + `deriveVisibleRows({ columns, filters, rows: sourceRows, sort })` + over all 3,000 source rows — `filter` + `sort` work for the filter-text + needle (`packages/grid-core/src/derived-rows.ts:25-42`). This is + O(rows × columns_with_filter) — small for filter-text (1 needle, lower- + cased once via `resolveFilters`). +4. `createDomRenderSnapshot` + (`packages/renderer-dom/src/create-renderer.ts:32-69`) is then called + inside `useMemo`. **The first thing it does is + `input.snapshot.visibleRows.map(estimateRowHeight)` over all 500 + surviving rows**, not just the 6 rendered rows. For any row whose + identity isn't already in the `WeakMap` cache, `estimateRowHeight` + runs `prepareText` + `layoutPreparedText` per wrap-enabled column + (`create-renderer.ts:35-39, 118-166`). +5. Initial render measured only the ~6 rows that were in the original + viewport's overscan window — almost none of the 500 post-filter rows + are in the cache, so almost all 500 trigger full `prepareText` + + `layoutPreparedText` work. With S2's wrapped-text column at typical + widths, that's ~500 × layout-text passes. +6. After `rowHeights` is built, `planViewport` + the `flatMap` produce + only ~6 rendered-row records — React reconciliation is small. The DOM + update is small. The expensive chunk lives upstream of React. + +This story is consistent with: + +- The fact that `scroll` (9.07 ms p95, well under budget) does not + re-derive the row list; visible rows are stable so the `estimateRowHeight` + WeakMap is hot. +- filter-text's tight σ (0.31 ms): the work scales with `result_row_count`, + which is deterministic for a fixed needle. +- The persistence of the budget-miss across all three interaction scripts. +- `post_interaction_row_height_error_p95_px == 1`: estimated heights are + reasonable (no measurement-vs-estimate disagreement). + +## Why sort + filter-metadata likely share this cause + +`createDomRenderSnapshot` runs `estimateRowHeight` over the full +`visibleRows` list on every snapshot change, regardless of which slice +changed. + +- **sort:** the visibleRows list re-orders, so `rowHeights` is rebuilt for + all 3,000 source rows (no filter applied in this script). Per-row cost is + cached (WeakMap keyed on row reference), so after the first sort the cache + is hot and per-row cost should be amortized — but the per-call overhead + (function entry, map lookup, columnsRef check, array allocation) over 3,000 + entries is non-trivial. The σ of 1.83 ms suggests intermittent GC / cache- + miss variance, consistent with map traffic. +- **filter-metadata:** same code path as filter-text, different needle/ + column. The wider σ (2.44 ms) likely reflects the metadata column's + different cell-value distribution (numeric/structured values may produce + more cache-miss `prepareText` work when the column wraps). + +A focused follow-up would re-run all three scripts with the same trace +capture path, confirming that `estimateRowHeight`-related stack frames +dominate in each. + +## Proposed fixes (deferred — no code in this PR) + +| Option | Description | Expected delta | Risk to quality wedge | Complexity | +| ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ | +| A | Lazy-evaluate `rowHeights` — only compute heights for rows the viewport plan actually needs (rendered + overscan window), not the entire `visibleRows` list. Requires restructuring `createDomRenderSnapshot` to defer `estimateRowHeight` until after `planViewport` chooses the row indices. | High — should remove the bulk of the per-filtered-row cost | Medium — `planViewport` currently consumes `rowMetrics` over the full visible list to do binary-search positioning by `scrollTop`. Need a substitute (cumulative-sum index using a default-height-estimate stride, then refine only the rows that land in the window). Risk to anchor-shift / blank-gap if not handled carefully. | Moderate — touches the planning layer's contract | +| B | Pre-warm `estimateRowHeight` cache during initial render across all source rows (idle-callback or sync at mount). Amortizes the cost into mount; interactions see a hot cache. | Medium — moves the cost out of the interaction frame budget; doesn't reduce total work | Low — same cache, same outputs | Low — single mount-time hook | +| C | Memoize the entire `rowHeights` array keyed on `(visibleRows, columns, measuredHeights)` identity. Repeated interactions that return to the same filter / sort state hit the cache. | Low-Medium — only helps repeat interactions; doesn't help the first filter-text trigger | Low | Low | +| D | Use `useDeferredValue` for the snapshot, letting React yield between the snapshot reduction and the post-filter render. Doesn't reduce work; allows the prior frame to paint before re-render. | Medium — reframes "over budget" by splitting work across two frames; user-perceived latency unchanged | Low — no algorithm change | Low — single React hook | + +Option A is the most attractive on raw delta but the highest risk to the +quality wedge (anchor stability + blank-gap prevention depend on +`planViewport` having accurate row heights). Option B is the safest first +move and most testable. + +## Verdict + +**Memo-only; no fix shipped in this PR.** + +Phase D's gate condition #1 ("one obvious code change") is not satisfied +because the trace artifact is the wrong kind of trace — function-level +durations were never captured. Any code change at this point would be +speculative even with a strong code-reading hypothesis. The right next +step is a follow-up that captures a Chrome DevTools performance trace +(via a `--headed` interactive run or a custom CDP-`Tracing.start` wrapper +in the bench harness), confirms `estimateRowHeight` dominates, and only +then ships Option A or B. + +This memo logs the leading hypothesis so the follow-up has a starting +point. PR #134's verdict (real-but-over-budget; pretable still 2-3.5× +faster than every comparator) is unchanged. + +## Phase D + +Not fired. See Verdict. diff --git a/docs/superpowers/plans/2026-05-13-pretable-wrapped-text-filter-perf-diag.md b/docs/superpowers/plans/2026-05-13-pretable-wrapped-text-filter-perf-diag.md new file mode 100644 index 0000000..add05ed --- /dev/null +++ b/docs/superpowers/plans/2026-05-13-pretable-wrapped-text-filter-perf-diag.md @@ -0,0 +1,352 @@ +# Pretable Wrapped-Text Filter Perf Diagnostic Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Identify what consumes pretable's 17 ms p95 budget on `filter-text` (Chromium S2/hypothesis n=20). Memo output; conditional fix if a single-cause low-risk candidate falls out of the trace. + +**Architecture:** Per the spec at `docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md`. Single PR. + +**Working directory:** `/Users/blove/repos/pretable/.worktrees/wrapped-text-filter-perf-diag`. + +**Spec:** [`docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md`](../specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md) + +--- + +## File Structure + +``` +status/traces/ +└── 2026-05-13-pretable-filter-text-perf.trace.zip (NEW Phase A) + +docs/research/ +└── 2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md (NEW Phase C) + +# CONDITIONAL on Phase D firing: +status/milestones/ +└── 2026-05-13-pretable-filter-text-postfix.json (NEW only if fix shipped) + +packages//src/.ts (NEW only if fix shipped) +``` + +--- + +## Pre-flight + +- [ ] **0.1** Free port 4173 (prior matrix runs may leak the preview server): + ``` + lsof -ti tcp:4173 | xargs -r kill -9 + ``` +- [ ] **0.2** Build the harness: + ``` + pnpm --filter @pretable/app-bench build + ``` + +--- + +## Phase A — Trace capture + +### Task 1 — Run filter-text with tracing on + +- [ ] **1.1** Run via Playwright spec directly (matrix-runner-bypassed for reliability): + + ``` + PRETABLE_BENCH_ADAPTER=pretable \ + PRETABLE_BENCH_SCENARIO=S2 \ + PRETABLE_BENCH_SCALE=hypothesis \ + PRETABLE_BENCH_SCRIPT=filter-text \ + pnpm --filter @pretable/app-bench exec playwright test --workers=1 + ``` + + Note: the existing `bench.spec.ts` already calls `page.context().tracing.start({ screenshots: true, snapshots: true })` and stops it at the end with `tracingstop({ path: tracePath })`. Trace gets written to `status/traces/.trace.zip` via `createRunArtifactFileStem`. + +- [ ] **1.2** Locate the most recent trace: + + ``` + ls -lt status/traces/chromium-pretable-default-s2-hypothesis-filter-text-2026-05-13*.trace.zip | head -1 + ``` + +- [ ] **1.3** Copy + rename to the spec's canonical path: + + ``` + cp status/traces/2026-05-13-pretable-filter-text-perf.trace.zip + ls -lh status/traces/2026-05-13-pretable-filter-text-perf.trace.zip + ``` + + Size budget: 5–25 MB. If >25 MB, see Step 1.4. + +- [ ] **1.4 (oversized fallback)** If >25 MB: + - Open with `pnpm exec playwright show-trace `. + - Screenshot the Performance panel during steady-state. + - Save screenshots under `docs/research/2026-05-13-perf-diag-traces/`. + - Don't commit the trace zip. + +- [ ] **1.5** Verify trace opens: + + ``` + pnpm exec playwright show-trace status/traces/2026-05-13-pretable-filter-text-perf.trace.zip + ``` + + A browser window should open. Quit with `q`. If trace fails to open, STOP and report BLOCKED. + +- [ ] **1.6** Commit the trace: + ``` + git add status/traces/2026-05-13-pretable-filter-text-perf.trace.zip + git commit -m "chore(bench): Playwright trace for pretable filter-text perf diagnostic" + ``` + +--- + +## Phase B — Trace analysis + +### Task 2 — Inspect the trace + +- [ ] **2.1** Open the trace: + + ``` + pnpm exec playwright show-trace status/traces/2026-05-13-pretable-filter-text-perf.trace.zip + ``` + +- [ ] **2.2** Find the steady-state interaction window in the trace timeline. The bench's filter-text script: + 1. Mounts the adapter. + 2. Lets ~500 ms of warmup pass. + 3. Programmatically applies the filter via `setInteractionPlanOverride` → React state update. + 4. Polls frame-by-frame for the visible-row signature to change. + 5. Marks the first changed frame as `interaction_latency_ms`. + + Skip the mount + warmup. The interesting window is the ~17 ms between the trigger dispatch and the first changed frame. + +- [ ] **2.3** Identify the longest scripting tasks in that window. For each: + - Duration (ms). + - Top of call stack (function name). + - File path (if visible). + - Classify the phase: `derivation` (`deriveVisibleRows` and friends), `reconciliation` (React render work), `virtualization` (visible-window math), `dom-update` (cell mounts / unmounts / property writes). + +- [ ] **2.4** Save findings to `/tmp/filter-text-trace-findings.md`: + + ``` + ## Trace breakdown — pretable filter-text n=1 sample + + | Task | Duration (ms) | Phase | File | + | --- | --- | --- | --- | + | | | derivation | packages/grid-core/... | + | ... + ``` + + Don't commit this scratch file; it's input for Phase C. + +- [ ] **2.5** If the trace cannot be opened or interpreted, STOP and report BLOCKED. The memo can ship with verdict "real-but-undiagnosed; needs human-driven profiling." + +--- + +## Phase C — Memo + +### Task 3 — Draft the memo + +- [ ] **3.1** Create `docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md`: + + ```markdown + # Pretable wrapped-text filter perf diagnostic — 2026-05-13 + + ## Summary + + <1–2 sentences: what consumes the budget; leading hypothesis> + + ## Context + + PR #134 verdict (n=20, Chromium S2/hypothesis): + + - sort: 17.10 ± 1.83 ms + - filter-metadata: 17.51 ± 2.44 ms + - filter-text: 16.79 ± 0.31 ms + + All three reliably over the 16 ms single-frame budget. PR #141 reframed the homepage prose to acknowledge over-budget honestly while emphasizing the 2-3.5× comparator wedge. This memo identifies what the budget is actually spent on. + + ## Method + + - Trace: `status/traces/2026-05-13-pretable-filter-text-perf.trace.zip`. + - Single sample at hypothesis scale (3,000 wrapped-text rows). + - Filter-text picked for tracing because it has the tightest σ at n=20 (0.31 ms) — cleanest signal. + + ## Trace breakdown + + | Task | Duration (ms) | Phase | File | + | ---- | ------------- | ----- | ---- | + + | + + ## Hypothesis for the gap + + <1–3 paragraphs identifying the dominant cause. Likely candidates: + + - deriveVisibleRows running synchronously on every render + - React reconciliation across many cells when the visible row list changes shape + - Virtualization recompute when the result-row-count shrinks/grows + - DOM updates for newly-mounted vs unmounted cells> + + ## Why sort + filter-metadata likely share this cause + + + + ## Proposed fixes (no code in this PR yet — see Verdict) + + | Option | Description | Expected delta | Risk to quality wedge | Complexity | + | ------ | --------------------------------------------------------------------- | -------------- | --------------------- | ------------ | + | 1 | | | | | + + ## Verdict + + . See Phase D below." + - "No single-cause hotspot. Memo-only output. Follow-up PR scoped to per the proposed-fixes table."> + + ## Phase D (only if fix shipped) + + + ``` + +- [ ] **3.2** Replace `` strings with actual content from Phase B. + +- [ ] **3.3** Commit: + ``` + git add docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md + git commit -m "docs(research): pretable wrapped-text filter perf diagnostic memo" + ``` + +--- + +## Phase D (conditional) — Ship a fix + +**Skip this entire phase unless ALL gates in the spec's Phase D section hold:** + +1. One obvious code change in one file. +2. Doesn't touch the quality wedge (zero blank gaps, anchor stability, ≤1 px row-height). +3. Re-running the matrix confirms latency delta ≥ 2 ms. +4. No public-API change. + +### Task 4 — Implement the fix (only if all gates hold) + +- [ ] **4.1** Implement the fix. Likely candidates per the spec's hypotheses: + - Memoization of `deriveVisibleRows` keyed on row+filter+sort identity (avoids re-derivation when only an unrelated prop changes). + - `useDeferredValue` on the filtered-row list (lets React yield between filter application and full reconciliation). + - Lazy cell-value reads in `matchesFilters` (cache `readCellValue(row, column)` per row across columns). + +- [ ] **4.2** Repo-wide gates: + + ``` + pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format + ``` + + All pass. + +- [ ] **4.3** Re-run matrix for verification: + + ``` + PRETABLE_BENCH_ADAPTER=pretable \ + PRETABLE_BENCH_SCENARIO=S2 \ + PRETABLE_BENCH_SCALE=hypothesis \ + PRETABLE_BENCH_SCRIPT=filter-text \ + pnpm --filter @pretable/app-bench exec playwright test --workers=1 + ``` + + Repeat × 20 (run a loop, or kick the matrix runner with `--adapters=pretable --scripts=filter-text --repeats=20`). + +- [ ] **4.4** Aggregate the post-fix latency: + + ```bash + node --input-type=module <<'EOF' + import { readdir, readFile } from "node:fs/promises"; + import { join } from "node:path"; + + const files = (await readdir("status")).filter( + (f) => + f.startsWith( + "chromium-pretable-default-s2-hypothesis-filter-text-2026-05-13", + ) && f.endsWith(".summary.json"), + ); + const samples = []; + for (const f of files) { + const data = JSON.parse(await readFile(join("status", f), "utf8")); + const v = data.metrics?.interaction_latency_ms; + if (typeof v === "number") samples.push(v); + } + const n = samples.length; + const mean = samples.reduce((a, b) => a + b, 0) / n; + const sd = Math.sqrt(samples.reduce((a, b) => a + (b - mean) ** 2, 0) / n); + console.log(JSON.stringify({ n, mean: +mean.toFixed(3), sd: +sd.toFixed(3) })); + EOF + ``` + +- [ ] **4.5** Compare to PR #134 baseline (16.79 ± 0.31 ms). If mean drops by ≥ 2 ms AND quality metrics (blank_gap_frames, scroll_anchor_shift_backward_p95_px, row_height_error_p95_px) unchanged, proceed. Otherwise revert the fix and update the memo's Verdict to "fix didn't deliver expected delta; deferred to follow-up." + +- [ ] **4.6** Document the post-fix numbers + quality verification in the memo's Phase D section. + +- [ ] **4.7** Save a post-fix milestone: + + ``` + cp status/runsets//hypotheses.json status/milestones/2026-05-13-pretable-filter-text-postfix.json + ``` + +- [ ] **4.8** Commit the fix + verification: + + ``` + git add packages//src/.ts status/milestones/2026-05-13-pretable-filter-text-postfix.json docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md + git commit -m "perf(): for wrapped-text filter pipeline + + + + Before: filter-text 16.79 ± 0.31 ms (n=20) + After: filter-text ± ms (n=20) + Delta: - ms + + Quality wedge unchanged (blank gaps 0, anchor shift 0, row-height ≤ 1 px). + No public-API surface change." + ``` + +--- + +## Task 5 — Gates + PR + +- [ ] **5.1** Repo-wide gates: + + ``` + pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format + ``` + + All pass. + +- [ ] **5.2** Push + open PR: + + ``` + git push -u origin wrapped-text-filter-perf-diag + gh pr create --title "" --body "<body>" + ``` + + PR title: + - If Phase D fired: `perf(<area>): <fix description> for wrapped-text filter pipeline`. + - If memo-only: `docs(research): pretable wrapped-text filter perf diagnostic`. + + PR body covers: summary, trace findings, leading hypothesis, what's in the PR (memo + optional fix), what's NOT. + +- [ ] **5.3** Auto-merge decision: + - **Memo-only (Phase D skipped):** set auto-merge (`gh pr merge --auto --squash`). Negative-result memo is uncontroversial. + - **Phase D fired:** HOLD for user review. Code change in grid-core / react packages deserves a code review. + + Surface the decision in your end-of-task report. + +--- + +## Self-review + +- Spec coverage: Phase A (trace capture) → Task 1; Phase B (analysis) → Task 2; Phase C (memo) → Task 3; Phase D (conditional fix) → Task 4. Auto-merge gate → Task 5.3. ✓ +- No placeholders outside the memo template (those are intentional). +- Scope: single PR, three required phases + conditional fix. Auto-mergeable for memo-only; held for review on a fix. + +--- + +## Notes for the implementer + +- Filter-text is the cleanest signal (σ = 0.31 ms at n=20). If the memo's findings should generalize to sort + filter-metadata, the verdict's "Why sort + filter-metadata likely share this cause" section captures the cross-script narrative. +- Phase D's gate is strict because pretable's quality wedge is the project's North Star. A fix that gains 3 ms but adds 1 blank gap is not a fix. +- Don't try `--repeats=50` or other off-plan adjustments. n=20 is the established protocol from PR #124 / PR #134. +- If the trace can't be opened (Playwright trace viewer crashes, file is corrupted), report BLOCKED. The memo can ship with verdict "real-but-undiagnosed." diff --git a/docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md b/docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md new file mode 100644 index 0000000..0613211 --- /dev/null +++ b/docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md @@ -0,0 +1,153 @@ +# Pretable Wrapped-Text Filter Perf Diagnostic Design + +**Date:** 2026-05-13 +**Status:** Draft (awaiting user review before plan) +**Predecessors:** [PR #134 interaction borderline diagnostic](../../research/repo-memory.md), [PR #141 editorial follow-up](../../research/repo-memory.md). Same pattern as [PR #124 perf-diag](./2026-05-09-b2-followup-perf-diagnostic-design.md) + [PR #133 scroll-with-render perf-diag](./2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md). + +--- + +## Goal + +Identify the cause of pretable's three interaction scripts (sort, filter-metadata, filter-text) reliably landing 1–2 ms over the 16 ms single-frame budget on Chromium S2/hypothesis at n=20. Output is a research memo with leading hypothesis + proposed fixes. If profiling surfaces a single-cause hotspot with a low-risk fix, the fix ships in the same PR; otherwise the memo queues a separate fix PR. + +## Why + +PR #134's high-repeat verdict: + +| Script | mean (ms) | σ (ms) | Budget | +| --------------- | --------- | ------ | ------ | +| sort | 17.10 | 1.83 | ≤ 16 | +| filter-metadata | 17.51 | 2.44 | ≤ 16 | +| filter-text | 16.79 | 0.31 | ≤ 16 | + +All three reliably over the single-frame budget. Pretable is still 2–3.5× faster than every measured comparator, so the homepage narrative is intact (post PR #141's editorial follow-up). But the budget-miss is a real fact and there's likely a shared root cause across all three scripts — the wrapped-text filter / sort pipeline runs synchronously in pretable's React render, blocking the frame. + +Code structure (from initial reading): + +- `packages/grid-core/src/derived-rows.ts`: `deriveVisibleRows({ columns, filters, rows, sort })` does `resolveFilters` + `rows.filter(matchesFilters)` + `sortRows`. For 3000 rows with wrapped-text columns this is non-trivial. +- `packages/react/src/use-pretable.ts` + `pretable-surface.tsx`: trigger sort/filter via grid API → state update → React re-render → derive + virtualize + DOM update. + +The 17 ms p95 is the whole trigger-to-first-changed-frame window. The derivation cost is one chunk; React reconciliation + virtualization are others. Profiling will name which dominates. + +## Non-goals + +- Cross-browser data (Chromium only, mirrors all prior B2 work). +- Synthetic micro-benchmarks. Use `apps/bench` matrix as the measurement instrument. +- Cross-script profiling (one trace per script). filter-text first; the memo extrapolates. +- Changing thresholds in `scripts/bench-matrix.mjs`. Status verdicts unchanged. +- Touching public-API surface. Any fix targets internal grid-core / react implementation. +- Tier 1 backlog (theming, AI integrations, docs). + +## Architecture + +Three sequential phases inside one PR. A conditional fourth phase (Phase D) handles the fix-if-easy decision. + +### Phase A — Trace capture + +Capture one Playwright trace for `pretable / S2 / hypothesis / filter-text`. The matrix runner can be coerced into capturing traces via its existing test harness; alternatively, run the Playwright spec directly with `PRETABLE_BENCH_*` env vars and `PLAYWRIGHT_TRACE=on`. + +``` +PRETABLE_BENCH_ADAPTER=pretable \ +PRETABLE_BENCH_SCENARIO=S2 \ +PRETABLE_BENCH_SCALE=hypothesis \ +PRETABLE_BENCH_SCRIPT=filter-text \ +PLAYWRIGHT_TRACE=on \ +pnpm --filter @pretable/app-bench exec playwright test --workers=1 +``` + +Trace file location: `apps/bench/test-results/*.zip` (or wherever the playwright config writes them). Copy + rename to `status/traces/2026-05-13-pretable-filter-text-perf.trace.zip`. + +**Size budget:** 5–25 MB. If >25 MB, see "Trace size handling" below. + +### Phase B — Trace analysis + +Open via `pnpm exec playwright show-trace status/traces/2026-05-13-pretable-filter-text-perf.trace.zip`. The Performance / Trace view shows the steady-state interaction window. + +Trigger-to-first-changed-frame window: + +1. Click on filter input (or programmatic event dispatch) — `trigger()` callback in `measureBenchInteractionRun`. +2. React state update → re-render. +3. `deriveVisibleRows` runs → produces new filtered+sorted row list. +4. Virtualized list re-renders the visible window with the new list. +5. DOM mutations → paint → first changed frame. + +For each long scripting task in the window: + +- Note the duration (ms). +- Note the top-of-stack function name. +- Note which phase it's in (derivation / reconciliation / virtualization / DOM update). + +The differential vs `scroll` (9.07 ms p95, well under budget) is the key: scroll doesn't run `deriveVisibleRows` on every frame, only on initial render. Interaction triggers a full re-derive. That's the likely difference. + +### Phase C — Memo + +`docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md`: + +``` +# Pretable wrapped-text filter perf diagnostic — 2026-05-13 + +## Summary +<1-2 sentences identifying the leading hypothesis> + +## Context +- PR #134's n=20 verdicts (cite numbers). +- Code structure summary (deriveVisibleRows + React render). + +## Method +- Trace captured at `status/traces/...`. +- Analysis: which scripting tasks dominate the trigger-to-first-frame window. + +## Trace findings +- Longest scripting tasks with durations + function names + phase classification. +- Specific call-stack snippets for the hotspot(s). + +## Hypothesis for the gap +- Leading cause (e.g., "deriveVisibleRows takes 8 ms; the rest is React reconciliation and virtualization"). +- Why the other two scripts (sort, filter-metadata) likely share this cause. + +## Proposed fixes +- One row per candidate: description, expected delta, risk, complexity. + +## Verdict +- "Fix shipped in this PR — <X>" / "Fix deferred to follow-up PR — <reasoning>". +``` + +### Phase D (conditional) — Ship a fix + +Ship the fix only if ALL of these hold: + +1. **One obvious code change.** Single function or single file. No cross-cutting refactor. +2. **Doesn't touch quality wedge.** Zero blank gaps, anchor stability, ≤1 px row-height fidelity must all stay intact. +3. **Re-run matrix confirms the delta.** Pretable filter-text n=20 mean must move at least 2 ms lower with the fix in place, AND comparators' relative ratios stay 2-3× away from pretable. +4. **No public-API change.** Internal-only edit. + +If any of these fail, the memo lists the candidate but the fix is queued to a separate PR. Document the deferral in Phase C's "Verdict" section. + +### Trace size handling + +If the trace exceeds 25 MB: + +- View locally with `playwright show-trace`. +- Capture screenshots of the Performance panel (steady-state scroll window). +- Save under `docs/research/2026-05-13-perf-diag-traces/`. +- Note the local trace path in the memo without committing the binary. + +### Out-of-scope follow-ups + +- **Same-cause re-profile for sort + filter-metadata.** If filter-text's hotspot is `deriveVisibleRows`, the other two scripts should share the cause. The memo notes this; a fix PR can verify with focused matrix runs. +- **Stochastic-variance investigation** for filter-metadata. PR #134 noted σ = 2.44 ms — much wider than filter-text's 0.31 ms. Could be initial-scroll-position dependence or row-selection state. Logged as a follow-up. +- **`/bench` page hypothetical-fix-applied projection.** Out of scope; informational only. + +## Risks + +- **Trace doesn't surface a single dominant cause.** If the budget is consumed by many small contributions (React reconciliation across many cells, virtualization overhead, DOM mutations), there's no single "fix it here" answer. The memo records this honestly: verdict becomes "real-but-distributed; no single-cause hotspot; over-budget reflects 60Hz frame cost of full filter-rederive at this dataset size." +- **Matrix-runner flake during Phase D verification.** Already well-documented; document actual sample count. +- **Quality regression on a fix.** Phase D's gate condition #2 is explicit: any quality regression kills the fix. Verifying requires re-running the matrix's scroll + interaction scripts and confirming `blank_gap_frames`, `scroll_anchor_shift_backward_p95_px`, `row_height_error_p95_px` stay at their existing values. +- **Public-API leak.** Phase D restricts to internal-only edits. If a fix would need to expose a new option (`autoBatchFilters: true`, etc.), the memo proposes it and defers to a follow-up PR for the design conversation. + +## Test plan + +- **Pre-fix:** matrix n=20 numbers as a baseline (already captured in PR #134 + PR #141; no new run needed unless the trace capture goes through the matrix path). +- **Trace analysis:** manual review; no automated check. +- **Post-fix (if Phase D fires):** re-run pretable-only matrix at n=20 for filter-text + (sort, filter-metadata) to confirm latency delta. Confirm `blank_gap_frames`, `scroll_anchor_shift_backward_p95_px`, `row_height_error_p95_px` unchanged. +- **Repo-wide:** `pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format` clean.