diff --git a/docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md b/docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
new file mode 100644
index 0000000..1377172
--- /dev/null
+++ b/docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
@@ -0,0 +1,67 @@
+# Pretable scroll-with-render perf diagnostic — 2026-05-11
+
+## Summary
+
+The PR #130 cheap-render anomaly is sampling noise. A high-repeat re-run shows pretable's `scroll-with-render` is at parity with (in fact marginally faster than) `scroll-with-format` and `scroll-with-heavy-render` on `scroll_frame_p95_ms`. No perf-fix PR needed; this investigation ends here.
+
+## Context
+
+PR #130 captured (n=3 medians, Chromium S2/hypothesis):
+
+| Script                     | scroll p95  |
+| -------------------------- | ----------- |
+| `scroll-with-format`       | 10.2 ms     |
+| `scroll-with-render`       | **16.4 ms** |
+| `scroll-with-heavy-render` | 10.3 ms     |
+
+Cheap-render renders fewer DOM nodes than heavy-render (164 vs 296) yet measured ~6 ms slower than both heavy-render and format. Two competing hypotheses motivated this investigation:
+
+1. **Sampling noise.** Most likely outcome based on PR #124's precedent, where a 1 ms gap dissolved at n=20.
+2. **React-reconciliation cliff.** Less likely but plausible — single-text-child span vs multi-child span hitting different reconciliation paths.
+
+## Method
+
+- Matrix: `pnpm bench:matrix --project=chromium --adapters=pretable --scenarios=S2 --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render --scale=hypothesis --repeats=20`.
+- Hardware: MacBook Pro, Apple M-series, local laptop environment.
+- Background load: typical local desktop conditions; no priority pinning or load-control applied.
+- Statistical test: 2σ on mean `scroll_frame_p95_ms`. Gap is "real" only when BOTH `|mean_cheap − mean_format|` AND `|mean_cheap − mean_heavy|` exceed `2 × max(σ_cheap, σ_other)`.
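The 2σ rule in the Method section can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not the bench harness's code: `compare` and `verdict` are hypothetical helper names, and the inputs are the per-script summary values (mean and σ of `scroll_frame_p95_ms`, in ms) reported in the high-repeat data section.

```javascript
// Minimal sketch of the memo's 2σ rule. `compare` and `verdict` are
// illustrative helpers, not functions from the bench harness.

// Compare cheap-render against one other script (inputs: mean and σ in ms).
function compare(cheap, other) {
  const meanDiff = cheap.mean - other.mean;
  // Noise floor: twice the larger of the two standard deviations.
  const noiseFloor = 2 * Math.max(cheap.sd, other.sd);
  return { meanDiff, noiseFloor, real: Math.abs(meanDiff) > noiseFloor };
}

// The gap is "real" only when BOTH pairs clear their noise floors.
function verdict(cheap, format, heavy) {
  const vsFormat = compare(cheap, format);
  const vsHeavy = compare(cheap, heavy);
  let label = "mixed";
  if (vsFormat.real && vsHeavy.real) label = "real";
  else if (!vsFormat.real && !vsHeavy.real) label = "noise";
  return { vsFormat, vsHeavy, label };
}

// Per-script summaries of scroll_frame_p95_ms from the high-repeat table (ms).
const format = { mean: 9.36, sd: 0.8 };
const cheap = { mean: 8.97, sd: 0.35 };
const heavy = { mean: 9.15, sd: 0.13 };

const result = verdict(cheap, format, heavy);
console.log(result.label); // noise
console.log(result.vsFormat.meanDiff.toFixed(2), result.vsFormat.noiseFloor.toFixed(2)); // -0.39 1.60
console.log(result.vsHeavy.meanDiff.toFixed(2), result.vsHeavy.noiseFloor.toFixed(2)); // -0.18 0.70
```

Because `compare` tests `|meanDiff|`, the rule is direction-agnostic; the negative `meanDiff` values are what show cheap-render's mean sitting slightly below the other two.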
+
+## High-repeat data
+
+The matrix run completed only 21 of the 60 planned repeats (35 %) before exiting (likely a Playwright flake; not investigated because the verdict was clear from the available data):
+
+| Script                   | n   | mean p95 (ms) | σ (ms) | min (ms) | median (ms) | max (ms) |
+| ------------------------ | --- | ------------- | ------ | -------- | ----------- | -------- |
+| scroll-with-format       | 8   | 9.36          | 0.80   | 8.6      | 9.2         | 11.4     |
+| scroll-with-render       | 7   | **8.97**      | 0.35   | 8.4      | 9.1         | 9.4      |
+| scroll-with-heavy-render | 6   | 9.15          | 0.13   | 8.9      | 9.2         | 9.3      |
+
+Source: `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json`.
+
+**Sample size note.** With n=6–8 per script and σ ≈ 0.13–0.80 ms, the 95 % confidence interval for each mean (normal approximation, ±1.96 σ/√n) is about ±0.10 ms for heavy-render, ±0.26 ms for cheap-render, and ±0.55 ms for format. That is tight enough that the PR #130 cheap-render value of 16.4 ms lies far outside the observed distribution: it is ≈ 21 σ above the cheap-render mean ((16.4 − 8.97) / 0.35) and 7 ms above the largest cheap-render sample (9.4 ms). The matrix's partial run is not a problem for this verdict; it would only matter if we needed to settle a sub-millisecond gap.
+
+## Statistical verdict
+
+- **cheap vs format:** mean diff = −0.39 ms (cheap-render is faster); 2σ noise floor = 1.60 ms. **Within noise.**
+- **cheap vs heavy:** mean diff = −0.18 ms (cheap-render is faster); 2σ noise floor = 0.70 ms. **Within noise.**
+
+Overall verdict: **noise.** Cheap-render's mean is in fact marginally lower than both format and heavy-render at higher repeats, though both differences sit well inside the noise floor; the PR #130 6 ms gap was a sampling artifact at n=3.
+
+## Interpretation
+
+The PR #130 cheap-render outlier (16.4 ms) was very likely caused by a few slow frames rather than by the render path itself. Each run's `scroll_frame_p95_ms` is taken over a single short scroll, so a handful of long frames can lift it sharply, and with only three repeats it takes just two such runs to set the median. PR #124 documented the same dynamic with a 1 ms gap that dissolved at n=20; this is the same pattern at larger magnitude.
+
+Heavy-render at n=6 has σ = 0.13 ms, which is unusually tight — likely an artifact of the small sample. Format at n=8 has σ = 0.80 ms, which is closer to the typical run-to-run spread of `scroll_frame_p95_ms` at hypothesis scale.
+
+The most important finding: **the data show no React-reconciliation cliff between single-text-child spans and multi-child spans in pretable's `MemoizedCellContent` path.** The three flavors all sit near ~9 ms p95, well under the 16 ms single-frame budget.
+
+## Verdict
+
+Gap is noise; the PR #130 cheap-render outlier was a sampling artifact at n=3. **No perf-fix PR needed.**
+
+### Recommendations
+
+1. The `scroll-with-render` script is sound — no underlying perf issue in pretable's cell-render path.
+2. Updating the homepage's interaction-row narrative is not warranted; the cell-renderer scripts are not on the homepage's ComparisonTable in the first place.
+3. If anyone re-runs the cell-renderer matrix and sees similar anomalies, default to the same protocol: high-repeat re-run before profiling. PR #124 and this memo show the same pattern: p95 gaps measured at n=3 should be treated as noise until they survive a high-repeat re-run.
+4. Bench-matrix sample protocol could default to higher repeats for hypothesis-scale runs, but this is a tradeoff between wall-clock time and statistical confidence; not actioned here.
diff --git a/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md b/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
new file mode 100644
index 0000000..0dfe8e0
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
@@ -0,0 +1,474 @@
+# Pretable `scroll-with-render` Perf Diagnostic Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Diagnose whether pretable's `scroll-with-render` 6 ms gap (vs format / heavy-render) from PR #130 is real or a low-sample artifact. Mirror PR #124's three-phase pattern. Output is a research memo + raw evidence — no code changes.
+
+**Architecture:** Per the spec at `docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md`. Single PR on `pretable-scroll-with-render-perf-diag`; auto-merge if memo verdict is "noise"; hold for review if "real, hypothesis: X".
+
+**Tech Stack:** Existing matrix runner. No new dependencies.
+
+**Spec:** [`docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md`](../specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md)
+
+**Working directory:** `/Users/blove/repos/pretable/.worktrees/pretable-scroll-with-render-perf-diag`.
+
+---
+
+## File Structure
+
+```
+status/milestones/
+└── 2026-05-11-pretable-cell-renderer-high-repeat.json            (NEW Phase A)
+
+status/traces/                                                    (NEW only if Phase B fires)
+├── 2026-05-11-pretable-scroll-with-format.trace.zip
+├── 2026-05-11-pretable-scroll-with-render.trace.zip
+└── 2026-05-11-pretable-scroll-with-heavy-render.trace.zip
+
+docs/research/
+└── 2026-05-11-pretable-scroll-with-render-perf-diagnostic.md     (NEW Phase C — the memo)
+```
+
+No source code, package, or test files modified.
+
+---
+
+## Pre-flight
+
+- [ ] **0.1** Confirm PR #130 and #131 are merged to main (they should be; the worktree is off `origin/main`).
+- [ ] **0.2** Confirm machine is idle for bench fairness. Quit anything heavy. Document any unavoidable background load in the memo.
+- [ ] **0.3** Build the harness:
+  ```
+  pnpm --filter @pretable/app-bench build
+  ```
+
+---
+
+## Phase A — High-repeat re-run
+
+### Task 1 — Run the matrix
+
+- [ ] **1.1** Run the matrix at n=20:
+
+  ```
+  pnpm bench:matrix \
+    --project=chromium \
+    --adapters=pretable \
+    --scenarios=S2 \
+    --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+    --scale=hypothesis \
+    --repeats=20
+  ```
+
+  Expected wall-clock: 8–12 min. Per-run summaries land under `status/`; the matrix runner writes a runset under `status/runsets//`.
+
+- [ ] **1.2** Note the runset id from the output.
+
+- [ ] **1.3** Locate the per-run summary files:
+
+  ```
+  ls status/chromium-pretable-default-s2-hypothesis-scroll-with-*-2026-05-11*.summary.json | wc -l
+  ```
+
+  Expected: 60 files (3 scripts × 20 repeats).
+
+### Task 2 — Aggregate stats
+
+- [ ] **2.1** Build the aggregator and write the milestone file with a one-shot Node script, run as an ES module via `--input-type=module`:
+
+  ```bash
+  node --input-type=module <<'EOF'
+  import { readdir, readFile, writeFile } from "node:fs/promises";
+  import { join } from "node:path";
+
+  const SCRIPTS = ["scroll-with-format", "scroll-with-render", "scroll-with-heavy-render"];
+  const STATUS_DIR = "status";
+  const OUT_PATH = "status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json";
+  const files = await readdir(STATUS_DIR);
+
+  // Summary statistics for one script's samples.
+  // sd is the population standard deviation (divides by n, not n - 1).
+  function stats(xs) {
+    const n = xs.length;
+    if (n === 0) return { n: 0 };
+    const mean = xs.reduce((a, b) => a + b, 0) / n;
+    const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
+    const sd = Math.sqrt(variance);
+    const sorted = [...xs].sort((a, b) => a - b);
+    const median =
+      n % 2 ? sorted[(n - 1) / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
+    return {
+      n,
+      mean: +mean.toFixed(3),
+      sd: +sd.toFixed(3),
+      min: Math.min(...xs),
+      median,
+      max: Math.max(...xs),
+      samples: xs,
+    };
+  }
+
+  // Collect scroll_frame_p95_ms from every per-run summary of each script.
+  const perScript = [];
+  for (const script of SCRIPTS) {
+    const matching = files.filter(
+      (f) =>
+        f.startsWith(`chromium-pretable-default-s2-hypothesis-${script}-2026-05-11`) &&
+        f.endsWith(".summary.json"),
+    );
+    const samples = [];
+    for (const f of matching) {
+      const data = JSON.parse(await readFile(join(STATUS_DIR, f), "utf8"));
+      const p95 = data.metrics?.scroll_frame_p95_ms;
+      if (typeof p95 === "number" && Number.isFinite(p95)) samples.push(p95);
+    }
+    perScript.push({ scriptName: script, metric: "scroll_frame_p95_ms", ...stats(samples) });
+  }
+
+  // 2σ verdict — gap is real only when BOTH pairs pass (cheap vs format AND cheap vs heavy).
+  function passes(left, right) {
+    if (left.n === 0 || right.n === 0) return false;
+    const gap = Math.abs(left.mean - right.mean);
+    const noise = 2 * Math.max(left.sd, right.sd);
+    return gap > noise;
+  }
+
+  const cheap = perScript.find((s) => s.scriptName === "scroll-with-render");
+  const format = perScript.find((s) => s.scriptName === "scroll-with-format");
+  const heavy = perScript.find((s) => s.scriptName === "scroll-with-heavy-render");
+
+  const cheapVsFormat = {
+    meanDiff: +(cheap.mean - format.mean).toFixed(3),
+    noiseFloor: +(2 * Math.max(cheap.sd, format.sd)).toFixed(3),
+    real: passes(cheap, format),
+  };
+  const cheapVsHeavy = {
+    meanDiff: +(cheap.mean - heavy.mean).toFixed(3),
+    noiseFloor: +(2 * Math.max(cheap.sd, heavy.sd)).toFixed(3),
+    real: passes(cheap, heavy),
+  };
+
+  // passes() compares |gap|, so "real-cheap-render-slower" would also fire if cheap-render
+  // were significantly faster; check the sign of meanDiff before reading it as a slowdown.
+  const verdict =
+    cheapVsFormat.real && cheapVsHeavy.real
+      ? "real-cheap-render-slower"
+      : !cheapVsFormat.real && !cheapVsHeavy.real
+        ? "noise"
+        : "mixed";
+
+  const out = {
+    generatedAt: new Date().toISOString(),
+    adapterId: "pretable",
+    scenarioId: "S2",
+    scale: "hypothesis",
+    browserName: "chromium",
+    repeats: 20, // planned repeats; actual sample counts are in perScript[].n
+    perScript,
+    twoSigmaTest: {
+      rule: "real if |mean_cheap - mean_other| > 2 * max(sd_cheap, sd_other) for both pairs",
+      cheapVsFormat,
+      cheapVsHeavy,
+    },
+    verdict,
+  };
+
+  await writeFile(OUT_PATH, JSON.stringify(out, null, 2) + "\n");
+  console.log(`Wrote ${OUT_PATH}`);
+  console.log(JSON.stringify(out, null, 2));
+  EOF
+  ```
+
+- [ ] **2.2** Inspect the output. Verify the JSON has all three scripts, each with n=20 and a finite mean and σ, and that `verdict` is one of `noise`, `mixed`, or `real-cheap-render-slower`.
+
+- [ ] **2.3** Commit:
+
+  ```
+  git add status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
+  git commit -m "chore(bench): high-repeat pretable cell-renderer milestone for scroll-with-render perf diagnostic"
+  ```
+
+### Task 3 — Branch on verdict
+
+- [ ] **3.1** Read the verdict:
+
+  ```
+  jq -r '.verdict' status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
+  ```
+
+  - **If `noise`** → skip Phase B (Tasks 4+5). Go to Task 6 (write a short negative-result memo).
+  - **If `mixed`** → skip Phase B. Memo concludes "differential isn't clean; one pair within noise" with recommendations.
+  - **If `real-cheap-render-slower`** → proceed to Phase B.
+
+  Document the verdict you got so Task 6 can write the right memo branch.
+
+---
+
+## Phase B — Trace capture (conditional)
+
+**Skip this entire phase if Task 3.1 returned `noise` or `mixed`.**
+
+### Task 4 — Capture traces
+
+- [ ] **4.1** Capture one Playwright trace per script using the existing bench harness. Re-use the matrix runner with `--repeats=1` (one fresh run per script):
+
+  ```
+  pnpm bench:matrix \
+    --project=chromium \
+    --adapters=pretable \
+    --scenarios=S2 \
+    --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+    --scale=hypothesis \
+    --repeats=1
+  ```
+
+  Trace files are written under `status/traces/` per the bench harness's existing wiring.
+
+- [ ] **4.2** Identify the three newest traces:
+
+  ```
+  ls -lt status/traces/chromium-pretable-default-s2-hypothesis-scroll-with-*-2026-05-11*.trace.zip | head -3
+  ```
+
+  Copy + rename them to the spec's paths:
+
+  ```
+  cp status/traces/2026-05-11-pretable-scroll-with-format.trace.zip
+  cp status/traces/2026-05-11-pretable-scroll-with-render.trace.zip
+  cp status/traces/2026-05-11-pretable-scroll-with-heavy-render.trace.zip
+  ```
+
+- [ ] **4.3** Verify trace zips are openable (size > 0, valid format):
+
+  ```
+  ls -lh status/traces/2026-05-11-pretable-scroll-with-*.trace.zip
+  ```
+
+  Sizes should be 2–25 MB each. If any is >25 MB, see step 4.5.
+
+- [ ] **4.4** Verify each trace opens cleanly via:
+
+  ```
+  pnpm exec playwright show-trace status/traces/2026-05-11-pretable-scroll-with-render.trace.zip
+  ```
+
+  Quit the viewer (`q` in spawning terminal). Repeat for format + heavy-render. If any fails to open, STOP and report BLOCKED.
+
+- [ ] **4.5** If a trace is >25 MB:
+  - View it with `pnpm exec playwright show-trace`.
+  - Screenshot the Performance panel showing the steady-state scroll phase.
+  - Save the screenshot under `docs/research/2026-05-11-perf-diag-traces/