From b9cefa302b53a3463301f71d219e036043716ad2 Mon Sep 17 00:00:00 2001
From: Brian Love
Date: Mon, 11 May 2026 13:27:20 -0700
Subject: [PATCH 1/4] docs(specs): pretable scroll-with-render perf diagnostic design

Three-phase research PR mirroring PR #124's pattern: high-repeat (n=20) re-run, conditional Playwright trace capture, research memo. Diagnoses whether the PR #130 cheap-render anomaly (16.4 ms vs 10.3 ms for format and heavy-render) is real or a low-sample artifact.

Co-Authored-By: Claude Opus 4.7
---
 ...roll-with-render-perf-diagnostic-design.md | 139 ++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md

diff --git a/docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md b/docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md
new file mode 100644
index 0000000..3b7adbd
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md
+# Pretable `scroll-with-render` Perf Diagnostic Design
+
+**Date:** 2026-05-11
+**Status:** Draft (awaiting user review before plan)
+**Predecessor:** [B2 follow-up #5a cell-renderer comparators (PR #130)](../../research/repo-memory.md); [B2 follow-up #1 perf-diag (PR #124)](./2026-05-09-b2-followup-perf-diagnostic-design.md) — same pattern reapplied
+
+---
+
+## Goal
+
+Diagnose whether pretable's `scroll-with-render` frame p95 (16.4 ms in the n=3 PR #130 runset) is genuinely slower than `scroll-with-format` (10.2 ms) and `scroll-with-heavy-render` (10.3 ms), or whether the 6 ms gap is a low-sample artifact. Output is a research memo plus committed evidence — no code changes. If the gap is real, the memo proposes the cause and a future fix PR; if it's noise, the memo concludes that.
+
+## Why
+
+PR #130 captured (n=3 medians, Chromium S2/hypothesis):
+
+| Script | pretable scroll p95 | DOM nodes peak | Notes |
+| --- | --- | --- | --- |
+| `scroll-with-format` | 10.2 ms | 98 | column.format set |
+| `scroll-with-render` | **16.4 ms** | 164 | column.render set (cheap JSX: 1 span) |
+| `scroll-with-heavy-render` | 10.3 ms | 296 | column.render set (heavy JSX: 3 spans + className) |
+
+Heavy render renders more DOM (296 nodes vs 164) yet measures faster than cheap render. Heavy and cheap share the same code path in `@pretable/react`'s `MemoizedCellContent` (the `column.render` branch); the only structural difference between them is the JSX shape returned. Format diverges (uses `column.format` instead).
+
+Two competing hypotheses:
+
+1. **Sampling noise.** With n=3, each headline number is the median of only three per-run p95 values, so two noisy cheap-render runs are enough to inflate it. Most likely outcome based on the PR #124 precedent (where a 1 ms gap dissolved at n=20).
+2. **React-reconciliation cliff.** A single-text-child span (cheap) vs a multi-child span (heavy) hits different React reconciliation paths. Less likely but plausible — would manifest as a real and reproducible gap at higher repeats.
+
+This investigation tightens the signal first, then profiles only if warranted.
+
+## Non-goals
+
+- **Fixing the gap.** Any code change to close the cheap-render gap is a follow-up PR informed by this memo's verdict.
+- **Compromising pretable's quality guarantees:** zero blank-gap frames, zero scroll-anchor backward shift, ≤1 px row-height error. Proposed fixes that erode these get marked "not worth it" regardless of speed gain.
+- **Cross-browser data.** Chromium only, mirroring PR #130 / #124.
+- **Other adapters.** This is pretable-internal; ag-grid / tanstack / mui cell-renderer numbers are not in scope.
+- **A synthetic micro-benchmark.** The existing `apps/bench` matrix is the instrument.
+
+## Architecture
+
+One PR off latest `main`.
Three sequential phases inside the PR (mirrors PR #124):
+
+| Phase | Action | Output |
+|---|---|---|
+| A | High-repeat (n=20) re-run of `pretable / S2 / hypothesis / {scroll-with-format, scroll-with-render, scroll-with-heavy-render}` on Chromium. | `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json` with mean / σ / min / median / max per script. |
+| B | If Phase A confirms both `cheap − format` AND `cheap − heavy` gaps are real (>2σ): capture one Playwright trace per script. | `status/traces/2026-05-11-pretable-scroll-with-{format,render,heavy-render}.trace.zip` |
+| C | Manual trace analysis (only if Phase B fires). Identify what cheap-render does differently. | `docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md` |
+
+If Phase A shows the gap is noise (one or both pairs within 2σ), the memo concludes "noise; the n=3 cheap-render outlier was a sample artifact" and skips Phases B+C. The PR ships the high-repeat runset + a short negative-result memo.
+
+## Method details
+
+### Phase A
+
+```
+pnpm bench:matrix \
+  --project=chromium \
+  --adapters=pretable \
+  --scenarios=S2 \
+  --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+  --scale=hypothesis \
+  --repeats=20
+```
+
+3 scripts × 20 repeats = 60 runs. Wall-clock ≈ 8–12 min.
+
+**Statistical test.** Compute mean and standard deviation of `scroll_frame_p95_ms` across the 20 samples per script. The gap is "real" if BOTH of these hold:
+
+- `|mean_cheap − mean_format| > 2 × max(σ_cheap, σ_format)`
+- `|mean_cheap − mean_heavy| > 2 × max(σ_cheap, σ_heavy)`
+
+If only one pair shows "real" or both show noise, the differential isn't clean and Phase A is the verdict. If both pass, Phase B fires.
+
+**Output file.** `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json` mirrors the shape of `2026-05-09-perf-diag-high-repeat.scroll.json` (PR #124's output) — one entry per script, sample array preserved for future re-analysis.
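As a worked illustration of the decision rule above, the sketch below applies the two-pair 2σ test to made-up numbers. The helper names `pairPasses` and `gapIsReal` and every statistic shown are hypothetical, chosen only to show how one pair inside the noise floor blocks Phase B; none of these values are bench measurements.

```javascript
// Hypothetical per-script summaries (mean and standard deviation, in ms).
// Illustrative values only; these are NOT output from the bench matrix.
const format = { mean: 10.0, sd: 0.5 };
const cheap = { mean: 12.0, sd: 0.8 };
const heavy = { mean: 11.0, sd: 0.6 };

// A pair passes when the mean gap exceeds twice the larger of the two sigmas.
function pairPasses(a, b) {
  return Math.abs(a.mean - b.mean) > 2 * Math.max(a.sd, b.sd);
}

// The gap counts as "real" only when cheap differs from BOTH comparators.
function gapIsReal(cheapStats, formatStats, heavyStats) {
  return pairPasses(cheapStats, formatStats) && pairPasses(cheapStats, heavyStats);
}

console.log(pairPasses(cheap, format)); // gap 2.0 ms against a 1.6 ms threshold
console.log(pairPasses(cheap, heavy)); // gap 1.0 ms against a 1.6 ms threshold
console.log(gapIsReal(cheap, format, heavy));
```

With these invented inputs only the cheap-vs-format pair clears its threshold, so the combined rule would stop at Phase A rather than trigger trace capture.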
+
+### Phase B (conditional)
+
+Capture one Playwright trace per script in dev mode. The bench-app's existing trace wiring writes `.trace.zip` under `status/traces/` per run. Save the three traces (or fewer if file size is prohibitive — commit a flame-graph summary instead).
+
+If trace capture fails (file format issues, harness errors), escalate BLOCKED. The memo can still ship with verdict "real but undiagnosed."
+
+### Phase C (conditional)
+
+Manual analysis. The trace viewer (Chrome DevTools or Playwright's `show-trace`) shows the steady-state scroll frame work. Look for:
+
+- A code-path divergence between cheap and format (different React render branches, different reconciliation cost)
+- A code-path divergence between cheap and heavy (the most surprising — same React code path, different JSX shape)
+- Style recalculation or layout work attributable to the cheap-render's DOM shape
+
+Memo records findings, leading hypothesis, and proposed fix(es) without implementing.
+
+### Memo structure
+
+`docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md`:
+
+```
+# Pretable scroll-with-render perf diagnostic — 2026-05-11
+
+## Summary
+<1-2 sentences: gap real or noise; leading hypothesis if real>
+
+## High-repeat data (n=20)
+
+
+## Statistical verdict
+<2σ comparisons shown; "real" only if both pairs pass>
+
+## Trace findings (only if real)
+- Cheap render hotspots:
+- Heavy render hotspots:
+- Format hotspots:
+- Differential:
+
+## Hypothesis for the gap (only if real)
+<1-3 paragraphs>
+
+## Proposed fixes (no code in this PR)
+
+
+## Verdict
+
+```
+
+Length target: 500–1500 words. Same as PR #124's memo.
+
+## Risks
+
+- **Variance at hypothesis scale.** The cell-renderer scripts are scroll-shape and run for ~3 seconds. Per-frame timing variance can be 1-3 ms under normal local conditions. Even n=20 might not collapse the gap to a clean verdict if σ_cheap is high. Mitigation: report σ honestly and recommend n=50 follow-up if inconclusive.
+- **Hardware noise.** Local Chromium on a busy machine produces noisier perf than a lab environment. Run with the laptop idle; document any unavoidable background load in the memo.
+- **Cheap-render gap may be a sampling-noise + DOM-attribute interaction.** The `data-bench-render="cheap"` attribute is set on cheap; `data-bench-render="heavy"` plus more is set on heavy. If Chrome's style-invalidation cost differs for attribute count or className presence, the gap could be real but attributable to bench-instrumentation rather than pretable's render path. The memo should note this potential confound.
+- **Manual trace analysis is bespoke.** No automated test confirms a memo's findings. Mitigation: commit the traces so a future reader can re-open them.
+
+## Out of scope
+
+- Code fixes: follow-up PR informed by the memo.
+- Other browsers (Webkit, Firefox).
+- Other scripts or scenarios.
+- Updating the `/bench` page or homepage. This memo informs decisions; presentation changes are separate.
From f0162d3116927c8002c81975583cee3262653301 Mon Sep 17 00:00:00 2001
From: Brian Love
Date: Mon, 11 May 2026 13:29:36 -0700
Subject: [PATCH 2/4] docs(plans): pretable scroll-with-render perf diagnostic plan

Seven-task plan mirroring PR #124's three-phase pattern: n=20 matrix re-run, conditional Playwright trace capture, research memo.
Co-Authored-By: Claude Opus 4.7
---
 ...able-scroll-with-render-perf-diagnostic.md | 469 ++++++++++++++++++
 1 file changed, 469 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md

diff --git a/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md b/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
new file mode 100644
index 0000000..c7f20c7
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
+# Pretable `scroll-with-render` Perf Diagnostic Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Diagnose whether pretable's `scroll-with-render` 6 ms gap (vs format / heavy-render) from PR #130 is real or a low-sample artifact. Mirror PR #124's three-phase pattern. Output is a research memo + raw evidence — no code changes.
+
+**Architecture:** Per the spec at `docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md`. Single PR on `pretable-scroll-with-render-perf-diag`; auto-merge if memo verdict is "noise"; hold for review if "real, hypothesis: X".
+
+**Tech Stack:** Existing matrix runner. No new dependencies.
+
+**Spec:** [`docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md`](../specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md)
+
+**Working directory:** `/Users/blove/repos/pretable/.worktrees/pretable-scroll-with-render-perf-diag`.
+
+---
+
+## File Structure
+
+```
+status/milestones/
+└── 2026-05-11-pretable-cell-renderer-high-repeat.json (NEW Phase A)
+
+status/traces/ (NEW only if Phase B fires)
+├── 2026-05-11-pretable-scroll-with-format.trace.zip
+├── 2026-05-11-pretable-scroll-with-render.trace.zip
+└── 2026-05-11-pretable-scroll-with-heavy-render.trace.zip
+
+docs/research/
+└── 2026-05-11-pretable-scroll-with-render-perf-diagnostic.md (NEW Phase C — the memo)
+```
+
+No source code, package, or test files modified.
+
+---
+
+## Pre-flight
+
+- [ ] **0.1** Confirm PR #130 and #131 are merged to main (they should be; the worktree is off `origin/main`).
+- [ ] **0.2** Confirm machine is idle for bench fairness. Quit anything heavy. Document any unavoidable background load in the memo.
+- [ ] **0.3** Build the harness:
+  ```
+  pnpm --filter @pretable/app-bench build
+  ```
+
+---
+
+## Phase A — High-repeat re-run
+
+### Task 1 — Run the matrix
+
+- [ ] **1.1** Run the matrix at n=20:
+
+  ```
+  pnpm bench:matrix \
+    --project=chromium \
+    --adapters=pretable \
+    --scenarios=S2 \
+    --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+    --scale=hypothesis \
+    --repeats=20
+  ```
+
+  Expected wall-clock: 8–12 min. Output per-run summaries land under `status/`; the matrix runner writes a runset under `status/runsets//`.
+
+- [ ] **1.2** Note the runset id from the output.
+
+- [ ] **1.3** Locate the per-run summary files:
+
+  ```
+  ls status/chromium-pretable-default-s2-hypothesis-scroll-with-*-2026-05-11*.summary.json | wc -l
+  ```
+
+  Expected: 60 files (3 scripts × 20 repeats).
+
+### Task 2 — Aggregate stats
+
+- [ ] **2.1** Build the aggregator + write the milestone file using a one-shot Node script.
Use `--input-type=module`:
+
+  ```bash
+  node --input-type=module <<'EOF'
+  import { readdir, readFile, writeFile } from "node:fs/promises";
+  import { join } from "node:path";
+
+  const SCRIPTS = ["scroll-with-format", "scroll-with-render", "scroll-with-heavy-render"];
+  const STATUS_DIR = "status";
+  const OUT_PATH = "status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json";
+  const files = await readdir(STATUS_DIR);
+
+  // Summary statistics for one script's samples.
+  function stats(xs) {
+    const n = xs.length;
+    if (n === 0) return { n: 0 };
+    const mean = xs.reduce((a, b) => a + b, 0) / n;
+    // Population variance (divides by n, not n - 1).
+    const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
+    const sd = Math.sqrt(variance);
+    const sorted = [...xs].sort((a, b) => a - b);
+    const median = n % 2 ? sorted[(n - 1) / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
+    return {
+      n,
+      mean: +mean.toFixed(3),
+      sd: +sd.toFixed(3),
+      min: Math.min(...xs),
+      median,
+      max: Math.max(...xs),
+      samples: xs,
+    };
+  }
+
+  // Collect scroll_frame_p95_ms from every matching per-run summary file.
+  const perScript = [];
+  for (const script of SCRIPTS) {
+    const matching = files.filter(
+      (f) =>
+        f.startsWith(`chromium-pretable-default-s2-hypothesis-${script}-2026-05-11`) &&
+        f.endsWith(".summary.json"),
+    );
+    const samples = [];
+    for (const f of matching) {
+      const data = JSON.parse(await readFile(join(STATUS_DIR, f), "utf8"));
+      const p95 = data.metrics?.scroll_frame_p95_ms;
+      if (typeof p95 === "number" && Number.isFinite(p95)) samples.push(p95);
+    }
+    perScript.push({ scriptName: script, metric: "scroll_frame_p95_ms", ...stats(samples) });
+  }
+
+  // 2σ verdict — gap is real only when BOTH pairs pass (cheap vs format AND cheap vs heavy).
+  function passes(left, right) {
+    if (left.n === 0 || right.n === 0) return false;
+    const gap = Math.abs(left.mean - right.mean);
+    const noise = 2 * Math.max(left.sd, right.sd);
+    return gap > noise;
+  }
+
+  const cheap = perScript.find((s) => s.scriptName === "scroll-with-render");
+  const format = perScript.find((s) => s.scriptName === "scroll-with-format");
+  const heavy = perScript.find((s) => s.scriptName === "scroll-with-heavy-render");
+
+  const cheapVsFormat = {
+    meanDiff: +(cheap.mean - format.mean).toFixed(3),
+    noiseFloor: +(2 * Math.max(cheap.sd, format.sd)).toFixed(3),
+    real: passes(cheap, format),
+  };
+  const cheapVsHeavy = {
+    meanDiff: +(cheap.mean - heavy.mean).toFixed(3),
+    noiseFloor: +(2 * Math.max(cheap.sd, heavy.sd)).toFixed(3),
+    real: passes(cheap, heavy),
+  };
+
+  const verdict =
+    cheapVsFormat.real && cheapVsHeavy.real
+      ? "real-cheap-render-slower"
+      : !cheapVsFormat.real && !cheapVsHeavy.real
+        ? "noise"
+        : "mixed";
+
+  const out = {
+    generatedAt: new Date().toISOString(),
+    adapterId: "pretable",
+    scenarioId: "S2",
+    scale: "hypothesis",
+    browserName: "chromium",
+    repeats: 20,
+    perScript,
+    twoSigmaTest: {
+      rule: "real if |mean_cheap - mean_other| > 2 * max(sd_cheap, sd_other) for both pairs",
+      cheapVsFormat,
+      cheapVsHeavy,
+    },
+    verdict,
+  };
+
+  await writeFile(OUT_PATH, JSON.stringify(out, null, 2) + "\n");
+  console.log(`Wrote ${OUT_PATH}`);
+  console.log(JSON.stringify(out, null, 2));
+  EOF
+  ```
+
+- [ ] **2.2** Inspect the output. Verify the JSON has all three scripts with n=20 finite means + σ; verdict is one of `noise`, `mixed`, or `real-cheap-render-slower`.
+
+- [ ] **2.3** Commit:
+
+  ```
+  git add status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
+  git commit -m "chore(bench): high-repeat pretable cell-renderer milestone for scroll-with-render perf diagnostic"
+  ```
+
+### Task 3 — Branch on verdict
+
+- [ ] **3.1** Read the verdict:
+
+  ```
+  jq -r '.verdict' status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
+  ```
+
+  - **If `noise`** → skip Phase B (Tasks 4+5). Go to Task 6 (write a short negative-result memo).
+  - **If `mixed`** → skip Phase B. Memo concludes "differential isn't clean; one pair within noise" with recommendations.
+  - **If `real-cheap-render-slower`** → proceed to Phase B.
+
+  Document the verdict you got so Task 6 can write the right memo branch.
+
+---
+
+## Phase B — Trace capture (conditional)
+
+**Skip this entire phase if Task 3.1 returned `noise` or `mixed`.**
+
+### Task 4 — Capture traces
+
+- [ ] **4.1** Capture one Playwright trace per script using the existing bench harness. Re-use the matrix runner with `--repeats=1` per script (one fresh run per script):
+
+  ```
+  pnpm bench:matrix \
+    --project=chromium \
+    --adapters=pretable \
+    --scenarios=S2 \
+    --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+    --scale=hypothesis \
+    --repeats=1
+  ```
+
+  Trace files are written under `status/traces/` per the bench harness's existing wiring.
+
+- [ ] **4.2** Identify the three newest traces:
+
+  ```
+  ls -lt status/traces/chromium-pretable-default-s2-hypothesis-scroll-with-*-2026-05-11*.trace.zip | head -3
+  ```
+
+  Copy + rename them to the spec's paths:
+
+  ```
+  cp status/traces/2026-05-11-pretable-scroll-with-format.trace.zip
+  cp status/traces/2026-05-11-pretable-scroll-with-render.trace.zip
+  cp status/traces/2026-05-11-pretable-scroll-with-heavy-render.trace.zip
+  ```
+
+- [ ] **4.3** Verify trace zips are openable (size > 0, valid format):
+
+  ```
+  ls -lh status/traces/2026-05-11-pretable-scroll-with-*.trace.zip
+  ```
+
+  Sizes should be 2–25 MB each. If any is >25 MB, see step 4.5.
+
+- [ ] **4.4** Verify each trace opens cleanly via:
+
+  ```
+  pnpm exec playwright show-trace status/traces/2026-05-11-pretable-scroll-with-render.trace.zip
+  ```
+
+  Quit the viewer (`q` in spawning terminal). Repeat for format + heavy-render. If any fails to open, STOP and report BLOCKED.
+
+- [ ] **4.5** If a trace is >25 MB:
+  - View it with `pnpm exec playwright show-trace`.
+  - Screenshot the Performance panel showing the steady-state scroll phase.
+  - Save the screenshot under `docs/research/2026-05-11-perf-diag-traces/