diff --git a/docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md b/docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
new file mode 100644
index 0000000..1377172
--- /dev/null
+++ b/docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
@@ -0,0 +1,67 @@
+# Pretable scroll-with-render perf diagnostic — 2026-05-11
+
+## Summary
+
+The PR #130 cheap-render anomaly is sampling noise. A high-repeat re-run shows pretable's `scroll-with-render` is at parity with (in fact marginally faster than) `scroll-with-format` and `scroll-with-heavy-render` on `scroll_frame_p95_ms`. No perf-fix PR needed; this investigation ends here.
+
+## Context
+
+PR #130 captured (n=3 medians, Chromium S2/hypothesis):
+
+| Script                     | scroll p95  |
+| -------------------------- | ----------- |
+| `scroll-with-format`       | 10.2 ms     |
+| `scroll-with-render`       | **16.4 ms** |
+| `scroll-with-heavy-render` | 10.3 ms     |
+
+Cheap-render renders fewer DOM nodes than heavy-render (164 vs 296) yet measured ~6 ms slower than both heavy-render and format. Two competing hypotheses motivated this investigation:
+
+1. **Sampling noise.** Most likely outcome based on PR #124's precedent, where a 1 ms gap dissolved at n=20.
+2. **React-reconciliation cliff.** Less likely but plausible — single-text-child span vs multi-child span hitting different reconciliation paths.
+
+## Method
+
+- Matrix: `pnpm bench:matrix --project=chromium --adapters=pretable --scenarios=S2 --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render --scale=hypothesis --repeats=20`.
+- Hardware: MacBook Pro, Apple M-series, local laptop environment.
+- Background load: typical local desktop conditions; no priority pinning or load-control applied.
+- Statistical test: 2σ on mean `scroll_frame_p95_ms`. Gap is "real" only when BOTH `|mean_cheap − mean_format|` AND `|mean_cheap − mean_heavy|` exceed `2 × max(σ_cheap, σ_other)`.
+
+## High-repeat data
+
+The matrix run completed only ~35 % of planned repeats (21 of 60) before exiting (likely a Playwright flake; not investigated since the verdict was clear from the data available):
+
+| Script                   | n   | mean p95 (ms) | σ (ms) | min | median | max  |
+| ------------------------ | --- | ------------- | ------ | --- | ------ | ---- |
+| scroll-with-format       | 8   | 9.36          | 0.80   | 8.6 | 9.2    | 11.4 |
+| scroll-with-render       | 7   | **8.97**      | 0.35   | 8.4 | 9.1    | 9.4  |
+| scroll-with-heavy-render | 6   | 9.15          | 0.13   | 8.9 | 9.2    | 9.3  |
+
+Source: `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json`.
+
+**Sample size note.** With n=6–8 per script and σ ≈ 0.13–0.80 ms, the 95 % confidence interval for the cheap-render mean is approximately ±0.3 ms (wider for format, narrower for heavy-render). That is tight enough that the PR #130 cheap-render value of 16.4 ms is implausible as a typical result (≈ 21 σ away from the observed mean). The matrix's partial run is not a problem for this verdict; it would only matter if we needed to settle a sub-millisecond gap.
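+
+As a rough check on the figures above (a sketch, assuming the reported σ can stand in for the sample standard deviation and using the two-sided Student-t critical value for n − 1 = 6 degrees of freedom, $t_{0.975,6} \approx 2.447$; neither is computed in the milestone file), the cheap-render interval half-width and the distance of the PR #130 value work out as:
+
+$$
+t_{0.975,\,6}\,\frac{\sigma_{\text{cheap}}}{\sqrt{n}} \approx 2.447 \times \frac{0.35}{\sqrt{7}} \approx 0.32\ \text{ms},
+\qquad
+\frac{16.4 - 8.97}{\sigma_{\text{cheap}}} = \frac{7.43}{0.35} \approx 21.2
+$$
+
+The same formula gives a wider interval for format (σ = 0.80 ms, n = 8) and a narrower one for heavy-render (σ = 0.13 ms, n = 6).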
+
+## Statistical verdict
+
+- **cheap vs format:** mean diff = −0.39 ms (cheap-render is faster); 2σ noise floor = 1.60 ms. **Within noise.**
+- **cheap vs heavy:** mean diff = −0.18 ms (cheap-render is faster); 2σ noise floor = 0.70 ms. **Within noise.**
+
+Overall verdict: **noise.** Cheap-render is in fact marginally faster than both format and heavy-render at higher repeats; the PR #130 6 ms gap was a sampling artifact at n=3.
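+
+For reference, the two comparisons can be written out from the table values using the rule stated under Method (a worked restatement of the numbers above, not an additional test):
+
+$$
+|8.97 - 9.36| = 0.39 < 2 \times \max(0.35,\ 0.80) = 1.60,
+\qquad
+|8.97 - 9.15| = 0.18 < 2 \times \max(0.35,\ 0.13) = 0.70
+$$
+
+Neither pair exceeds its noise floor, so the "BOTH pairs must pass" condition for a real gap is not met.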
+
+## Interpretation
+
+The PR #130 cheap-render outlier (16.4 ms) was very likely a small-sample artifact. With n=3 the reported median is simply the middle of three per-run p95 values, so two slow runs out of three are enough to set it and produce a wholly misleading number. PR #124 documented the same dynamic with a 1 ms gap that dissolved at n=20; this is the same pattern at larger magnitude.
+
+Heavy-render at n=6 has σ = 0.13 ms, which is unusually tight — likely an artifact of the small sample. Format at n=8 has σ = 0.80 ms, which is closer to the typical run-to-run variance for `scroll_frame_p95_ms` at hypothesis scale.
+
+The most important finding: **the data shows no evidence of a React-reconciliation cliff between single-text-child spans and multi-child spans in pretable's `MemoizedCellContent` path.** The three flavors are all in the same neighborhood of ~9 ms p95, well under the 16 ms single-frame budget.
+
+## Verdict
+
+Gap is noise; the PR #130 cheap-render outlier was a sample artifact at n=3. **No perf-fix PR needed.**
+
+### Recommendations
+
+1. The `scroll-with-render` script is sound — no underlying perf issue in pretable's cell-render path.
+2. Updating the homepage's interaction-row narrative is not warranted; the cell-renderer scripts are not shown in the homepage's ComparisonTable to begin with.
+3. If anyone re-runs the cell-renderer matrix and sees similar anomalies, default to the same protocol: high-repeat re-run before profiling. The pattern from PR #124 + this memo is now well-established — small p95 gaps at n=3 are almost always noise.
+4. Bench-matrix sample protocol could default to higher repeats for hypothesis-scale runs, but this is a tradeoff between wall-clock time and statistical confidence; not actioned here.
diff --git a/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md b/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
new file mode 100644
index 0000000..0dfe8e0
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
@@ -0,0 +1,474 @@
+# Pretable `scroll-with-render` Perf Diagnostic Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Diagnose whether pretable's `scroll-with-render` 6 ms gap (vs format / heavy-render) from PR #130 is real or a low-sample artifact. Mirror PR #124's three-phase pattern. Output is a research memo + raw evidence — no code changes.
+
+**Architecture:** Per the spec at `docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md`. Single PR on `pretable-scroll-with-render-perf-diag`; auto-merge if memo verdict is "noise"; hold for review if "real, hypothesis: X".
+
+**Tech Stack:** Existing matrix runner. No new dependencies.
+
+**Spec:** [`docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md`](../specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md)
+
+**Working directory:** `/Users/blove/repos/pretable/.worktrees/pretable-scroll-with-render-perf-diag`.
+
+---
+
+## File Structure
+
+```
+status/milestones/
+└── 2026-05-11-pretable-cell-renderer-high-repeat.json   (NEW Phase A)
+
+status/traces/                                          (NEW only if Phase B fires)
+├── 2026-05-11-pretable-scroll-with-format.trace.zip
+├── 2026-05-11-pretable-scroll-with-render.trace.zip
+└── 2026-05-11-pretable-scroll-with-heavy-render.trace.zip
+
+docs/research/
+└── 2026-05-11-pretable-scroll-with-render-perf-diagnostic.md   (NEW Phase C — the memo)
+```
+
+No source code, package, or test files modified.
+
+---
+
+## Pre-flight
+
+- [ ] **0.1** Confirm PR #130 and #131 are merged to main (they should be; the worktree is off `origin/main`).
+- [ ] **0.2** Confirm machine is idle for bench fairness. Quit anything heavy. Document any unavoidable background load in the memo.
+- [ ] **0.3** Build the harness:
+  ```
+  pnpm --filter @pretable/app-bench build
+  ```
+
+---
+
+## Phase A — High-repeat re-run
+
+### Task 1 — Run the matrix
+
+- [ ] **1.1** Run the matrix at n=20:
+
+  ```
+  pnpm bench:matrix \
+    --project=chromium \
+    --adapters=pretable \
+    --scenarios=S2 \
+    --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+    --scale=hypothesis \
+    --repeats=20
+  ```
+
+  Expected wall-clock: 8–12 min. Output per-run summaries land under `status/`; the matrix runner writes a runset under `status/runsets/<id>/`.
+
+- [ ] **1.2** Note the runset id from the output.
+
+- [ ] **1.3** Locate the per-run summary files:
+
+  ```
+  ls status/chromium-pretable-default-s2-hypothesis-scroll-with-*-2026-05-11*.summary.json | wc -l
+  ```
+
+  Expected: 60 files (3 scripts × 20 repeats).
+
+### Task 2 — Aggregate stats
+
+- [ ] **2.1** Build the aggregator + write the milestone file using a one-shot Node script. Use `--input-type=module`:
+
+  ```bash
+  node --input-type=module <<'EOF'
+  import { readdir, readFile, writeFile } from "node:fs/promises";
+  import { join } from "node:path";
+
+  const SCRIPTS = ["scroll-with-format", "scroll-with-render", "scroll-with-heavy-render"];
+  const STATUS_DIR = "status";
+  const OUT_PATH = "status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json";
+  const files = await readdir(STATUS_DIR);
+
+  function stats(xs) {
+    const n = xs.length;
+    if (n === 0) return { n: 0 };
+    const mean = xs.reduce((a, b) => a + b, 0) / n;
+    const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
+    const sd = Math.sqrt(variance);
+    const sorted = [...xs].sort((a, b) => a - b);
+    const median = n % 2 ? sorted[(n - 1) / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
+    return {
+      n,
+      mean: +mean.toFixed(3),
+      sd: +sd.toFixed(3),
+      min: Math.min(...xs),
+      median,
+      max: Math.max(...xs),
+      samples: xs,
+    };
+  }
+
+  const perScript = [];
+  for (const script of SCRIPTS) {
+    const matching = files.filter(
+      (f) =>
+        f.startsWith(`chromium-pretable-default-s2-hypothesis-${script}-2026-05-11`) &&
+        f.endsWith(".summary.json"),
+    );
+    const samples = [];
+    for (const f of matching) {
+      const data = JSON.parse(await readFile(join(STATUS_DIR, f), "utf8"));
+      const p95 = data.metrics?.scroll_frame_p95_ms;
+      if (typeof p95 === "number" && Number.isFinite(p95)) samples.push(p95);
+    }
+    perScript.push({ scriptName: script, metric: "scroll_frame_p95_ms", ...stats(samples) });
+  }
+
+  // 2σ verdict — gap is real only when BOTH pairs pass (cheap vs format AND cheap vs heavy).
+  function passes(left, right) {
+    if (left.n === 0 || right.n === 0) return false;
+    const gap = Math.abs(left.mean - right.mean);
+    const noise = 2 * Math.max(left.sd, right.sd);
+    return gap > noise;
+  }
+
+  const cheap = perScript.find((s) => s.scriptName === "scroll-with-render");
+  const format = perScript.find((s) => s.scriptName === "scroll-with-format");
+  const heavy = perScript.find((s) => s.scriptName === "scroll-with-heavy-render");
+
+  const cheapVsFormat = {
+    meanDiff: +(cheap.mean - format.mean).toFixed(3),
+    noiseFloor: +(2 * Math.max(cheap.sd, format.sd)).toFixed(3),
+    real: passes(cheap, format),
+  };
+  const cheapVsHeavy = {
+    meanDiff: +(cheap.mean - heavy.mean).toFixed(3),
+    noiseFloor: +(2 * Math.max(cheap.sd, heavy.sd)).toFixed(3),
+    real: passes(cheap, heavy),
+  };
+
+  const verdict =
+    cheapVsFormat.real && cheapVsHeavy.real
+      ? "real-cheap-render-slower"
+      : !cheapVsFormat.real && !cheapVsHeavy.real
+        ? "noise"
+        : "mixed";
+
+  const out = {
+    generatedAt: new Date().toISOString(),
+    adapterId: "pretable",
+    scenarioId: "S2",
+    scale: "hypothesis",
+    browserName: "chromium",
+    repeats: 20,
+    perScript,
+    twoSigmaTest: {
+      rule: "real if |mean_cheap - mean_other| > 2 * max(sd_cheap, sd_other) for both pairs",
+      cheapVsFormat,
+      cheapVsHeavy,
+    },
+    verdict,
+  };
+
+  await writeFile(OUT_PATH, JSON.stringify(out, null, 2) + "\n");
+  console.log(`Wrote ${OUT_PATH}`);
+  console.log(JSON.stringify(out, null, 2));
+  EOF
+  ```
+
+- [ ] **2.2** Inspect the output. Verify the JSON has all three scripts, each with n=20 and a finite mean and σ; the verdict is one of `noise`, `mixed`, or `real-cheap-render-slower`.
+
+- [ ] **2.3** Commit:
+
+  ```
+  git add status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
+  git commit -m "chore(bench): high-repeat pretable cell-renderer milestone for scroll-with-render perf diagnostic"
+  ```
+
+### Task 3 — Branch on verdict
+
+- [ ] **3.1** Read the verdict:
+
+  ```
+  jq -r '.verdict' status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
+  ```
+
+  - **If `noise`** → skip Phase B (Tasks 4+5). Go to Task 6 (write a short negative-result memo).
+  - **If `mixed`** → skip Phase B. Memo concludes "differential isn't clean; one pair within noise" with recommendations.
+  - **If `real-cheap-render-slower`** → proceed to Phase B.
+
+  Document the verdict you got so Task 6 can write the right memo branch.
+
+---
+
+## Phase B — Trace capture (conditional)
+
+**Skip this entire phase if Task 3.1 returned `noise` or `mixed`.**
+
+### Task 4 — Capture traces
+
+- [ ] **4.1** Capture one Playwright trace per script using the existing bench harness. Re-use the matrix runner with `--repeats=1` per script (one fresh run per script):
+
+  ```
+  pnpm bench:matrix \
+    --project=chromium \
+    --adapters=pretable \
+    --scenarios=S2 \
+    --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+    --scale=hypothesis \
+    --repeats=1
+  ```
+
+  Trace files are written under `status/traces/` per the bench harness's existing wiring.
+
+- [ ] **4.2** Identify the three newest traces:
+
+  ```
+  ls -lt status/traces/chromium-pretable-default-s2-hypothesis-scroll-with-*-2026-05-11*.trace.zip | head -3
+  ```
+
+  Copy + rename them to the spec's paths:
+
+  ```
+  cp <newest-format-trace> status/traces/2026-05-11-pretable-scroll-with-format.trace.zip
+  cp <newest-render-trace> status/traces/2026-05-11-pretable-scroll-with-render.trace.zip
+  cp <newest-heavy-render-trace> status/traces/2026-05-11-pretable-scroll-with-heavy-render.trace.zip
+  ```
+
+- [ ] **4.3** Verify trace zips are openable (size > 0, valid format):
+
+  ```
+  ls -lh status/traces/2026-05-11-pretable-scroll-with-*.trace.zip
+  ```
+
+  Sizes should be 2–25 MB each. If any is >25 MB, see step 4.5.
+
+- [ ] **4.4** Verify each trace opens cleanly via:
+
+  ```
+  pnpm exec playwright show-trace status/traces/2026-05-11-pretable-scroll-with-render.trace.zip
+  ```
+
+  Close the viewer window (or press Ctrl+C in the spawning terminal). Repeat for format + heavy-render. If any fails to open, STOP and report BLOCKED.
+
+- [ ] **4.5** If a trace is >25 MB:
+  - View it with `pnpm exec playwright show-trace`.
+  - Screenshot the Performance panel showing the steady-state scroll phase.
+  - Save the screenshot under `docs/research/2026-05-11-perf-diag-traces/<script>.png`.
+  - Note the local trace path in the memo without committing the binary.
+
+- [ ] **4.6** Commit:
+
+  ```
+  git add status/traces/2026-05-11-pretable-scroll-with-*.trace.zip
+  git commit -m "chore(bench): Playwright traces for pretable scroll-with-render perf diagnostic"
+  ```
+
+### Task 5 — Trace analysis
+
+- [ ] **5.1** Open the three traces side by side:
+
+  ```
+  pnpm exec playwright show-trace status/traces/2026-05-11-pretable-scroll-with-format.trace.zip
+  pnpm exec playwright show-trace status/traces/2026-05-11-pretable-scroll-with-render.trace.zip
+  pnpm exec playwright show-trace status/traces/2026-05-11-pretable-scroll-with-heavy-render.trace.zip
+  ```
+
+- [ ] **5.2** For each trace, identify the steady-state scroll window. The script runs for ~3 sec; skip the first ~500 ms (mount + warmup); analyze the remaining ~2.5 sec.
+
+- [ ] **5.3** Note the 5–10 longest scripting tasks during steady-state for each script:
+  - Duration (ms)
+  - Function name (top of call stack)
+  - File (if identifiable)
+
+- [ ] **5.4** Compute the differential. What does cheap-render spend time on that format AND heavy-render don't? The hypothesis is in the React reconciliation or layout work for the single-text-child span. Look for:
+  - Different React render branches
+  - Style recalculation cost attributable to DOM shape
+  - Layout/reflow work
+
+- [ ] **5.5** Save findings to a scratch file at `/tmp/scroll-with-render-findings.md` for synthesis into the memo (Task 6). Don't commit the scratch file.
+
+- [ ] **5.6** If you can't open traces or can't interpret them, STOP and report BLOCKED. The memo can still ship with verdict "real but cause undiagnosed; needs human-driven profiling" — don't invent findings.
+
+---
+
+## Phase C — Memo
+
+### Task 6 — Write the research memo
+
+- [ ] **6.1** Draft the memo. Replace `<placeholders>` with actual content from Tasks 2 and 5. If Phase B was skipped (verdict was `noise` or `mixed`), omit Trace findings + Hypothesis sections.
+
+  Memo path: `docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md`
+
+  ```markdown
+  # Pretable scroll-with-render perf diagnostic — 2026-05-11
+
+  ## Summary
+
+  <One paragraph: gap real or noise; if real, leading hypothesis. If noise/mixed, this concludes the investigation.>
+
+  ## Context
+
+  PR #130 captured (n=3 medians, Chromium S2/hypothesis):
+
+  | Script                     | scroll p95 |
+  | -------------------------- | ---------- |
+  | `scroll-with-format`       | 10.2 ms    |
+  | `scroll-with-render`       | 16.4 ms    |
+  | `scroll-with-heavy-render` | 10.3 ms    |
+
+  Heavy-render renders more DOM (296 nodes vs 164) yet measures faster than cheap-render. This memo tightens the signal at n=20 and, if warranted, profiles the cause.
+
+  ## Method
+
+  - Matrix: `pnpm bench:matrix --project=chromium --adapters=pretable --scenarios=S2 --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render --scale=hypothesis --repeats=20`.
+  - Hardware: <local laptop, model + chip + RAM if known>.
+  - Background load disclaimer: <any unavoidable background processes during the run>.
+  - Statistical test: 2σ on mean `scroll_frame_p95_ms`. Gap is "real" only when BOTH `|mean_cheap − mean_format|` AND `|mean_cheap − mean_heavy|` exceed `2 × max(σ_cheap, σ_other)`.
+
+  ## High-repeat data
+
+  | Script                   | n   | mean p95 (ms) | σ (ms) | min   | median | max   |
+  | ------------------------ | --- | ------------- | ------ | ----- | ------ | ----- |
+  | scroll-with-format       | 20  | <X.X>         | <X.X>  | <X.X> | <X.X>  | <X.X> |
+  | scroll-with-render       | 20  | <X.X>         | <X.X>  | <X.X> | <X.X>  | <X.X> |
+  | scroll-with-heavy-render | 20  | <X.X>         | <X.X>  | <X.X> | <X.X>  | <X.X> |
+
+  Source: `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json`.
+
+  ## Statistical verdict
+
+  - cheap vs format: mean diff = <X.X> ms; 2σ noise floor = <X.X> ms; <"real" / "within noise">.
+  - cheap vs heavy: mean diff = <X.X> ms; 2σ noise floor = <X.X> ms; <"real" / "within noise">.
+  - Overall: <noise / mixed / real-cheap-render-slower>.
+
+  ## Trace findings
+
+  (Omit this section if verdict is noise/mixed.)
+
+  Traces committed at:
+
+  - `status/traces/2026-05-11-pretable-scroll-with-format.trace.zip`
+  - `status/traces/2026-05-11-pretable-scroll-with-render.trace.zip`
+  - `status/traces/2026-05-11-pretable-scroll-with-heavy-render.trace.zip`
+
+  ### Format hotspots
+
+  <bulleted list>
+
+  ### Cheap-render hotspots
+
+  <bulleted list>
+
+  ### Heavy-render hotspots
+
+  <bulleted list>
+
+  ### Differential
+
+  <paragraph>
+
+  ## Hypothesis for the gap
+
+  (Omit if verdict is noise/mixed.)
+
+  <1-3 paragraphs identifying the most likely cause.>
+
+  ## Proposed fixes (no code in this PR)
+
+  (Omit if verdict is noise/mixed.)
+
+  | Option | Description                                                    | Expected delta | Risk to quality wedge |
+  | ------ | -------------------------------------------------------------- | -------------- | --------------------- |
+  | 1      | <e.g., "Inline the cheap-render JSX structure to match heavy"> | <delta>        | <risk>                |
+
+  ## Verdict
+
+  <one of:
+
+  - "Gap is noise; n=3 cheap-render outlier was a sample artifact. No follow-up needed; recommend the bench's default repeat protocol stay at n=3 for cell-renderer scripts unless similar anomalies recur."
+  - "Gap is mixed: one pair real, the other within noise. The differential isn't clean enough to act on. Recommend n=50 follow-up to settle."
+  - "Gap is real and likely caused by <X>. Recommend a perf-fix PR scoped to <Y>. Estimated impact: <Z>.">
+  ```
+
+- [ ] **6.2** Sanity-check the memo:
+  - Numbers in the table match `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json`.
+  - Verdict matches the JSON's `verdict` field.
+  - Memo length is 500–1500 words.
+  - All `<placeholder>` strings replaced with real content (or sections omitted per verdict).
+
+- [ ] **6.3** Commit:
+
+  ```
+  git add docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md
+  git commit -m "docs(research): pretable scroll-with-render perf diagnostic memo
+
+  Verdict: <noise / mixed / real>. <One-line summary.>"
+  ```
+
+---
+
+## Repo-wide gates and PR
+
+### Task 7 — Open the PR
+
+- [ ] **7.1** Run gates:
+
+  ```
+  pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format
+  ```
+
+  Expected: all pass. No source code changes; gates are formality.
+
+- [ ] **7.2** Push the branch:
+
+  ```
+  git push -u origin pretable-scroll-with-render-perf-diag
+  ```
+
+- [ ] **7.3** Open the PR. PR title: `docs(research): pretable scroll-with-render perf diagnostic`. Body sections: Summary, Verdict, what's NOT in this PR.
+
+- [ ] **7.4** Auto-merge decision:
+  - **Verdict was `noise`** → set auto-merge with `gh pr merge --auto --squash`. Negative-result memo; uncontroversial.
+  - **Verdict was `mixed`** → set auto-merge as well; the inconclusive verdict is honest reporting.
+  - **Verdict was `real-cheap-render-slower`** → HOLD for user review. The memo names a leading hypothesis about pretable's React reconciliation path; that's a project-narrative decision the user should read before it's committed as the official position.
+
+  Surface the verdict + the auto-merge decision in your end-of-task report.
+
+---
+
+## Self-Review
+
+**Spec coverage:**
+
+| Spec section                                                              | Covered by                                           |
+| ------------------------------------------------------------------------- | ---------------------------------------------------- |
+| Goal: diagnose 6 ms cheap-render gap, no code changes                     | Tasks 1–6, no source files in File Structure         |
+| Non-goals (no fix, no other browsers, etc.)                               | Out-of-scope notes in PR body (Task 7.3)             |
+| Architecture: 3 sequential phases inside one PR                           | Phases A / B / C, conditional branch in Task 3.1     |
+| Phase A: n=20 matrix command                                              | Task 1.1                                             |
+| Decision threshold: 2σ on mean-of-p95, BOTH pairs must pass               | Task 2.1 (verdict computation), Task 3.1 (branching) |
+| Phase B: Playwright trace capture                                         | Tasks 4–5                                            |
+| Phase C: memo with template structure                                     | Task 6                                               |
+| Failure modes (BLOCKED on trace failure, BLOCKED on insufficient samples) | Tasks 1.1, 3.1, 4.4, 5.6                             |
+| Auto-merge policy per verdict                                             | Task 7.4                                             |
+
+All sections covered.
+
+**Placeholders:** memo template placeholders (`<X.X>`, `<paragraph>`, etc.) in Task 6.1 are intentional — the implementer fills them from Tasks 2 + 5. No `TBD` / `TODO` leaks.
+
+**Type / value consistency:**
+
+- Milestone JSON path `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json` consistent across Tasks 2.1, 2.3, 3.1, 6.1.
+- Trace paths consistent across Tasks 4.2, 4.4, 4.6, 6.1.
+- Memo path consistent across Tasks 6.1, 6.3, 7.3.
+- Verdict values (`noise`, `mixed`, `real-cheap-render-slower`) consistent between Task 2.1's compute and Task 3.1's branching.
+
+**Scope check:** Single PR, three phases, conditional skip on noise/mixed verdict. Each phase produces a self-contained committable artifact. The implementer can produce a useful PR even if Phase B is BLOCKED.
+
+---
+
+## Notes for the implementer
+
+- The gap is 6 ms at n=3. The mean of 20 per-run p95 samples is much tighter, but each sample is still per-frame-noisy. Don't oversell the verdict either way.
+- If Phase A says "noise," the memo is short and shippable in <1 hour total. Don't pad it.
+- Don't touch source code under any pretext. Any optimization is a follow-up.
+- Don't pin process priority or apply micro-benchmark tricks. The bench harness's existing protocol is the protocol.
+- If you're tempted to run `--repeats=50` to "get tighter signal," DON'T. The plan says 20. If 20 is genuinely insufficient (σ huge), document it in the memo and recommend 50 as a follow-up — don't unilaterally escalate.
diff --git a/docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md b/docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md
new file mode 100644
index 0000000..a9505c4
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md
@@ -0,0 +1,139 @@
+# Pretable `scroll-with-render` Perf Diagnostic Design
+
+**Date:** 2026-05-11
+**Status:** Draft (awaiting user review before plan)
+**Predecessor:** [B2 follow-up #5a cell-renderer comparators (PR #130)](../../research/repo-memory.md); [B2 follow-up #1 perf-diag (PR #124)](./2026-05-09-b2-followup-perf-diagnostic-design.md) — same pattern reapplied
+
+---
+
+## Goal
+
+Diagnose whether pretable's `scroll-with-render` frame p95 (16.4 ms in the n=3 PR #130 runset) is genuinely slower than `scroll-with-format` (10.2 ms) and `scroll-with-heavy-render` (10.3 ms), or whether the 6 ms gap is a low-sample artifact. Output is a research memo plus committed evidence — no code changes. If the gap is real, the memo proposes the cause and a future fix PR; if it's noise, the memo concludes that.
+
+## Why
+
+PR #130 captured (n=3 medians, Chromium S2/hypothesis):
+
+| Script                     | pretable scroll p95 | DOM nodes peak | Notes                                              |
+| -------------------------- | ------------------- | -------------- | -------------------------------------------------- |
+| `scroll-with-format`       | 10.2 ms             | 98             | column.format set                                  |
+| `scroll-with-render`       | **16.4 ms**         | 164            | column.render set (cheap JSX: 1 span)              |
+| `scroll-with-heavy-render` | 10.3 ms             | 296            | column.render set (heavy JSX: 3 spans + className) |
+
+Heavy render renders more DOM (296 nodes vs 164) yet measures faster than cheap render. Heavy and cheap share the same code path in `@pretable/react`'s `MemoizedCellContent` (the `column.render` branch); the only structural difference between them is the JSX shape returned. Format diverges (uses `column.format` instead).
+
+Two competing hypotheses:
+
+1. **Sampling noise.** With n=3 the median is just the middle of three per-run p95 values, so two slow runs in the cheap-render set are enough to inflate it. Most likely outcome based on the PR #124 precedent (where a 1 ms gap dissolved at n=20).
+2. **React-reconciliation cliff.** A single-text-child span (cheap) vs a multi-child span (heavy) hits different React reconciliation paths. Less likely but plausible — would manifest as a real and reproducible gap at higher repeats.
+
+This investigation tightens the signal first, then profiles only if warranted.
+
+## Non-goals
+
+- **Fixing the gap.** Any code change to close the cheap-render gap is a follow-up PR informed by this memo's verdict.
+- **Compromising pretable's quality guarantees:** zero blank-gap frames, zero scroll-anchor backward shift, ≤1 px row-height error. Proposed fixes that erode these get marked "not worth it" regardless of speed gain.
+- **Cross-browser data.** Chromium only, mirroring PR #130 / #124.
+- **Other adapters.** This is pretable-internal; ag-grid / tanstack / mui cell-renderer numbers are not in scope.
+- **A synthetic micro-benchmark.** The existing `apps/bench` matrix is the instrument.
+
+## Architecture
+
+One PR off latest `main`. Three sequential phases inside the PR (mirrors PR #124):
+
+| Phase | Action                                                                                                                                      | Output                                                                                                                |
+| ----- | ------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
+| A     | High-repeat (n=20) re-run of `pretable / S2 / hypothesis / {scroll-with-format, scroll-with-render, scroll-with-heavy-render}` on Chromium. | `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json` with mean / σ / min / median / max per script. |
+| B     | If Phase A confirms both `cheap − format` AND `cheap − heavy` gaps are real (>2σ): capture one Playwright trace per script.                 | `status/traces/2026-05-11-pretable-scroll-with-{format,render,heavy-render}.trace.zip`                                |
+| C     | Manual trace analysis (only if Phase B fires). Identify what cheap-render does differently.                                                 | `docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md`                                             |
+
+If Phase A shows the gap is noise (one or both pairs within 2σ), the memo concludes "noise; the n=3 cheap-render outlier was a sample artifact" and skips Phase B and the trace analysis in Phase C. The PR ships the high-repeat runset + a short negative-result memo.
+
+## Method details
+
+### Phase A
+
+```
+pnpm bench:matrix \
+  --project=chromium \
+  --adapters=pretable \
+  --scenarios=S2 \
+  --scripts=scroll-with-format,scroll-with-render,scroll-with-heavy-render \
+  --scale=hypothesis \
+  --repeats=20
+```
+
+3 scripts × 20 repeats = 60 runs. Wall-clock ≈ 8–12 min.
+
+**Statistical test.** Compute mean and standard deviation of `scroll_frame_p95_ms` across the 20 samples per script. The gap is "real" if BOTH of these hold:
+
+- `|mean_cheap − mean_format| > 2 × max(σ_cheap, σ_format)`
+- `|mean_cheap − mean_heavy| > 2 × max(σ_cheap, σ_heavy)`
+
+If only one pair shows "real" or both show noise, the differential isn't clean and Phase A is the verdict. If both pass, Phase B fires.
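+
+Written out, the test is as follows (a sketch, assuming σ is the population standard deviation, dividing by n rather than n − 1, which is how the plan's aggregator script computes it; the text above does not say which form is intended):
+
+$$
+\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
+\qquad
+\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
+\qquad
+\text{real} \iff \left|\bar{x}_{\text{cheap}} - \bar{x}_{\text{other}}\right| > 2\,\max\!\left(\sigma_{\text{cheap}},\ \sigma_{\text{other}}\right)
+$$
+
+Here $x_i$ is the `scroll_frame_p95_ms` value of repeat $i$, and "other" is format or heavy-render; the condition must hold for both.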
+
+**Output file.** `status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json` mirrors the shape of `2026-05-09-perf-diag-high-repeat.scroll.json` (PR #124's output) — one entry per script, sample array preserved for future re-analysis.
+
+### Phase B (conditional)
+
+Capture one Playwright trace per script in dev mode. The bench-app's existing trace wiring writes `.trace.zip` under `status/traces/` per run. Save the three traces (or fewer if file size is prohibitive — commit a flame-graph summary instead).
+
+If trace capture fails (file format issues, harness errors), escalate BLOCKED. The memo can still ship with verdict "real but undiagnosed."
+
+### Phase C (conditional)
+
+Manual analysis. The trace viewer (Chrome DevTools or Playwright's `show-trace`) shows the steady-state scroll frame work. Look for:
+
+- A code-path divergence between cheap and format (different React render branches, different reconciliation cost)
+- A code-path divergence between cheap and heavy (the most surprising — same React code path, different JSX shape)
+- Style recalculation or layout work attributable to the cheap-render's DOM shape
+
+Memo records findings, leading hypothesis, and proposed fix(es) without implementing.
+
+### Memo structure
+
+`docs/research/2026-05-11-pretable-scroll-with-render-perf-diagnostic.md`:
+
+```
+# Pretable scroll-with-render perf diagnostic — 2026-05-11
+
+## Summary
+<1-2 sentences: gap real or noise; leading hypothesis if real>
+
+## High-repeat data (n=20)
+<table per script: mean, σ, min, median, max>
+
+## Statistical verdict
+<2σ comparisons shown; "real" only if both pairs pass>
+
+## Trace findings (only if real)
+- Cheap render hotspots:
+- Heavy render hotspots:
+- Format hotspots:
+- Differential:
+
+## Hypothesis for the gap (only if real)
+<1-3 paragraphs>
+
+## Proposed fixes (no code in this PR)
+<options with expected delta + risk to quality wedge>
+
+## Verdict
+<ship a perf-fix PR / don't bother / need more data>
+```
+
+Length target: 500–1500 words. Same as PR #124's memo.
+
+## Risks
+
+- **Variance at hypothesis scale.** The cell-renderer scripts are scroll-shape and run for ~3 seconds. Per-frame timing variance can be 1–3 ms under normal local conditions, so even n=20 might not produce a clean verdict if σ_cheap is high. Mitigation: report σ honestly and recommend an n=50 follow-up if the result is inconclusive.
+- **Hardware noise.** Local Chromium on a busy machine produces noisier perf than a lab environment. Run with the laptop idle; document any unavoidable background load in the memo.
+- **Cheap-render gap may be a sampling-noise + DOM-attribute interaction.** The `data-bench-render="cheap"` attribute is set on cheap; `data-bench-render="heavy"` plus more is set on heavy. If Chrome's style-invalidation cost differs for attribute count or className presence, the gap could be real but attributable to bench-instrumentation rather than pretable's render path. The memo should note this potential confound.
+- **Manual trace analysis is bespoke.** No automated test confirms a memo's findings. Mitigation: commit the traces so a future reader can re-open them.
+
+## Out of scope
+
+- Code fixes: follow-up PR informed by the memo.
+- Other browsers (Webkit, Firefox).
+- Other scripts or scenarios.
+- Updating the `/bench` page or homepage. This memo informs decisions; presentation changes are separate.
diff --git a/status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json b/status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
new file mode 100644
index 0000000..b7403bd
--- /dev/null
+++ b/status/milestones/2026-05-11-pretable-cell-renderer-high-repeat.json
@@ -0,0 +1,56 @@
+{
+  "generatedAt": "2026-05-11T20:50:00.000Z",
+  "adapterId": "pretable",
+  "scenarioId": "S2",
+  "scale": "hypothesis",
+  "browserName": "chromium",
+  "plannedRepeats": 20,
+  "actualSampleCount": "varies by script (matrix flaked at ~36% completion; ran 6-8 of 20 per script)",
+  "perScript": [
+    {
+      "scriptName": "scroll-with-format",
+      "metric": "scroll_frame_p95_ms",
+      "n": 8,
+      "mean": 9.363,
+      "sd": 0.801,
+      "min": 8.6,
+      "median": 9.2,
+      "max": 11.4
+    },
+    {
+      "scriptName": "scroll-with-render",
+      "metric": "scroll_frame_p95_ms",
+      "n": 7,
+      "mean": 8.971,
+      "sd": 0.349,
+      "min": 8.4,
+      "median": 9.1,
+      "max": 9.4
+    },
+    {
+      "scriptName": "scroll-with-heavy-render",
+      "metric": "scroll_frame_p95_ms",
+      "n": 6,
+      "mean": 9.15,
+      "sd": 0.126,
+      "min": 8.9,
+      "median": 9.2,
+      "max": 9.3
+    }
+  ],
+  "twoSigmaTest": {
+    "rule": "real if |mean_cheap - mean_other| > 2 * max(sd_cheap, sd_other) for both pairs",
+    "cheapVsFormat": {
+      "meanDiff": -0.392,
+      "noiseFloor": 1.602,
+      "real": false
+    },
+    "cheapVsHeavy": {
+      "meanDiff": -0.179,
+      "noiseFloor": 0.698,
+      "real": false
+    }
+  },
+  "verdict": "noise",
+  "verdictNote": "Both pairs are within the 2σ noise floor. Scroll-with-render (n=7) is marginally FASTER on mean than scroll-with-format (n=8) and scroll-with-heavy-render (n=6). PR #130's n=3 result (cheap-render 16.4 ms vs format 10.2 ms, heavy-render 10.3 ms) was a sampling artifact. Sample count is below the planned n=20 because the matrix runner exited early after ~36% completion, but the observed σ values are small enough that the verdict is clear: both mean diffs (0.39 ms vs format, 0.18 ms vs heavy) are roughly a quarter of their noise floors (1.60 ms and 0.70 ms), and a 6 ms gap like PR #130's would sit far outside the observed spread (max 9.4 ms for scroll-with-render)."
+}