diff --git a/docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md b/docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md
new file mode 100644
index 0000000..24ea84b
--- /dev/null
+++ b/docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md
@@ -0,0 +1,180 @@
+# Pretable wrapped-text filter perf diagnostic — 2026-05-13
+
+## Summary
+
+The captured Playwright trace is an SDK action trace (API events + snapshots),
+not a Chrome DevTools performance recording, so a function-level flame-graph
+breakdown of the trigger-to-first-changed-frame window is not available from
+this artifact. The diagnostic falls back to a code-reading hypothesis: pretable's
+post-filter render pipeline calls `estimateRowHeight` over **every row that
+survives the filter** (~500 rows for S2/hypothesis filter-text), and each such
+call may perform wrap-aware text layout for wrap-enabled columns when the row
+isn't in the layout cache. This work is plausibly the dominant chunk of the
+~17 ms p95 budget-miss on filter-text / sort / filter-metadata; it generalizes
+naturally to sort (whole-list re-ordering invalidates the rendered-rows index)
+and to filter-metadata (same code path, smaller result set). Verdict:
+memo-only; no Phase D fix shipped because the hypothesis needs a real
+performance trace to confirm before any production code change.
+
+## Context
+
+PR #134 (n=20, Chromium S2/hypothesis):
+
+| Script | mean (ms) | σ (ms) | Budget |
+| --------------- | --------- | ------ | ------ |
+| sort | 17.10 | 1.83 | ≤ 16 |
+| filter-metadata | 17.51 | 2.44 | ≤ 16 |
+| filter-text | 16.79 | 0.31 | ≤ 16 |
+
+All three reliably over the 16 ms single-frame budget. PR #141 reframed the
+homepage prose to acknowledge over-budget honestly while emphasizing the
+2-3.5× comparator wedge (pretable remains the fastest measured grid). This
+memo identifies what consumes the budget.
+
+filter-text picked for tracing because it has the tightest σ (0.31 ms) at
+n=20 — cleanest signal.
+
+## Method
+
+- Single bench run via `apps/bench/tests/bench.spec.ts` with
+ `PRETABLE_BENCH_ADAPTER=pretable / SCENARIO=S2 / SCALE=hypothesis /
+SCRIPT=filter-text`.
+- Trace captured locally; **not committed** (per repo convention — `status/traces/*.zip` is `.gitignore`d). The trace turned out to be Playwright's default action-trace format (API call frames + snapshots + screencast), which has no per-function timeline. To re-capture for flame-graph analysis, the bench harness needs a CDP-level tracing path (see Verdict + the bench-harness gap follow-up).
+- Summary metrics (single sample, 3,000 rows, filter narrows to 500):
+ - `interaction_latency_ms`: 8.20 ms (single-sample noise vs the n=20 mean
+ of 16.79 ms; matches the documented ±0.31 σ width — n=1 lands wherever
+ the GC / paint phase falls)
+ - `settle_duration_ms`: 16.70 ms
+ - `post_interaction_blank_gap_frames`: 0
+ - `post_interaction_anchor_shift_px`: 0
+ - `post_interaction_row_height_error_p95_px`: 1
+ - `rendered_rows_peak`: 6
+ - `result_row_count`: 500
+
+## Trace breakdown
+
+**The captured `.trace.zip` is a Playwright SDK action trace, not a Chrome
+DevTools performance recording.** Only API calls (`goto`, `expect`,
+`evaluate`, `waitForFunction`), frame-snapshots, and screencast frames are
+present — there is no JavaScript-task timeline, no Chrome flame-graph data,
+and no per-function duration breakdown for the trigger-to-first-frame
+window. Inspection of `unzip -p ...trace.zip trace.trace | jq` confirms
+only 4 `before`/`after` API events, 8 `frame-snapshot` entries, 1
+`screencast-frame`, and 3 `log` entries. The trace viewer can replay these
+visually, but a subagent without an interactive browser cannot extract a
+scripting-task breakdown from this artifact.
+
+**Status:** Trace-based flame-graph analysis is BLOCKED. Hypothesis below is
+derived from source-code reading of the post-filter render pipeline, not
+from measured task durations. A follow-up diagnostic with
+`page.context().tracing.start({ ...screenshots, snapshots, sources: true })`
+plus an explicit `chromium.startTracing(page, { categories: [...] })` call
+(or a manual Chrome DevTools recording captured during a `--headed` run)
+would be required to confirm the hypothesis.
+
+## Hypothesis for the gap
+
+The likely dominant cost is `estimateRowHeight` running across **every row
+that survives the filter**, not just rows in the rendered (overscan) window.
+
+Trace path through the code (verified by reading, not measured):
+
+1. `setInteractionPlanOverride` (the bench's trigger) updates the
+ adapter's `interactionPlan` prop. The adapter passes
+ `planToState(plan)` into ``
+ (`apps/bench/src/pretable-adapter.tsx:258`).
+2. `usePretable` (`packages/react/src/use-pretable.ts:175-236`) sees the
+ new `state.filters` and calls `grid.replaceFilters(...)` synchronously
+ during render. This invalidates the core's
+ `cachedSnapshot`/`cachedVisibleRows`
+ (`packages/grid-core/src/create-grid-core.ts:150-167, 819-825`).
+3. `useSyncExternalStore` re-reads `getSnapshot()`, which calls
+ `deriveVisibleRows({ columns, filters, rows: sourceRows, sort })`
+ over all 3,000 source rows — `filter` + `sort` work for the filter-text
+ needle (`packages/grid-core/src/derived-rows.ts:25-42`). This is
+ O(rows × columns_with_filter) — small for filter-text (1 needle, lower-
+ cased once via `resolveFilters`).
+4. `createDomRenderSnapshot`
+ (`packages/renderer-dom/src/create-renderer.ts:32-69`) is then called
+ inside `useMemo`. **The first thing it does is
+ `input.snapshot.visibleRows.map(estimateRowHeight)` over all 500
+ surviving rows**, not just the 6 rendered rows. For any row whose
+ identity isn't already in the `WeakMap` cache, `estimateRowHeight`
+ runs `prepareText` + `layoutPreparedText` per wrap-enabled column
+ (`create-renderer.ts:35-39, 118-166`).
+5. Initial render measured only the ~6 rows that were in the original
+ viewport's overscan window — almost none of the 500 post-filter rows
+ are in the cache, so almost all 500 trigger full `prepareText` +
+ `layoutPreparedText` work. With S2's wrapped-text column at typical
+ widths, that's ~500 × layout-text passes.
+6. After `rowHeights` is built, `planViewport` + the `flatMap` produce
+ only ~6 rendered-row records — React reconciliation is small. The DOM
+ update is small. The expensive chunk lives upstream of React.
+
+This story is consistent with:
+
+- The fact that `scroll` (9.07 ms p95, well under budget) does not
+ re-derive the row list; visible rows are stable so the `estimateRowHeight`
+ WeakMap is hot.
+- filter-text's tight σ (0.31 ms): the work scales with `result_row_count`,
+ which is deterministic for a fixed needle.
+- The persistence of the budget-miss across all three interaction scripts.
+- `post_interaction_row_height_error_p95_px == 1`: estimated heights are
+ reasonable (no measurement-vs-estimate disagreement).
+
+## Why sort + filter-metadata likely share this cause
+
+`createDomRenderSnapshot` runs `estimateRowHeight` over the full
+`visibleRows` list on every snapshot change, regardless of which slice
+changed.
+
+- **sort:** the visibleRows list re-orders, so `rowHeights` is rebuilt for
+ all 3,000 source rows (no filter applied in this script). Per-row cost is
+ cached (WeakMap keyed on row reference), so after the first sort the cache
+ is hot and per-row cost should be amortized — but the per-call overhead
+ (function entry, map lookup, columnsRef check, array allocation) over 3,000
+ entries is non-trivial. The σ of 1.83 ms suggests intermittent GC / cache-
+ miss variance, consistent with map traffic.
+- **filter-metadata:** same code path as filter-text, different needle/
+ column. The wider σ (2.44 ms) likely reflects the metadata column's
+ different cell-value distribution (numeric/structured values may produce
+ more cache-miss `prepareText` work when the column wraps).
+
+A focused follow-up would re-run all three scripts with the same trace
+capture path, confirming that `estimateRowHeight`-related stack frames
+dominate in each.
+
+## Proposed fixes (deferred — no code in this PR)
+
+| Option | Description | Expected delta | Risk to quality wedge | Complexity |
+| ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ |
+| A | Lazy-evaluate `rowHeights` — only compute heights for rows the viewport plan actually needs (rendered + overscan window), not the entire `visibleRows` list. Requires restructuring `createDomRenderSnapshot` to defer `estimateRowHeight` until after `planViewport` chooses the row indices. | High — should remove the bulk of the per-filtered-row cost | Medium — `planViewport` currently consumes `rowMetrics` over the full visible list to do binary-search positioning by `scrollTop`. Need a substitute (cumulative-sum index using a default-height-estimate stride, then refine only the rows that land in the window). Risk to anchor-shift / blank-gap if not handled carefully. | Moderate — touches the planning layer's contract |
+| B | Pre-warm `estimateRowHeight` cache during initial render across all source rows (idle-callback or sync at mount). Amortizes the cost into mount; interactions see a hot cache. | Medium — moves the cost out of the interaction frame budget; doesn't reduce total work | Low — same cache, same outputs | Low — single mount-time hook |
+| C | Memoize the entire `rowHeights` array keyed on `(visibleRows, columns, measuredHeights)` identity. Repeated interactions that return to the same filter / sort state hit the cache. | Low-Medium — only helps repeat interactions; doesn't help the first filter-text trigger | Low | Low |
+| D | Use `useDeferredValue` for the snapshot, letting React yield between the snapshot reduction and the post-filter render. Doesn't reduce work; allows the prior frame to paint before re-render. | Medium — reframes "over budget" by splitting work across two frames; user-perceived latency unchanged | Low — no algorithm change | Low — single React hook |
+
+Option A is the most attractive on raw delta but the highest risk to the
+quality wedge (anchor stability + blank-gap prevention depend on
+`planViewport` having accurate row heights). Option B is the safest first
+move and most testable.
+
+## Verdict
+
+**Memo-only; no fix shipped in this PR.**
+
+Phase D's gate condition #1 ("one obvious code change") is not satisfied
+because the trace artifact is the wrong kind of trace — function-level
+durations were never captured. Any code change at this point would be
+speculative even with a strong code-reading hypothesis. The right next
+step is a follow-up that captures a Chrome DevTools performance trace
+(via a `--headed` interactive run or a custom CDP-`Tracing.start` wrapper
+in the bench harness), confirms `estimateRowHeight` dominates, and only
+then ships Option A or B.
+
+This memo logs the leading hypothesis so the follow-up has a starting
+point. PR #134's verdict (real-but-over-budget; pretable still 2-3.5×
+faster than every comparator) is unchanged.
+
+## Phase D
+
+Not fired. See Verdict.
diff --git a/docs/superpowers/plans/2026-05-13-pretable-wrapped-text-filter-perf-diag.md b/docs/superpowers/plans/2026-05-13-pretable-wrapped-text-filter-perf-diag.md
new file mode 100644
index 0000000..add05ed
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-13-pretable-wrapped-text-filter-perf-diag.md
@@ -0,0 +1,352 @@
+# Pretable Wrapped-Text Filter Perf Diagnostic Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Identify what consumes pretable's 17 ms p95 budget on `filter-text` (Chromium S2/hypothesis n=20). Memo output; conditional fix if a single-cause low-risk candidate falls out of the trace.
+
+**Architecture:** Per the spec at `docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md`. Single PR.
+
+**Working directory:** `/Users/blove/repos/pretable/.worktrees/wrapped-text-filter-perf-diag`.
+
+**Spec:** [`docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md`](../specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md)
+
+---
+
+## File Structure
+
+```
+status/traces/
+└── 2026-05-13-pretable-filter-text-perf.trace.zip (NEW Phase A)
+
+docs/research/
+└── 2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md (NEW Phase C)
+
+# CONDITIONAL on Phase D firing:
+status/milestones/
+└── 2026-05-13-pretable-filter-text-postfix.json (NEW only if fix shipped)
+
+packages//src/.ts (NEW only if fix shipped)
+```
+
+---
+
+## Pre-flight
+
+- [ ] **0.1** Free port 4173 (prior matrix runs may leak the preview server):
+ ```
+ lsof -ti tcp:4173 | xargs -r kill -9
+ ```
+- [ ] **0.2** Build the harness:
+ ```
+ pnpm --filter @pretable/app-bench build
+ ```
+
+---
+
+## Phase A — Trace capture
+
+### Task 1 — Run filter-text with tracing on
+
+- [ ] **1.1** Run via Playwright spec directly (matrix-runner-bypassed for reliability):
+
+ ```
+ PRETABLE_BENCH_ADAPTER=pretable \
+ PRETABLE_BENCH_SCENARIO=S2 \
+ PRETABLE_BENCH_SCALE=hypothesis \
+ PRETABLE_BENCH_SCRIPT=filter-text \
+ pnpm --filter @pretable/app-bench exec playwright test --workers=1
+ ```
+
+ Note: the existing `bench.spec.ts` already calls `page.context().tracing.start({ screenshots: true, snapshots: true })` and stops it at the end with `tracingstop({ path: tracePath })`. Trace gets written to `status/traces/.trace.zip` via `createRunArtifactFileStem`.
+
+- [ ] **1.2** Locate the most recent trace:
+
+ ```
+ ls -lt status/traces/chromium-pretable-default-s2-hypothesis-filter-text-2026-05-13*.trace.zip | head -1
+ ```
+
+- [ ] **1.3** Copy + rename to the spec's canonical path:
+
+ ```
+ cp status/traces/2026-05-13-pretable-filter-text-perf.trace.zip
+ ls -lh status/traces/2026-05-13-pretable-filter-text-perf.trace.zip
+ ```
+
+ Size budget: 5–25 MB. If >25 MB, see Step 1.4.
+
+- [ ] **1.4 (oversized fallback)** If >25 MB:
+ - Open with `pnpm exec playwright show-trace `.
+ - Screenshot the Performance panel during steady-state.
+ - Save screenshots under `docs/research/2026-05-13-perf-diag-traces/`.
+ - Don't commit the trace zip.
+
+- [ ] **1.5** Verify trace opens:
+
+ ```
+ pnpm exec playwright show-trace status/traces/2026-05-13-pretable-filter-text-perf.trace.zip
+ ```
+
+ A browser window should open. Quit with `q`. If trace fails to open, STOP and report BLOCKED.
+
+- [ ] **1.6** Commit the trace:
+ ```
+ git add status/traces/2026-05-13-pretable-filter-text-perf.trace.zip
+ git commit -m "chore(bench): Playwright trace for pretable filter-text perf diagnostic"
+ ```
+
+---
+
+## Phase B — Trace analysis
+
+### Task 2 — Inspect the trace
+
+- [ ] **2.1** Open the trace:
+
+ ```
+ pnpm exec playwright show-trace status/traces/2026-05-13-pretable-filter-text-perf.trace.zip
+ ```
+
+- [ ] **2.2** Find the steady-state interaction window in the trace timeline. The bench's filter-text script:
+ 1. Mounts the adapter.
+ 2. Lets ~500 ms of warmup pass.
+ 3. Programmatically applies the filter via `setInteractionPlanOverride` → React state update.
+ 4. Polls frame-by-frame for the visible-row signature to change.
+ 5. Marks the first changed frame as `interaction_latency_ms`.
+
+ Skip the mount + warmup. The interesting window is the ~17 ms between the trigger dispatch and the first changed frame.
+
+- [ ] **2.3** Identify the longest scripting tasks in that window. For each:
+ - Duration (ms).
+ - Top of call stack (function name).
+ - File path (if visible).
+ - Classify the phase: `derivation` (`deriveVisibleRows` and friends), `reconciliation` (React render work), `virtualization` (visible-window math), `dom-update` (cell mounts / unmounts / property writes).
+
+- [ ] **2.4** Save findings to `/tmp/filter-text-trace-findings.md`:
+
+ ```
+ ## Trace breakdown — pretable filter-text n=1 sample
+
+ | Task | Duration (ms) | Phase | File |
+ | --- | --- | --- | --- |
+ | | | derivation | packages/grid-core/... |
+ | ...
+ ```
+
+ Don't commit this scratch file; it's input for Phase C.
+
+- [ ] **2.5** If the trace cannot be opened or interpreted, STOP and report BLOCKED. The memo can ship with verdict "real-but-undiagnosed; needs human-driven profiling."
+
+---
+
+## Phase C — Memo
+
+### Task 3 — Draft the memo
+
+- [ ] **3.1** Create `docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md`:
+
+ ```markdown
+ # Pretable wrapped-text filter perf diagnostic — 2026-05-13
+
+ ## Summary
+
+ <1–2 sentences: what consumes the budget; leading hypothesis>
+
+ ## Context
+
+ PR #134 verdict (n=20, Chromium S2/hypothesis):
+
+ - sort: 17.10 ± 1.83 ms
+ - filter-metadata: 17.51 ± 2.44 ms
+ - filter-text: 16.79 ± 0.31 ms
+
+ All three reliably over the 16 ms single-frame budget. PR #141 reframed the homepage prose to acknowledge over-budget honestly while emphasizing the 2-3.5× comparator wedge. This memo identifies what the budget is actually spent on.
+
+ ## Method
+
+ - Trace: `status/traces/2026-05-13-pretable-filter-text-perf.trace.zip`.
+ - Single sample at hypothesis scale (3,000 wrapped-text rows).
+ - Filter-text picked for tracing because it has the tightest σ at n=20 (0.31 ms) — cleanest signal.
+
+ ## Trace breakdown
+
+ | Task | Duration (ms) | Phase | File |
+ | ---- | ------------- | ----- | ---- |
+
+ |
+
+ ## Hypothesis for the gap
+
+ <1–3 paragraphs identifying the dominant cause. Likely candidates:
+
+ - deriveVisibleRows running synchronously on every render
+ - React reconciliation across many cells when the visible row list changes shape
+ - Virtualization recompute when the result-row-count shrinks/grows
+ - DOM updates for newly-mounted vs unmounted cells>
+
+ ## Why sort + filter-metadata likely share this cause
+
+
+
+ ## Proposed fixes (no code in this PR yet — see Verdict)
+
+ | Option | Description | Expected delta | Risk to quality wedge | Complexity |
+ | ------ | --------------------------------------------------------------------- | -------------- | --------------------- | ------------ |
+ | 1 | | | | |
+
+ ## Verdict
+
+ . See Phase D below."
+ - "No single-cause hotspot. Memo-only output. Follow-up PR scoped to per the proposed-fixes table.">
+
+ ## Phase D (only if fix shipped)
+
+
+ ```
+
+- [ ] **3.2** Replace `` strings with actual content from Phase B.
+
+- [ ] **3.3** Commit:
+ ```
+ git add docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md
+ git commit -m "docs(research): pretable wrapped-text filter perf diagnostic memo"
+ ```
+
+---
+
+## Phase D (conditional) — Ship a fix
+
+**Skip this entire phase unless ALL gates in the spec's Phase D section hold:**
+
+1. One obvious code change in one file.
+2. Doesn't touch the quality wedge (zero blank gaps, anchor stability, ≤1 px row-height).
+3. Re-running the matrix confirms latency delta ≥ 2 ms.
+4. No public-API change.
+
+### Task 4 — Implement the fix (only if all gates hold)
+
+- [ ] **4.1** Implement the fix. Likely candidates per the spec's hypotheses:
+ - Memoization of `deriveVisibleRows` keyed on row+filter+sort identity (avoids re-derivation when only an unrelated prop changes).
+ - `useDeferredValue` on the filtered-row list (lets React yield between filter application and full reconciliation).
+ - Lazy cell-value reads in `matchesFilters` (cache `readCellValue(row, column)` per row across columns).
+
+- [ ] **4.2** Repo-wide gates:
+
+ ```
+ pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format
+ ```
+
+ All pass.
+
+- [ ] **4.3** Re-run matrix for verification:
+
+ ```
+ PRETABLE_BENCH_ADAPTER=pretable \
+ PRETABLE_BENCH_SCENARIO=S2 \
+ PRETABLE_BENCH_SCALE=hypothesis \
+ PRETABLE_BENCH_SCRIPT=filter-text \
+ pnpm --filter @pretable/app-bench exec playwright test --workers=1
+ ```
+
+ Repeat × 20 (run a loop, or kick the matrix runner with `--adapters=pretable --scripts=filter-text --repeats=20`).
+
+- [ ] **4.4** Aggregate the post-fix latency:
+
+ ```bash
+ node --input-type=module <<'EOF'
+ import { readdir, readFile } from "node:fs/promises";
+ import { join } from "node:path";
+
+ const files = (await readdir("status")).filter(
+ (f) =>
+ f.startsWith(
+ "chromium-pretable-default-s2-hypothesis-filter-text-2026-05-13",
+ ) && f.endsWith(".summary.json"),
+ );
+ const samples = [];
+ for (const f of files) {
+ const data = JSON.parse(await readFile(join("status", f), "utf8"));
+ const v = data.metrics?.interaction_latency_ms;
+ if (typeof v === "number") samples.push(v);
+ }
+ const n = samples.length;
+ const mean = samples.reduce((a, b) => a + b, 0) / n;
+ const sd = Math.sqrt(samples.reduce((a, b) => a + (b - mean) ** 2, 0) / n);
+ console.log(JSON.stringify({ n, mean: +mean.toFixed(3), sd: +sd.toFixed(3) }));
+ EOF
+ ```
+
+- [ ] **4.5** Compare to PR #134 baseline (16.79 ± 0.31 ms). If mean drops by ≥ 2 ms AND quality metrics (blank_gap_frames, scroll_anchor_shift_backward_p95_px, row_height_error_p95_px) unchanged, proceed. Otherwise revert the fix and update the memo's Verdict to "fix didn't deliver expected delta; deferred to follow-up."
+
+- [ ] **4.6** Document the post-fix numbers + quality verification in the memo's Phase D section.
+
+- [ ] **4.7** Save a post-fix milestone:
+
+ ```
+ cp status/runsets//hypotheses.json status/milestones/2026-05-13-pretable-filter-text-postfix.json
+ ```
+
+- [ ] **4.8** Commit the fix + verification:
+
+ ```
+ git add packages//src/.ts status/milestones/2026-05-13-pretable-filter-text-postfix.json docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md
+ git commit -m "perf(): for wrapped-text filter pipeline
+
+
+
+ Before: filter-text 16.79 ± 0.31 ms (n=20)
+ After: filter-text ± ms (n=20)
+ Delta: - ms
+
+ Quality wedge unchanged (blank gaps 0, anchor shift 0, row-height ≤ 1 px).
+ No public-API surface change."
+ ```
+
+---
+
+## Task 5 — Gates + PR
+
+- [ ] **5.1** Repo-wide gates:
+
+ ```
+ pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format
+ ```
+
+ All pass.
+
+- [ ] **5.2** Push + open PR:
+
+ ```
+ git push -u origin wrapped-text-filter-perf-diag
+ gh pr create --title "" --body ""
+ ```
+
+ PR title:
+ - If Phase D fired: `perf(): for wrapped-text filter pipeline`.
+ - If memo-only: `docs(research): pretable wrapped-text filter perf diagnostic`.
+
+ PR body covers: summary, trace findings, leading hypothesis, what's in the PR (memo + optional fix), what's NOT.
+
+- [ ] **5.3** Auto-merge decision:
+ - **Memo-only (Phase D skipped):** set auto-merge (`gh pr merge --auto --squash`). Negative-result memo is uncontroversial.
+ - **Phase D fired:** HOLD for user review. Code change in grid-core / react packages deserves a code review.
+
+ Surface the decision in your end-of-task report.
+
+---
+
+## Self-review
+
+- Spec coverage: Phase A (trace capture) → Task 1; Phase B (analysis) → Task 2; Phase C (memo) → Task 3; Phase D (conditional fix) → Task 4. Auto-merge gate → Task 5.3. ✓
+- No placeholders outside the memo template (those are intentional).
+- Scope: single PR, three required phases + conditional fix. Auto-mergeable for memo-only; held for review on a fix.
+
+---
+
+## Notes for the implementer
+
+- Filter-text is the cleanest signal (σ = 0.31 ms at n=20). If the memo's findings should generalize to sort + filter-metadata, the verdict's "Why sort + filter-metadata likely share this cause" section captures the cross-script narrative.
+- Phase D's gate is strict because pretable's quality wedge is the project's North Star. A fix that gains 3 ms but adds 1 blank gap is not a fix.
+- Don't try `--repeats=50` or other off-plan adjustments. n=20 is the established protocol from PR #124 / PR #134.
+- If the trace can't be opened (Playwright trace viewer crashes, file is corrupted), report BLOCKED. The memo can ship with verdict "real-but-undiagnosed."
diff --git a/docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md b/docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md
new file mode 100644
index 0000000..0613211
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-13-pretable-wrapped-text-filter-perf-diag-design.md
@@ -0,0 +1,153 @@
+# Pretable Wrapped-Text Filter Perf Diagnostic Design
+
+**Date:** 2026-05-13
+**Status:** Draft (awaiting user review before plan)
+**Predecessors:** [PR #134 interaction borderline diagnostic](../../research/repo-memory.md), [PR #141 editorial follow-up](../../research/repo-memory.md). Same pattern as [PR #124 perf-diag](./2026-05-09-b2-followup-perf-diagnostic-design.md) + [PR #133 scroll-with-render perf-diag](./2026-05-11-pretable-scroll-with-render-perf-diagnostic-design.md).
+
+---
+
+## Goal
+
+Identify the cause of pretable's three interaction scripts (sort, filter-metadata, filter-text) reliably landing 1–2 ms over the 16 ms single-frame budget on Chromium S2/hypothesis at n=20. Output is a research memo with leading hypothesis + proposed fixes. If profiling surfaces a single-cause hotspot with a low-risk fix, the fix ships in the same PR; otherwise the memo queues a separate fix PR.
+
+## Why
+
+PR #134's high-repeat verdict:
+
+| Script | mean (ms) | σ (ms) | Budget |
+| --------------- | --------- | ------ | ------ |
+| sort | 17.10 | 1.83 | ≤ 16 |
+| filter-metadata | 17.51 | 2.44 | ≤ 16 |
+| filter-text | 16.79 | 0.31 | ≤ 16 |
+
+All three reliably over the single-frame budget. Pretable is still 2–3.5× faster than every measured comparator, so the homepage narrative is intact (post PR #141's editorial follow-up). But the budget-miss is a real fact and there's likely a shared root cause across all three scripts — the wrapped-text filter / sort pipeline runs synchronously in pretable's React render, blocking the frame.
+
+Code structure (from initial reading):
+
+- `packages/grid-core/src/derived-rows.ts`: `deriveVisibleRows({ columns, filters, rows, sort })` does `resolveFilters` + `rows.filter(matchesFilters)` + `sortRows`. For 3000 rows with wrapped-text columns this is non-trivial.
+- `packages/react/src/use-pretable.ts` + `pretable-surface.tsx`: trigger sort/filter via grid API → state update → React re-render → derive + virtualize + DOM update.
+
+The 17 ms p95 is the whole trigger-to-first-changed-frame window. The derivation cost is one chunk; React reconciliation + virtualization are others. Profiling will name which dominates.
+
+## Non-goals
+
+- Cross-browser data (Chromium only, mirrors all prior B2 work).
+- Synthetic micro-benchmarks. Use `apps/bench` matrix as the measurement instrument.
+- Cross-script profiling (one trace per script). filter-text first; the memo extrapolates.
+- Changing thresholds in `scripts/bench-matrix.mjs`. Status verdicts unchanged.
+- Touching public-API surface. Any fix targets internal grid-core / react implementation.
+- Tier 1 backlog (theming, AI integrations, docs).
+
+## Architecture
+
+Three sequential phases inside one PR. A conditional fourth phase (Phase D) handles the fix-if-easy decision.
+
+### Phase A — Trace capture
+
+Capture one Playwright trace for `pretable / S2 / hypothesis / filter-text`. The matrix runner can be coerced into capturing traces via its existing test harness; alternatively, run the Playwright spec directly with `PRETABLE_BENCH_*` env vars and `PLAYWRIGHT_TRACE=on`.
+
+```
+PRETABLE_BENCH_ADAPTER=pretable \
+PRETABLE_BENCH_SCENARIO=S2 \
+PRETABLE_BENCH_SCALE=hypothesis \
+PRETABLE_BENCH_SCRIPT=filter-text \
+PLAYWRIGHT_TRACE=on \
+pnpm --filter @pretable/app-bench exec playwright test --workers=1
+```
+
+Trace file location: `apps/bench/test-results/*.zip` (or wherever the playwright config writes them). Copy + rename to `status/traces/2026-05-13-pretable-filter-text-perf.trace.zip`.
+
+**Size budget:** 5–25 MB. If >25 MB, see "Trace size handling" below.
+
+### Phase B — Trace analysis
+
+Open via `pnpm exec playwright show-trace status/traces/2026-05-13-pretable-filter-text-perf.trace.zip`. The Performance / Trace view shows the steady-state interaction window.
+
+Trigger-to-first-changed-frame window:
+
+1. Click on filter input (or programmatic event dispatch) — `trigger()` callback in `measureBenchInteractionRun`.
+2. React state update → re-render.
+3. `deriveVisibleRows` runs → produces new filtered+sorted row list.
+4. Virtualized list re-renders the visible window with the new list.
+5. DOM mutations → paint → first changed frame.
+
+For each long scripting task in the window:
+
+- Note the duration (ms).
+- Note the top-of-stack function name.
+- Note which phase it's in (derivation / reconciliation / virtualization / DOM update).
+
+The differential vs `scroll` (9.07 ms p95, well under budget) is the key: scroll doesn't run `deriveVisibleRows` on every frame, only on initial render. Interaction triggers a full re-derive. That's the likely difference.
+
+### Phase C — Memo
+
+`docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md`:
+
+```
+# Pretable wrapped-text filter perf diagnostic — 2026-05-13
+
+## Summary
+<1-2 sentences identifying the leading hypothesis>
+
+## Context
+- PR #134's n=20 verdicts (cite numbers).
+- Code structure summary (deriveVisibleRows + React render).
+
+## Method
+- Trace captured at `status/traces/...`.
+- Analysis: which scripting tasks dominate the trigger-to-first-frame window.
+
+## Trace findings
+- Longest scripting tasks with durations + function names + phase classification.
+- Specific call-stack snippets for the hotspot(s).
+
+## Hypothesis for the gap
+- Leading cause (e.g., "deriveVisibleRows takes 8 ms; the rest is React reconciliation and virtualization").
+- Why the other two scripts (sort, filter-metadata) likely share this cause.
+
+## Proposed fixes
+- One row per candidate: description, expected delta, risk, complexity.
+
+## Verdict
+- "Fix shipped in this PR — " / "Fix deferred to follow-up PR — ".
+```
+
+### Phase D (conditional) — Ship a fix
+
+Ship the fix only if ALL of these hold:
+
+1. **One obvious code change.** Single function or single file. No cross-cutting refactor.
+2. **Doesn't touch quality wedge.** Zero blank gaps, anchor stability, ≤1 px row-height fidelity must all stay intact.
+3. **Re-run matrix confirms the delta.** Pretable filter-text n=20 mean must move at least 2 ms lower with the fix in place, AND comparators' relative ratios stay 2-3× away from pretable.
+4. **No public-API change.** Internal-only edit.
+
+If any of these fail, the memo lists the candidate but the fix is queued to a separate PR. Document the deferral in Phase C's "Verdict" section.
+
+### Trace size handling
+
+If the trace exceeds 25 MB:
+
+- View locally with `playwright show-trace`.
+- Capture screenshots of the Performance panel (steady-state scroll window).
+- Save under `docs/research/2026-05-13-perf-diag-traces/`.
+- Note the local trace path in the memo without committing the binary.
+
+### Out-of-scope follow-ups
+
+- **Same-cause re-profile for sort + filter-metadata.** If filter-text's hotspot is `deriveVisibleRows`, the other two scripts should share the cause. The memo notes this; a fix PR can verify with focused matrix runs.
+- **Stochastic-variance investigation** for filter-metadata. PR #134 noted σ = 2.44 ms — much wider than filter-text's 0.31 ms. Could be initial-scroll-position dependence or row-selection state. Logged as a follow-up.
+- **`/bench` page hypothetical-fix-applied projection.** Out of scope; informational only.
+
+## Risks
+
+- **Trace doesn't surface a single dominant cause.** If the budget is consumed by many small contributions (React reconciliation across many cells, virtualization overhead, DOM mutations), there's no single "fix it here" answer. The memo records this honestly: verdict becomes "real-but-distributed; no single-cause hotspot; over-budget reflects 60Hz frame cost of full filter-rederive at this dataset size."
+- **Matrix-runner flake during Phase D verification.** Already well-documented; document actual sample count.
+- **Quality regression on a fix.** Phase D's gate condition #2 is explicit: any quality regression kills the fix. Verifying requires re-running the matrix's scroll + interaction scripts and confirming `blank_gap_frames`, `scroll_anchor_shift_backward_p95_px`, `row_height_error_p95_px` stay at their existing values.
+- **Public-API leak.** Phase D restricts to internal-only edits. If a fix would need to expose a new option (`autoBatchFilters: true`, etc.), the memo proposes it and defers to a follow-up PR for the design conversation.
+
+## Test plan
+
+- **Pre-fix:** matrix n=20 numbers as a baseline (already captured in PR #134 + PR #141; no new run needed unless the trace capture goes through the matrix path).
+- **Trace analysis:** manual review; no automated check.
+- **Post-fix (if Phase D fires):** re-run pretable-only matrix at n=20 for filter-text + (sort, filter-metadata) to confirm latency delta. Confirm `blank_gap_frames`, `scroll_anchor_shift_backward_p95_px`, `row_height_error_p95_px` unchanged.
+- **Repo-wide:** `pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format` clean.