From 06f51281c69b14259cdaa9a2ff3651d845fe9c75 Mon Sep 17 00:00:00 2001
From: Brian Love <brian@liveloveapp.com>
Date: Wed, 13 May 2026 10:46:58 -0700
Subject: [PATCH 1/5] docs(specs): bench-harness CDP tracing design

Opt-in CDP-level tracing path for apps/bench/tests/bench.spec.ts.
Unblocks future perf diagnostics that need flame-graph data (the
wrapped-text filter investigation from PR #142 hit this exact wall).
No matrix-runner integration; opt-in via PLAYWRIGHT_PERF_TRACE=1 env.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 ...-05-13-bench-harness-cdp-tracing-design.md | 137 ++++++++++++++++++
 1 file changed, 137 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md

diff --git a/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md
new file mode 100644
index 0000000..5577560
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md
@@ -0,0 +1,137 @@
+# Bench Harness CDP Tracing Design
+
+**Date:** 2026-05-13
+**Status:** Draft (awaiting user review before plan)
+**Predecessors:** [PR #142 wrapped-text filter perf-diag](../../research/repo-memory.md) — surfaced the bench-harness tracing gap that this PR closes.
+
+---
+
+## Goal
+
+Add an opt-in CDP-level tracing path to `apps/bench/tests/bench.spec.ts` so future perf diagnostics can capture function-level flame-graph data instead of Playwright's default action-trace format. Unblocks the deferred wrapped-text filter perf-fix investigation (memo at `docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md`) and any future hotspot-attribution work.
+
+Output: Chrome DevTools-compatible trace JSON loadable in the Performance panel via "Load profile…".
+
+## Why
+
+PR #142's perf-diag hit a hard tooling wall: Playwright's `tracing.start({ screenshots: true, snapshots: true })` (the only tracing the bench has wired today) produces API call frames + DOM snapshots + screencast — NOT a JS function timeline. To find why pretable's interaction scripts land 1–2 ms over the 16 ms single-frame budget, we need per-function timing.
+
+Playwright exposes the Chrome DevTools Protocol (CDP) directly via `page.context().newCDPSession(page)`. Sending `Tracing.start({ categories: '...' })` produces standard DevTools-format JSON — the same data Chrome's "Performance" panel records — that can be loaded for flame-graph analysis. This PR wires that path as an opt-in.
+
+This is infrastructure, not measurement. No bench numbers change.
+
+## Non-goals
+
+- **Re-doing the wrapped-text filter perf-fix investigation.** That's a separate follow-up PR that uses this new tracing path.
+- **Replacing the existing Playwright action trace.** The action trace is useful for visual debugging (screenshots + DOM snapshots); CDP tracing is additive, not a replacement.
+- **Always-on CDP tracing.** Opt-in via env var only. CDP tracing adds ~5–10× run overhead and produces large JSON files (tens of MB per run). Default-off keeps the matrix runs fast.
+- **Cross-browser CDP tracing.** Chromium only — CDP is a Chrome protocol. (The bench is Chromium-only anyway.)
+- **Visualization tooling.** Output is loaded into Chrome DevTools manually. No bundled trace viewer.
+- **Matrix runner integration.** The matrix runner runs many small invocations; CDP tracing per invocation would multiply wall-clock by 5–10×. Users opt in on individual Playwright runs, not full matrix sweeps.
+
+## Architecture
+
+### Env opt-in
+
+`PLAYWRIGHT_PERF_TRACE=1` env var triggers the CDP-tracing branch. When unset (the default), the spec runs as today — no behavioral or perf change.
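+
+A minimal sketch of that check as a pure function, so the opt-in rule can be unit-tested without a browser. The helper name `isPerfTraceEnabled` is hypothetical; it is equivalent to the inline `=== "1"` comparison in `bench.spec.ts`.
+
+```ts
+// Hypothetical helper: only the exact string "1" enables CDP tracing.
+// Unset, "0", "true", or "yes" all keep the default (no CDP) path.
+function isPerfTraceEnabled(env: Record<string, string | undefined>): boolean {
+  return env.PLAYWRIGHT_PERF_TRACE === "1";
+}
+```
+
+In the spec this would be called once at module scope as `isPerfTraceEnabled(process.env)`.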
+
+### Code changes
+
+`apps/bench/tests/bench.spec.ts`:
+
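+1. Read `process.env.PLAYWRIGHT_PERF_TRACE === "1"` at spec setup.
+2. If set, after `page.goto(...)` but BEFORE the bench result becomes available, open a CDP session, register the `Tracing.dataCollected` listener, and start tracing:
+   ```ts
+   const cdp = await page.context().newCDPSession(page);
+   const events: unknown[] = [];
+   // Default transferMode ("ReportEvents"): events arrive in Tracing.dataCollected batches.
+   cdp.on("Tracing.dataCollected", ({ value }) => {
+     for (const event of value) events.push(event);
+   });
+   await cdp.send("Tracing.start", {
+     categories: [
+       "disabled-by-default-devtools.timeline",
+       "disabled-by-default-devtools.timeline.frame",
+       "v8",
+       "disabled-by-default-v8.cpu_profiler",
+     ].join(","),
+     options: "sampling-frequency=10000",
+   });
+   ```
+3. Append every batch to the buffer; `Tracing.dataCollected` fires many times per run.
+4. After the bench result is published, call `Tracing.end` and wait for `Tracing.tracingComplete`, bounded by a timeout (see "Failure handling").
+5. Alternative, not used by default: with `transferMode: "ReturnAsStream"`, Chrome does not emit `Tracing.dataCollected`. The `Tracing.tracingComplete` payload instead carries a `stream` handle, which is read with `IO.read` until `eof` and then released with `IO.close`. This path is more efficient for very large traces.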
+6. Write the aggregated JSON to a sibling file alongside the Playwright trace zip:
+   ```
+   status/traces/<existing-stem>.cdp.json
+   ```
+   (Same stem as `createRunArtifactFileStem(result)` produces for the Playwright trace, just with `.cdp.json` suffix.)
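+
+The `ReturnAsStream` alternative from step 5 can be sketched as a loop over `IO.read`. This is a minimal sketch, not part of the planned change: `CdpSender` is a hypothetical structural stand-in for the Playwright session (so the logic can run against a fake), and the `handle` is assumed to be the `stream` field of the `Tracing.tracingComplete` payload.
+
+```ts
+// Hypothetical structural stand-in for the one CDP method used here.
+interface CdpSender {
+  send(method: string, params: { handle: string }): Promise<unknown>;
+}
+
+type IoReadResult = { data: string; eof: boolean; base64Encoded?: boolean };
+
+// Reads a ReturnAsStream trace: IO.read until eof, then IO.close (always).
+async function readTraceStream(cdp: CdpSender, handle: string): Promise<unknown[]> {
+  let json = "";
+  try {
+    for (;;) {
+      const chunk = (await cdp.send("IO.read", { handle })) as IoReadResult;
+      if (chunk.base64Encoded) {
+        // Chrome may base64-encode chunks (e.g. compressed streams); not handled here.
+        throw new Error("base64-encoded stream chunks are not handled in this sketch");
+      }
+      json += chunk.data;
+      if (chunk.eof) break;
+    }
+  } finally {
+    await cdp.send("IO.close", { handle });
+  }
+  const parsed: unknown = JSON.parse(json);
+  // Accept either a bare event array or a {"traceEvents": [...]} object.
+  if (Array.isArray(parsed)) return parsed;
+  const events = (parsed as { traceEvents?: unknown }).traceEvents;
+  if (Array.isArray(events)) return events;
+  throw new Error("trace stream did not contain a traceEvents array");
+}
+```
+
+Under that mode the caller would pass `stream` from the `Tracing.tracingComplete` handler as `handle`, adapting the Playwright session to `CdpSender` with a thin wrapper.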
+
+The CDP write happens inside the test body, so it lands in the same per-run output cycle as the Playwright trace. Both are gitignored (`.cdp.json` follows the existing `status/traces/*` ignore rule).
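+
+When deriving the sibling path, note that a regex replace such as `.replace(/\.trace\.zip$/, ".cdp.json")` returns its input unchanged if the suffix does not match, which would aim the JSON write at the Playwright trace path itself. A minimal sketch of a stricter helper, assuming the Playwright trace path always ends in `.trace.zip`; `cdpTracePathFor` is a hypothetical name.
+
+```ts
+const PLAYWRIGHT_TRACE_SUFFIX = ".trace.zip";
+
+// Hypothetical helper: same stem as the Playwright trace, `.cdp.json` suffix.
+// Throws rather than falling through, so the JSON can never be written over
+// the Playwright trace path.
+function cdpTracePathFor(playwrightTracePath: string): string {
+  if (!playwrightTracePath.endsWith(PLAYWRIGHT_TRACE_SUFFIX)) {
+    throw new Error(`expected a ${PLAYWRIGHT_TRACE_SUFFIX} path, got ${playwrightTracePath}`);
+  }
+  return playwrightTracePath.slice(0, -PLAYWRIGHT_TRACE_SUFFIX.length) + ".cdp.json";
+}
+```
+
+Because the spec's CDP block is already wrapped in a logging try/catch, a throw here would be logged and skipped rather than failing the test.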
+
+### Output shape
+
+The CDP `Tracing` API produces an array of event objects matching the Chrome DevTools "Trace Event Format" — the same format that DevTools' "Save profile…" / "Load profile…" round-trips. Wrap in `{"traceEvents": [...]}` so Chrome DevTools accepts the file directly.
+
+Example consumption (manual):
+1. Open Chrome → DevTools → Performance tab.
+2. Click "Load profile…" (folder icon).
+3. Select `status/traces/<stem>.cdp.json`.
+4. Flame graph renders. The window around the bench's interaction script is the focus.
+
+### Failure handling
+
+- **CDP session fails to attach** (rare; would indicate Playwright/Chromium version mismatch): log to test stderr, skip CDP tracing, let the Playwright run proceed normally. Don't fail the test.
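+- **`Tracing.tracingComplete` never arrives after `Tracing.end`** (timeout): stop waiting after a fixed timeout, log a warning with the timeout duration, save whatever events were collected, and don't fail the test. An unbounded await would instead hang until Playwright's own test timeout fails the run.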
+- **JSON write fails**: log; don't fail the test.
+
+The CDP-tracing path is best-effort additive — never blocks the underlying bench run.
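+
+The failure-handling rules above can be sketched as two small helpers: a bounded wait for `Tracing.tracingComplete` and a wrapper that logs and swallows CDP errors. This is a minimal sketch; the names `waitWithTimeout` and `bestEffort` are hypothetical, and the timeout value is left to the caller.
+
+```ts
+// Resolves true if `done` resolves first, false if `ms` elapses first.
+// A rejection of `done` propagates. The timer is always cleared.
+async function waitWithTimeout(done: Promise<unknown>, ms: number): Promise<boolean> {
+  let timer: ReturnType<typeof setTimeout> | undefined;
+  const timedOut = new Promise<boolean>((resolve) => {
+    timer = setTimeout(() => resolve(false), ms);
+  });
+  try {
+    return await Promise.race([done.then(() => true), timedOut]);
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+// Runs one CDP step; on failure it logs and returns undefined instead of
+// throwing, so tracing can never fail the underlying bench run.
+async function bestEffort<T>(label: string, step: () => Promise<T>): Promise<T | undefined> {
+  try {
+    return await step();
+  } catch (err) {
+    console.warn(`[bench.spec] CDP tracing ${label} failed (best-effort, ignoring):`, err);
+    return undefined;
+  }
+}
+```
+
+A `false` from `waitWithTimeout` maps to the timeout case: log it, then still write whatever events were collected.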
+
+### Documentation
+
+A short section in `docs/research/repo-memory.md` covering:
+
+- How to opt in (`PLAYWRIGHT_PERF_TRACE=1`).
+- Where the output lands (`status/traces/<stem>.cdp.json`).
+- How to load in Chrome DevTools.
+- The categories chosen and why.
+- Pointer to the wrapped-text filter memo as the next intended consumer.
+
+No README changes; this is an internal-tooling extension, not a developer-facing feature.
+
+### Test coverage
+
+Adding a unit test for the CDP tracing path is awkward — it requires a running Chromium + CDP. Instead:
+
+- **Manual verification:** one CDP run captured, opened in DevTools, screenshot in the PR body.
+- **Regression guard:** confirm the env-unset path is byte-identical to current behavior. The existing spec test (`writes benchmark artifacts for the selected Pretable run`) must still pass with `PLAYWRIGHT_PERF_TRACE` unset.
+
+## File touches
+
+```
+apps/bench/tests/bench.spec.ts                            (MODIFY: add CDP-tracing branch)
+docs/research/repo-memory.md                              (MODIFY: 2026-05-13 entry — bench-harness CDP tracing)
+```
+
+No `packages/` changes. No public-API surface. No app source. Just one test file + a doc entry.
+
+## Risks
+
+- **CDP API drift.** Chromium's CDP version moves with the bundled Chromium. The trace event categories we pick should stay stable (they're standard DevTools categories), but if a future Chromium bump renames or drops a category, the env-set path silently produces an empty or sparse trace. The failure-handling logging will not catch this, because nothing throws. Mitigation: a follow-up could check `traceEvents.length > 0` and warn before writing.
+- **Trace file size.** A 3-second interaction window at 10 kHz sampling is estimated (not yet measured) to produce roughly 5–30 MB of JSON. The gitignore already covers `status/traces/*`, so there is no risk of committing traces, but local disk fills up if a user runs many CDP traces; document that.
+- **Wall-clock overhead.** CDP tracing slows the bench's interaction window by 5–10×. The matrix runner's wall-clock budget doesn't allow always-on tracing; the env opt-in keeps it user-controlled.
+- **Race condition on `Tracing.tracingComplete`.** Playwright's CDP session is event-driven, so we have to await the complete event after `Tracing.end`. If that await races with Playwright's test teardown, the JSON could be truncated. Mitigation: register the `tracingComplete` listener before sending `Tracing.end`, then explicitly `await` a Promise that resolves on the event, bounded by a timeout so a lost event cannot hang the run.
+- **No automated test for the new path.** As noted, requires a running browser. Manual verification in the PR body is the substitute. A future "bench-harness self-test" could exercise the CDP path; out of scope here.
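+
+The `traceEvents.length > 0` follow-up from the CDP API drift risk could be a small summary step run before the JSON is written. A minimal sketch, assuming events are plain objects whose `cat` field is a comma-separated category list (as in the Trace Event Format); `summarizeTrace` is a hypothetical name, and it reports a warning rather than failing, in line with the best-effort policy.
+
+```ts
+// Only the fields this check reads; real events carry more (name, pid, tid, args, ...).
+type TraceEvent = { cat?: string; ph?: string };
+
+type TraceSummary = { count: number; categories: string[]; warning?: string };
+
+// Counts events and distinct categories so an empty trace is flagged
+// instead of being written silently.
+function summarizeTrace(events: readonly TraceEvent[]): TraceSummary {
+  const categories = new Set<string>();
+  for (const event of events) {
+    // `cat` may list several categories, e.g. "devtools.timeline,v8".
+    for (const cat of (event.cat ?? "").split(",")) {
+      if (cat) categories.add(cat);
+    }
+  }
+  return {
+    count: events.length,
+    categories: Array.from(categories).sort(),
+    warning:
+      events.length === 0
+        ? "CDP trace is empty; check the category list against this Chromium version"
+        : undefined,
+  };
+}
+```
+
+The spec would call it on the collected events just before `writeFile` and `console.warn` the `warning` when it is set.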
+
+## Out of scope follow-ups
+
+- **Wrapped-text filter perf-fix investigation v2** — uses this new tracing path. Memo at `docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md` has the candidate fixes; flame-graph data confirms which one to ship.
+- **Bench-harness self-test for the CDP path** — would require a browser-in-CI setup beyond what we have. Manual verification is the current bar.
+- **Matrix runner CDP integration.** Per the spec's "always-on CDP tracing" non-goal, this would multiply matrix wall-clock 5–10×. Not done here.
+- **Visualization tooling beyond Chrome DevTools.** Speedscope or other flame-graph viewers can also consume DevTools-format JSON; documenting that is a small follow-up if a non-Chrome user wants to investigate.
+
+## Test plan
+
+- [x] `PLAYWRIGHT_PERF_TRACE` unset: existing spec test passes unchanged.
+- [x] `PLAYWRIGHT_PERF_TRACE=1`: spec produces both the Playwright `.trace.zip` and a sibling `.cdp.json`.
+- [x] `.cdp.json` is well-formed: opens in Chrome DevTools' Performance panel; flame graph renders.
+- [x] `pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format` clean.
+- [x] No `packages/` source touched.
+
+Manual verification: one CDP run captured by the implementer; screenshot of the loaded flame graph in the PR body.

From 69adf5d3666b43432ca45ba8a4529bf780c18fa8 Mon Sep 17 00:00:00 2001
From: Brian Love <brian@liveloveapp.com>
Date: Wed, 13 May 2026 10:47:47 -0700
Subject: [PATCH 2/5] docs(plans): bench-harness CDP tracing implementation
 plan

Four-task plan: wire CDP-tracing branch behind PLAYWRIGHT_PERF_TRACE=1
env opt-in, manual verification, repo-memory entry, gates + PR.
Auto-mergeable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../2026-05-13-bench-harness-cdp-tracing.md   | 238 ++++++++++++++++++
 1 file changed, 238 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md

diff --git a/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md
new file mode 100644
index 0000000..d29aa0f
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md
@@ -0,0 +1,238 @@
+# Bench Harness CDP Tracing Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add an opt-in CDP-level tracing path to `apps/bench/tests/bench.spec.ts` so future perf diagnostics can capture Chrome DevTools-compatible JSON for flame-graph analysis.
+
+**Architecture:** Per the spec at `docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md`. Single PR. Auto-merge on green — tooling-only, no measurement, no public-API impact.
+
+**Working directory:** `/Users/blove/repos/pretable/.worktrees/bench-harness-cdp-tracing`.
+
+**Spec:** [`docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md`](../specs/2026-05-13-bench-harness-cdp-tracing-design.md)
+
+---
+
+## File Structure
+
+```
+apps/bench/tests/bench.spec.ts          (MODIFY: add CDP-tracing branch behind PLAYWRIGHT_PERF_TRACE=1 env)
+docs/research/repo-memory.md            (MODIFY: 2026-05-13 entry — bench-harness CDP tracing usage)
+```
+
+No `packages/` changes. No app source. No matrix-runner changes.
+
+---
+
+## Pre-flight
+
+- [ ] **0.1** Read the existing `apps/bench/tests/bench.spec.ts` to understand the current tracing wiring:
+  ```
+  cat apps/bench/tests/bench.spec.ts
+  ```
+  Confirm the spec already does `await page.context().tracing.start({ screenshots: true, snapshots: true })` early and `await page.context().tracing.stop({ path: tracePath })` later. The CDP-tracing path is additive — both traces run when opt-in is set.
+
+- [ ] **0.2** Free port 4173 if stale (`lsof -ti tcp:4173 | xargs -r kill -9`).
+
+---
+
+## Task 1 — Wire CDP-tracing branch in bench.spec.ts
+
+- [ ] **1.1** Open `apps/bench/tests/bench.spec.ts`. At the top of the test fixture, read the env opt-in:
+
+  ```ts
+  const perfTraceEnabled = process.env.PLAYWRIGHT_PERF_TRACE === "1";
+  ```
+
+- [ ] **1.2** After `await page.goto(...)` but BEFORE the bench-result wait, open a CDP session and start tracing if `perfTraceEnabled`:
+
+  ```ts
+  let cdpSession: Awaited<
+    ReturnType<ReturnType<typeof page.context>["newCDPSession"]>
+  > | null = null;
+  const cdpEvents: unknown[] = [];
+
+  if (perfTraceEnabled) {
+    cdpSession = await page.context().newCDPSession(page);
+    cdpSession.on("Tracing.dataCollected", (payload: { value: unknown[] }) => {
+      for (const event of payload.value) cdpEvents.push(event);
+    });
+    await cdpSession.send("Tracing.start", {
+      categories: [
+        "disabled-by-default-devtools.timeline",
+        "disabled-by-default-devtools.timeline.frame",
+        "v8",
+        "disabled-by-default-v8.cpu_profiler",
+      ].join(","),
+      options: "sampling-frequency=10000",
+    });
+  }
+  ```
+
+  Note: omit `transferMode: "ReturnAsStream"` for simplicity — collect events via the `Tracing.dataCollected` listener instead. (The streaming path is more efficient for huge traces but the dataCollected path is simpler and works fine for 3-second windows.)
+
+- [ ] **1.3** After the bench result is captured (around the line `const result = await page.evaluate(...)`), stop CDP tracing and write the JSON:
+
+  ```ts
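+  if (perfTraceEnabled && cdpSession) {
+    const session = cdpSession;
+    let timer: ReturnType<typeof setTimeout> | undefined;
+    const tracingComplete = new Promise<boolean>((resolve) => {
+      session.once("Tracing.tracingComplete", () => resolve(true));
+      // Bounded wait so a lost tracingComplete event cannot hang the test
+      // (15 s is an assumed budget; keep it under the test timeout).
+      timer = setTimeout(() => resolve(false), 15_000);
+    });
+    try {
+      await session.send("Tracing.end");
+      if (!(await tracingComplete)) {
+        console.warn("[bench.spec] Tracing.tracingComplete timed out; writing partial trace");
+      }
+    } finally {
+      clearTimeout(timer);
+    }
+    const cdpPath = tracePath.replace(/\.trace\.zip$/, ".cdp.json");
+    await mkdir(path.dirname(cdpPath), { recursive: true });
+    await writeFile(cdpPath, JSON.stringify({ traceEvents: cdpEvents }) + "\n");
+  }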
+  ```
+
+  Place this BEFORE `await page.context().tracing.stop(...)` (the Playwright action trace closer) so the CDP session is cleaned up while the page is still around.
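+
+  `Tracing.tracingComplete` also carries a `dataLossOccurred` flag that the stop block ignores. A minimal sketch of a collector that keeps every `Tracing.dataCollected` batch and records that flag; the helper name and the trimmed payload types are assumptions (only the fields used are modeled), and per the implementer notes it can live inline in `bench.spec.ts`.
+
+  ```ts
+  // Only the payload fields used here are modeled (an assumption); CDP sends more.
+  type DataCollectedPayload = { value: readonly unknown[] };
+  type TracingCompletePayload = { dataLossOccurred: boolean };
+
+  // Hypothetical helper: keeps every Tracing.dataCollected batch (the event
+  // fires many times per run) and remembers whether Chrome dropped events.
+  function createTraceCollector() {
+    const events: unknown[] = [];
+    let dataLoss = false;
+    return {
+      events,
+      onDataCollected(payload: DataCollectedPayload): void {
+        for (const event of payload.value) events.push(event);
+      },
+      onTracingComplete(payload: TracingCompletePayload): void {
+        dataLoss = payload.dataLossOccurred;
+      },
+      get dataLossOccurred(): boolean {
+        return dataLoss;
+      },
+    };
+  }
+  ```
+
+  The handlers do not use `this`, so they can be passed directly: `cdpSession.on("Tracing.dataCollected", collector.onDataCollected)`, with `collector.onTracingComplete(payload)` called from the `Tracing.tracingComplete` handler. Log a warning if `collector.dataLossOccurred` is true before writing `{ traceEvents: collector.events }`. Depending on Playwright's protocol typings, the handlers may need a cast.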
+
+- [ ] **1.4** Wrap the CDP block in a try/catch that logs but doesn't fail the test:
+
+  ```ts
+  try {
+    // ... CDP start/stop/write code
+  } catch (err) {
+    console.warn(
+      `[bench.spec] CDP tracing failed (best-effort, ignoring):`,
+      err,
+    );
+  }
+  ```
+
+  Apply the try/catch to BOTH the start block (Task 1.2) and the stop+write block (Task 1.3) — independently, so a failure starting tracing doesn't blow up the stop path, and vice versa.
+
+- [ ] **1.5** Verify Imports. The spec already imports `mkdir`, `readFile`, `stat`, `writeFile` from `"node:fs/promises"` and `path` from `"node:path"`. No new imports needed.
+
+- [ ] **1.6** Typecheck:
+  ```
+  pnpm --filter @pretable/app-bench typecheck
+  ```
+  Expected: passes.
+
+- [ ] **1.7** Run the existing test with `PLAYWRIGHT_PERF_TRACE` UNSET to confirm regression-free:
+  ```
+  pnpm bench:e2e --project=chromium
+  ```
+  Expected: spec passes; no `.cdp.json` produced; behavior identical to current `main`.
+
+- [ ] **1.8** Commit:
+  ```
+  git add apps/bench/tests/bench.spec.ts
+  git commit -m "feat(bench): opt-in CDP tracing in bench.spec.ts via PLAYWRIGHT_PERF_TRACE"
+  ```
+
+---
+
+## Task 2 — Manual verification
+
+- [ ] **2.1** Run with the env opt-in set:
+  ```
+  PLAYWRIGHT_PERF_TRACE=1 \
+    PRETABLE_BENCH_ADAPTER=pretable \
+    PRETABLE_BENCH_SCENARIO=S2 \
+    PRETABLE_BENCH_SCALE=hypothesis \
+    PRETABLE_BENCH_SCRIPT=filter-text \
+    pnpm --filter @pretable/app-bench exec playwright test --workers=1
+  ```
+
+- [ ] **2.2** Locate the produced JSON:
+  ```
+  ls -lt status/traces/*.cdp.json | head -1
+  ```
+  Expected: non-zero size (1–30 MB).
+
+- [ ] **2.3** Sanity-check the JSON:
+  ```
+  jq '.traceEvents | length' status/traces/*.cdp.json | head -1
+  ```
+  Expected: a number > 100 (a real trace has thousands of events; a low number would suggest the categories aren't producing what we expect).
+
+  Also check shape:
+  ```
+  jq '.traceEvents[0:3]' status/traces/*.cdp.json
+  ```
+  Expected: each event has `ph` (phase), `ts` (timestamp), `cat` (category), etc. — Chrome DevTools trace-event format.
+
+- [ ] **2.4** Open in Chrome DevTools:
+  - Open any Chrome tab (for example `about:blank`) → F12 (Cmd+Option+I on macOS) → Performance tab.
+  - Click the "Load profile…" folder icon.
+  - Select the `.cdp.json` file.
+  - The flame graph should render with the interaction window visible.
+
+  Screenshot the flame graph for the PR body.
+
+- [ ] **2.5** Don't commit the captured trace (it's in the gitignored `status/traces/`).
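+
+The `jq` checks in 2.3 expand `status/traces/*.cdp.json` in glob (alphabetical) order, so with more than one trace on disk `head -1` reports on the alphabetically first file, not necessarily the run just captured. A minimal TypeScript sketch that picks the newest file by modification time (like `ls -t`) and reports its event count; the helper names are hypothetical, and it assumes it runs from the directory that contains `status/traces`.
+
+```ts
+import { readdirSync, readFileSync, statSync } from "node:fs";
+import { join } from "node:path";
+
+type Candidate = { file: string; mtimeMs: number };
+
+// Newest first, like `ls -t`; undefined when there are no candidates.
+function pickNewest(candidates: readonly Candidate[]): string | undefined {
+  return [...candidates].sort((a, b) => b.mtimeMs - a.mtimeMs)[0]?.file;
+}
+
+// Parses a `{"traceEvents": [...]}` file body; counts events and events
+// missing the `ph` (phase) field that every trace event should carry.
+function describeTraceJson(text: string): { count: number; missingPhase: number } {
+  const parsed = JSON.parse(text) as { traceEvents?: Array<{ ph?: unknown }> };
+  const events = parsed.traceEvents ?? [];
+  return {
+    count: events.length,
+    missingPhase: events.filter((event) => typeof event.ph !== "string").length,
+  };
+}
+
+// Summarizes the most recently written `.cdp.json` under `dir`.
+function checkNewestTrace(dir = "status/traces") {
+  const candidates = readdirSync(dir)
+    .filter((name) => name.endsWith(".cdp.json"))
+    .map((name) => {
+      const file = join(dir, name);
+      return { file, mtimeMs: statSync(file).mtimeMs };
+    });
+  const newest = pickNewest(candidates);
+  if (newest === undefined) throw new Error(`no .cdp.json files in ${dir}`);
+  return { file: newest, ...describeTraceJson(readFileSync(newest, "utf8")) };
+}
+```
+
+To use it, append `console.log(checkNewestTrace());` and run the file with a TypeScript runner; a `count` in the low hundreds or a non-zero `missingPhase` points at the same problems the `jq` checks look for.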
+
+---
+
+## Task 3 — repo-memory entry
+
+- [ ] **3.1** Append a 2026-05-13 section to `docs/research/repo-memory.md`:
+
+  Cover:
+  - The new opt-in env (`PLAYWRIGHT_PERF_TRACE=1`).
+  - Where output lands (`status/traces/<stem>.cdp.json`).
+  - How to open it (Chrome DevTools → Performance → Load profile).
+  - Categories chosen + the standard "flame-graph" rationale.
+  - Pointer to the wrapped-text filter memo as the next intended consumer.
+  - Out-of-scope notes (no matrix integration, no Speedscope export).
+
+  Suggested heading: `### Bench-harness CDP tracing opt-in`.
+
+- [ ] **3.2** Commit:
+  ```
+  git add docs/research/repo-memory.md
+  git commit -m "docs(research): repo-memory entry — bench-harness CDP tracing"
+  ```
+
+---
+
+## Task 4 — Gates + PR
+
+- [ ] **4.1** Repo-wide gates:
+  ```
+  pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format
+  ```
+  Expected: all pass.
+
+- [ ] **4.2** Push + open PR:
+  ```
+  git push -u origin bench-harness-cdp-tracing
+  gh pr create --title "feat(bench): opt-in CDP tracing in bench.spec.ts (unblocks flame-graph perf diagnostics)" --body "..."
+  ```
+
+  PR body should include:
+  - Summary of the env opt-in.
+  - The categories chosen.
+  - Manual verification: screenshot of the DevTools flame graph for the captured CDP trace.
+  - What's NOT in the PR (matrix integration, Speedscope export, automated test for the new path).
+  - Pointer to the wrapped-text filter memo (PR #142) as the next intended consumer.
+
+- [ ] **4.3** Auto-merge:
+  ```
+  gh pr merge --auto --squash
+  ```
+  This is tooling-only; no public-API impact; no measurement changes. Safe to auto-merge.
+
+---
+
+## Self-review
+
+- Spec coverage: env opt-in (Task 1.1) → CDP start (1.2) → CDP stop + write (1.3) → failure handling (1.4) → manual verification (Task 2) → docs (Task 3) → gates + PR (Task 4). ✓
+- No placeholders.
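+- Type/value consistency: `cdpSession` must resolve to Playwright's `CDPSession`, not `any`. A type built from `typeof page.context.prototype.newCDPSession` silently becomes `any` because `page.context` is a method, so derive it from `ReturnType<typeof page.context>` and let the 1.6 typecheck confirm it. `cdpEvents: unknown[]` keeps the listener loose; output JSON shape `{ traceEvents: [...] }` matches Chrome DevTools' expected import format.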
+- Scope: single PR, four task groups, ~3 commits-of-record, auto-mergeable.
+
+---
+
+## Notes for the implementer
+
+- Keep the diff to `bench.spec.ts` minimal. The Playwright action trace stays untouched; CDP tracing is fully additive.
+- The CDP `Tracing.dataCollected` listener fires multiple times per run — collect all batches into the `cdpEvents` array before writing the JSON.
+- Don't put the CDP code in a separate file. Inlining keeps the test fixture readable and avoids new module imports.
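+- The `options: "sampling-frequency=10000"` string is meant to set V8's CPU profiler to 10 kHz sampling. CDP lists `Tracing.start`'s `categories` and `options` parameters as deprecated in favor of `traceConfig`, so check the sample density in the captured trace rather than assuming the option took effect. The ~5–20 MB size estimate for a 3-second window is likewise unmeasured.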
+- If the env opt-in path fails locally (e.g., CDP session error), check that the local Playwright + Chromium versions match (`pnpm exec playwright --version`); the categories list is stable across recent Chromium versions but a major version change can affect it.

From 2a4fdb7e9fdb81ddcb817bdacafbd93f37dac3ca Mon Sep 17 00:00:00 2001
From: Brian Love <brian@liveloveapp.com>
Date: Wed, 13 May 2026 10:55:49 -0700
Subject: [PATCH 3/5] feat(bench): opt-in CDP tracing in bench.spec.ts via
 PLAYWRIGHT_PERF_TRACE

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 apps/bench/tests/bench.spec.ts | 62 ++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/apps/bench/tests/bench.spec.ts b/apps/bench/tests/bench.spec.ts
index 1e1c7c8..893f87b 100644
--- a/apps/bench/tests/bench.spec.ts
+++ b/apps/bench/tests/bench.spec.ts
@@ -8,6 +8,8 @@ import {
   type BenchRunSummary,
 } from "@pretable-internal/bench-runner";
 
+const perfTraceEnabled = process.env.PLAYWRIGHT_PERF_TRACE === "1";
+
 const adapterId = process.env.PRETABLE_BENCH_ADAPTER ?? "pretable";
 const scale = process.env.PRETABLE_BENCH_SCALE ?? "dev";
 const scenarioId = process.env.PRETABLE_BENCH_SCENARIO ?? "S1";
@@ -39,10 +41,70 @@ test("writes benchmark artifacts for the selected Pretable run", async ({
 
   await expect(page.getByLabel(adapterLabel).first()).toBeVisible();
 
+  let cdpSession: Awaited<
+    ReturnType<ReturnType<typeof page.context>["newCDPSession"]>
+  > | null = null;
+  const cdpEvents: unknown[] = [];
+
+  if (perfTraceEnabled) {
+    try {
+      cdpSession = await page.context().newCDPSession(page);
+      cdpSession.on(
+        "Tracing.dataCollected",
+        (payload: { value: unknown[] }) => {
+          for (const event of payload.value) cdpEvents.push(event);
+        },
+      );
+      await cdpSession.send("Tracing.start", {
+        categories: [
+          "disabled-by-default-devtools.timeline",
+          "disabled-by-default-devtools.timeline.frame",
+          "v8",
+          "disabled-by-default-v8.cpu_profiler",
+        ].join(","),
+        options: "sampling-frequency=10000",
+      });
+    } catch (err) {
+      console.warn(
+        `[bench.spec] CDP tracing start failed (best-effort, ignoring):`,
+        err,
+      );
+      cdpSession = null;
+    }
+  }
+
   await page.waitForFunction(() => Boolean(window.__PRETABLE_BENCH_RESULT__));
 
   const result = await page.evaluate(() => window.__PRETABLE_BENCH_RESULT__);
 
+  if (perfTraceEnabled && cdpSession) {
+    try {
+      const session = cdpSession;
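+      // Bound the wait so a lost tracingComplete event cannot hang the test
+      // (15 s is an assumed budget; keep it under the Playwright test timeout).
+      let timer: ReturnType<typeof setTimeout> | undefined;
+      const tracingComplete = new Promise<boolean>((resolve) => {
+        session.once("Tracing.tracingComplete", () => resolve(true));
+        timer = setTimeout(() => resolve(false), 15_000);
+      });
+      try {
+        await session.send("Tracing.end");
+        if (!(await tracingComplete)) {
+          console.warn(
+            `[bench.spec] Tracing.tracingComplete timed out; writing partial trace`,
+          );
+        }
+      } finally {
+        clearTimeout(timer);
+      }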
+      const traceRelPath =
+        typeof result?.tracePath === "string"
+          ? result.tracePath
+          : `status/traces/${createRunArtifactFileStem(result)}.trace.zip`;
+      const cdpPath = path
+        .resolve(process.cwd(), traceRelPath)
+        .replace(/\.trace\.zip$/, ".cdp.json");
+      await mkdir(path.dirname(cdpPath), { recursive: true });
+      await writeFile(
+        cdpPath,
+        JSON.stringify({ traceEvents: cdpEvents }, null, 0) + "\n",
+      );
+    } catch (err) {
+      console.warn(
+        `[bench.spec] CDP tracing stop/write failed (best-effort, ignoring):`,
+        err,
+      );
+    }
+  }
+
   const interactionScript =
     scriptName === "sort" ||
     scriptName === "filter-metadata" ||

From 7cb8acb261fe546e809fa6cb75f60d924af9235e Mon Sep 17 00:00:00 2001
From: Brian Love <brian@liveloveapp.com>
Date: Wed, 13 May 2026 11:07:12 -0700
Subject: [PATCH 4/5] =?UTF-8?q?docs(research):=20repo-memory=20entry=20?=
 =?UTF-8?q?=E2=80=94=20bench-harness=20CDP=20tracing?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 docs/research/repo-memory.md | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/docs/research/repo-memory.md b/docs/research/repo-memory.md
index 25daa83..cf4aedf 100644
--- a/docs/research/repo-memory.md
+++ b/docs/research/repo-memory.md
@@ -533,8 +533,35 @@ Header docblock updated to cite the n=20 milestone source (`status/milestones/20
 
 No test changes required: the TanStack regex (`/headless.*slower interaction/i`) still matches the shortened label.
 
+### Bench-harness CDP tracing opt-in
+
+Wired `apps/bench/tests/bench.spec.ts` for opt-in Chrome DevTools-format trace capture, closing the tooling gap surfaced by PR #142's wrapped-text filter perf-diag (Playwright's default `tracing.start({ screenshots, snapshots })` produces an action trace of API calls, DOM snapshots, and screenshots, not a JS flame graph).
+
+**How to use:**
+
+```
+PLAYWRIGHT_PERF_TRACE=1 \
+  PRETABLE_BENCH_ADAPTER=pretable \
+  PRETABLE_BENCH_SCENARIO=S2 \
+  PRETABLE_BENCH_SCALE=hypothesis \
+  PRETABLE_BENCH_SCRIPT=filter-text \
+  pnpm --filter @pretable/app-bench exec playwright test --workers=1
+```
+
+Output: `status/traces/<stem>.cdp.json` (sibling to the Playwright `.trace.zip`). Gitignored.
+
+**Loading:** Chrome → DevTools → Performance tab → "Load profile…" (folder icon) → select the `.cdp.json`. Flame graph renders.
+
+**Categories captured:** `disabled-by-default-devtools.timeline`, `disabled-by-default-devtools.timeline.frame`, `v8`, `disabled-by-default-v8.cpu_profiler` (sampling-frequency=10000). Standard DevTools profiling set.
+
+**Failure handling:** Best-effort additive. CDP attach failures, end timeouts, and JSON write failures all log + continue — the bench run never fails because of CDP tracing.
+
+**Known limitation (consumer side, not the harness):** The bench app starts its interaction script on page-load. CDP attach happens after the adapter-label visibility check, so the very first moments of the interaction may run before tracing starts. For hypothesis-scale runs the captured window can be sparse — observed ~145 events / 30 KB for filter-text/hypothesis. For meaningful flame-graph data, prefer wider windows (larger scale, or a future triggered-start variant in the bench app). Out of scope for this harness PR; the wiring + format are correct.
+
+**Out of scope here:** matrix-runner integration (would multiply wall-clock 5–10×), Speedscope export, automated test for the CDP path. Documented in the spec.
+
 ### Open follow-ups
 
-- **Pretable wrapped-text filter perf-fix investigation** — next item; profiling + scope.
+- **Pretable wrapped-text filter perf-fix investigation** — next item; profiling + scope. Tooling now unblocked; remaining blocker is the bench-app interaction-start timing noted above.
 - **`/bench` page swap to read from `hypotheses.json` directly** — still deferred; aggregator scripts continue feeding the page for now.
 - **Matrix-runner reliability** — flakes are now well-documented across PRs #133, #134, #140, and this PR's sort re-run (which succeeded for pretable-only, but the multi-adapter runner remains fragile).

From 36cc0b21a328b7ac643422e0006b58785db85c1d Mon Sep 17 00:00:00 2001
From: Brian Love <brian@liveloveapp.com>
Date: Wed, 13 May 2026 11:13:16 -0700
Subject: [PATCH 5/5] chore: prettier format spec + plan

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../2026-05-13-bench-harness-cdp-tracing.md      | 16 ++++++++++++++++
 ...026-05-13-bench-harness-cdp-tracing-design.md |  1 +
 2 files changed, 17 insertions(+)

diff --git a/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md
index d29aa0f..00e1102 100644
--- a/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md
+++ b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md
@@ -26,9 +26,11 @@ No `packages/` changes. No app source. No matrix-runner changes.
 ## Pre-flight
 
 - [ ] **0.1** Read the existing `apps/bench/tests/bench.spec.ts` to understand the current tracing wiring:
+
   ```
   cat apps/bench/tests/bench.spec.ts
   ```
+
   Confirm the spec already does `await page.context().tracing.start({ screenshots: true, snapshots: true })` early and `await page.context().tracing.stop({ path: tracePath })` later. The CDP-tracing path is additive — both traces run when opt-in is set.
 
 - [ ] **0.2** Free port 4173 if stale (`lsof -ti tcp:4173 | xargs -r kill -9`).
@@ -108,15 +110,19 @@ No `packages/` changes. No app source. No matrix-runner changes.
 - [ ] **1.5** Verify Imports. The spec already imports `mkdir`, `readFile`, `stat`, `writeFile` from `"node:fs/promises"` and `path` from `"node:path"`. No new imports needed.
 
 - [ ] **1.6** Typecheck:
+
   ```
   pnpm --filter @pretable/app-bench typecheck
   ```
+
   Expected: passes.
 
 - [ ] **1.7** Run the existing test with `PLAYWRIGHT_PERF_TRACE` UNSET to confirm regression-free:
+
   ```
   pnpm bench:e2e --project=chromium
   ```
+
   Expected: spec passes; no `.cdp.json` produced; behavior identical to current `main`.
 
 - [ ] **1.8** Commit:
@@ -130,6 +136,7 @@ No `packages/` changes. No app source. No matrix-runner changes.
 ## Task 2 — Manual verification
 
 - [ ] **2.1** Run with the env opt-in set:
+
   ```
   PLAYWRIGHT_PERF_TRACE=1 \
     PRETABLE_BENCH_ADAPTER=pretable \
@@ -140,21 +147,27 @@ No `packages/` changes. No app source. No matrix-runner changes.
   ```
 
 - [ ] **2.2** Locate the produced JSON:
+
   ```
   ls -lt status/traces/*.cdp.json | head -1
   ```
+
   Expected: non-zero size (1–30 MB).
 
 - [ ] **2.3** Sanity-check the JSON:
+
   ```
   jq '.traceEvents | length' status/traces/*.cdp.json | head -1
   ```
+
   Expected: a number > 100 (a real trace has thousands of events; a low number would suggest the categories aren't producing what we expect).
 
   Also check shape:
+
   ```
   jq '.traceEvents[0:3]' status/traces/*.cdp.json
   ```
+
   Expected: each event has `ph` (phase), `ts` (timestamp), `cat` (category), etc. — Chrome DevTools trace-event format.
 
 - [ ] **2.4** Open in Chrome DevTools:
@@ -194,12 +207,15 @@ No `packages/` changes. No app source. No matrix-runner changes.
 ## Task 4 — Gates + PR
 
 - [ ] **4.1** Repo-wide gates:
+
   ```
   pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format
   ```
+
   Expected: all pass.
 
 - [ ] **4.2** Push + open PR:
+
   ```
   git push -u origin bench-harness-cdp-tracing
   gh pr create --title "feat(bench): opt-in CDP tracing in bench.spec.ts (unblocks flame-graph perf diagnostics)" --body "..."
diff --git a/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md
index 5577560..24ab39e 100644
--- a/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md
+++ b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md
@@ -70,6 +70,7 @@ The CDP write happens INSIDE the test fixture, so it lands in the same per-run o
 The CDP `Tracing` API produces an array of event objects matching the Chrome DevTools "Trace Event Format" — the same format that DevTools' "Save profile…" / "Load profile…" round-trips. Wrap in `{"traceEvents": [...]}` so Chrome DevTools accepts the file directly.
 
 Example consumption (manual):
+
 1. Open Chrome → DevTools → Performance tab.
 2. Click "Load profile…" (folder icon).
 3. Select `status/traces/<stem>.cdp.json`.