From 06f51281c69b14259cdaa9a2ff3651d845fe9c75 Mon Sep 17 00:00:00 2001 From: Brian Love Date: Wed, 13 May 2026 10:46:58 -0700 Subject: [PATCH 1/5] docs(specs): bench-harness CDP tracing design Opt-in CDP-level tracing path for apps/bench/tests/bench.spec.ts. Unblocks future perf diagnostics that need flame-graph data (the wrapped-text filter investigation from PR #142 hit this exact wall). No matrix-runner integration; opt-in via PLAYWRIGHT_PERF_TRACE=1 env. Co-Authored-By: Claude Opus 4.7 --- ...-05-13-bench-harness-cdp-tracing-design.md | 137 ++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md diff --git a/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md new file mode 100644 index 0000000..5577560 --- /dev/null +++ b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md @@ -0,0 +1,137 @@ +# Bench Harness CDP Tracing Design + +**Date:** 2026-05-13 +**Status:** Draft (awaiting user review before plan) +**Predecessors:** [PR #142 wrapped-text filter perf-diag](../../research/repo-memory.md) — surfaced the bench-harness tracing gap that this PR closes. + +--- + +## Goal + +Add an opt-in CDP-level tracing path to `apps/bench/tests/bench.spec.ts` so future perf diagnostics can capture function-level flame-graph data instead of Playwright's default action-trace format. Unblocks the deferred wrapped-text filter perf-fix investigation (memo at `docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md`) and any future hotspot-attribution work. + +Output: Chrome DevTools-compatible trace JSON loadable in the Performance panel via "Load profile…". 
+ +## Why + +PR #142's perf-diag hit a hard tooling wall: Playwright's `tracing.start({ screenshots: true, snapshots: true })` (the only tracing the bench has wired today) produces API call frames + DOM snapshots + screencast — NOT a JS function timeline. To find why pretable's interaction scripts land 1–2 ms over the 16 ms single-frame budget, we need per-function timing. + +Playwright exposes the Chrome DevTools Protocol (CDP) directly via `page.context().newCDPSession(page)`. Sending `Tracing.start({ categories: '...' })` produces standard DevTools-format JSON — the same data Chrome's "Performance" panel records — that can be loaded for flame-graph analysis. This PR wires that path as an opt-in. + +This is infrastructure, not measurement. No bench numbers change. + +## Non-goals + +- **Re-doing the wrapped-text filter perf-fix investigation.** That's a separate follow-up PR that uses this new tracing path. +- **Replacing the existing Playwright action trace.** The action trace is useful for visual debugging (screenshots + DOM snapshots); CDP tracing is additive, not a replacement. +- **Always-on CDP tracing.** Opt-in via env var only. CDP tracing adds ~5–10× run overhead and produces large JSON files (tens of MB per run). Default-off keeps the matrix runs fast. +- **Cross-browser CDP tracing.** Chromium only — CDP is a Chrome protocol. (The bench is Chromium-only anyway.) +- **Visualization tooling.** Output is loaded into Chrome DevTools manually. No bundled trace viewer. +- **Matrix runner integration.** The matrix runner runs many small invocations; CDP tracing per invocation would multiply wall-clock by 5–10×. Users opt in on individual Playwright runs, not full matrix sweeps. + +## Architecture + +### Env opt-in + +`PLAYWRIGHT_PERF_TRACE=1` env var triggers the CDP-tracing branch. When unset (the default), the spec runs as today — no behavioral or perf change. + +### Code changes + +`apps/bench/tests/bench.spec.ts`: + +1. 
Read `process.env.PLAYWRIGHT_PERF_TRACE === "1"` at spec setup. +2. If set, after `page.goto(...)` but BEFORE the bench result becomes available, open a CDP session: + ```ts + const cdp = await page.context().newCDPSession(page); + await cdp.send("Tracing.start", { + categories: [ + "disabled-by-default-devtools.timeline", + "disabled-by-default-devtools.timeline.frame", + "v8", + "disabled-by-default-v8.cpu_profiler", + ].join(","), + options: "sampling-frequency=10000", + transferMode: "ReturnAsStream", + }); + ``` +3. Collect events into a buffer via the `Tracing.dataCollected` event listener. +4. After the bench result is published, call `Tracing.end` and wait for `Tracing.tracingComplete`. +5. If `transferMode: "ReturnAsStream"` is used, read the stream via `IO.read`. Otherwise, collect events from the `Tracing.dataCollected` buffer. +6. Write the aggregated JSON to a sibling file alongside the Playwright trace zip: + ``` + status/traces/.cdp.json + ``` + (Same stem as `createRunArtifactFileStem(result)` produces for the Playwright trace, just with `.cdp.json` suffix.) + +The CDP write happens INSIDE the test fixture, so it lands in the same per-run output cycle as the Playwright trace. Both are gitignored (`.cdp.json` follows the existing `status/traces/*` ignore rule). + +### Output shape + +The CDP `Tracing` API produces an array of event objects matching the Chrome DevTools "Trace Event Format" — the same format that DevTools' "Save profile…" / "Load profile…" round-trips. Wrap in `{"traceEvents": [...]}` so Chrome DevTools accepts the file directly. + +Example consumption (manual): +1. Open Chrome → DevTools → Performance tab. +2. Click "Load profile…" (folder icon). +3. Select `status/traces/.cdp.json`. +4. Flame graph renders. The window around the bench's interaction script is the focus. 
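
The collection and wrapping steps above can be sketched without a browser. This is a minimal illustration under stated assumptions, not the harness code: `createTraceBuffer`, `onDataCollected`, `size`, and `serialize` are hypothetical names. The only behavior taken from the spec is that `Tracing.dataCollected` batches are concatenated and the result is wrapped in `{"traceEvents": [...]}`.

```typescript
// Hypothetical helper (not part of bench.spec.ts): accumulates the batches that
// CDP delivers through `Tracing.dataCollected` and wraps them for DevTools.
type TraceEvent = Record<string, unknown>;

interface TraceBuffer {
  onDataCollected(payload: { value: TraceEvent[] }): void;
  size(): number;
  serialize(): string;
}

function createTraceBuffer(): TraceBuffer {
  const events: TraceEvent[] = [];
  return {
    // The event may fire many times per run, so every batch is appended in order.
    onDataCollected(payload) {
      for (const event of payload.value) events.push(event);
    },
    size() {
      return events.length;
    },
    // Wrap the flat event array in the `traceEvents` envelope described in the spec.
    serialize() {
      return JSON.stringify({ traceEvents: events }) + "\n";
    },
  };
}
```

In the flow described by the spec, `onDataCollected` would presumably be registered as the `Tracing.dataCollected` listener and `serialize()` called once `Tracing.tracingComplete` has fired.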
+ +### Failure handling + +- **CDP session fails to attach** (rare; would indicate Playwright/Chromium version mismatch): log to test stderr, skip CDP tracing, let the Playwright run proceed normally. Don't fail the test. +- **`Tracing.end` doesn't fire** (timeout): log a warning with the timeout duration; save whatever events were collected; don't fail the test. +- **JSON write fails**: log; don't fail the test. + +The CDP-tracing path is best-effort additive — never blocks the underlying bench run. + +### Documentation + +A short section in `docs/research/repo-memory.md` covering: + +- How to opt in (`PLAYWRIGHT_PERF_TRACE=1`). +- Where the output lands (`status/traces/.cdp.json`). +- How to load in Chrome DevTools. +- The categories chosen and why. +- Pointer to the wrapped-text filter memo as the next intended consumer. + +No README changes; this is an internal-tooling extension, not a developer-facing feature. + +### Test coverage + +Adding a unit test for the CDP tracing path is awkward — it requires a running Chromium + CDP. Instead: + +- **Manual verification:** one CDP run captured, opened in DevTools, screenshot in the PR body. +- **Regression guard:** confirm the env-unset path is byte-identical to current behavior. The existing spec test (`writes benchmark artifacts for the selected Pretable run`) must still pass with `PLAYWRIGHT_PERF_TRACE` unset. + +## File touches + +``` +apps/bench/tests/bench.spec.ts (MODIFY: add CDP-tracing branch) +docs/research/repo-memory.md (MODIFY: 2026-05-13 entry — bench-harness CDP tracing) +``` + +No `packages/` changes. No public-API surface. No app source. Just one test file + a doc entry. + +## Risks + +- **CDP API drift.** Chromium's CDP version moves with the bundled Chromium. The trace event categories we pick should stay stable (they're standard DevTools categories) but if a future Chromium bump renames or drops a category, the env-set path silently produces an empty trace. 
Mitigation: the failure-handling section's logging would surface that. A follow-up could add a `traceEvents.length > 0` assertion. +- **Trace file size.** A 3-second interaction window at 10 kHz sampling typically produces 5–30 MB of JSON. The gitignore already covers `status/traces/*`, so no commit pollution risk. Local disk fills up if a user runs many CDP traces — document that. +- **Wall-clock overhead.** CDP tracing slows the bench's interaction window by 5–10×. The matrix runner's wall-clock budget doesn't allow always-on tracing; the env opt-in keeps it user-controlled. +- **Race condition on `Tracing.tracingComplete`.** Playwright's CDP session is event-driven; we have to await the complete event after `Tracing.end`. If the await races with Playwright's test teardown, the JSON could be truncated. Mitigation: explicit `await` on a Promise that resolves on `tracingComplete`. +- **No automated test for the new path.** As noted, requires a running browser. Manual verification in the PR body is the substitute. A future "bench-harness self-test" could exercise the CDP path; out of scope here. + +## Out of scope follow-ups + +- **Wrapped-text filter perf-fix investigation v2** — uses this new tracing path. Memo at `docs/research/2026-05-13-pretable-wrapped-text-filter-perf-diagnostic.md` has the candidate fixes; flame-graph data confirms which one to ship. +- **Bench-harness self-test for the CDP path** — would require a browser-in-CI setup beyond what we have. Manual verification is the current bar. +- **Matrix runner CDP integration.** Per the spec's "always-on CDP tracing" non-goal, this would multiply matrix wall-clock 5–10×. Not done here. +- **Visualization tooling beyond Chrome DevTools.** Speedscope or other flame-graph viewers can also consume DevTools-format JSON; documenting that is a small follow-up if a non-Chrome user wants to investigate. + +## Test plan + +- [x] `PLAYWRIGHT_PERF_TRACE` unset: existing spec test passes unchanged. 
+- [x] `PLAYWRIGHT_PERF_TRACE=1`: spec produces both the Playwright `.trace.zip` and a sibling `.cdp.json`. +- [x] `.cdp.json` is well-formed: opens in Chrome DevTools' Performance panel; flame graph renders. +- [x] `pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format` clean. +- [x] No `packages/` source touched. + +Manual verification: one CDP run captured by the implementer; screenshot of the loaded flame graph in the PR body. From 69adf5d3666b43432ca45ba8a4529bf780c18fa8 Mon Sep 17 00:00:00 2001 From: Brian Love Date: Wed, 13 May 2026 10:47:47 -0700 Subject: [PATCH 2/5] docs(plans): bench-harness CDP tracing implementation plan Four-task plan: wire CDP-tracing branch behind PLAYWRIGHT_PERF_TRACE=1 env opt-in, manual verification, repo-memory entry, gates + PR. Auto-mergeable. Co-Authored-By: Claude Opus 4.7 --- .../2026-05-13-bench-harness-cdp-tracing.md | 238 ++++++++++++++++++ 1 file changed, 238 insertions(+) create mode 100644 docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md diff --git a/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md new file mode 100644 index 0000000..d29aa0f --- /dev/null +++ b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md @@ -0,0 +1,238 @@ +# Bench Harness CDP Tracing Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add an opt-in CDP-level tracing path to `apps/bench/tests/bench.spec.ts` so future perf diagnostics can capture Chrome DevTools-compatible JSON for flame-graph analysis. + +**Architecture:** Per the spec at `docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md`. Single PR. Auto-merge on green — tooling-only, no measurement, no public-API impact. 
+ +**Working directory:** `/Users/blove/repos/pretable/.worktrees/bench-harness-cdp-tracing`. + +**Spec:** [`docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md`](../specs/2026-05-13-bench-harness-cdp-tracing-design.md) + +--- + +## File Structure + +``` +apps/bench/tests/bench.spec.ts (MODIFY: add CDP-tracing branch behind PLAYWRIGHT_PERF_TRACE=1 env) +docs/research/repo-memory.md (MODIFY: 2026-05-13 entry — bench-harness CDP tracing usage) +``` + +No `packages/` changes. No app source. No matrix-runner changes. + +--- + +## Pre-flight + +- [ ] **0.1** Read the existing `apps/bench/tests/bench.spec.ts` to understand the current tracing wiring: + ``` + cat apps/bench/tests/bench.spec.ts + ``` + Confirm the spec already does `await page.context().tracing.start({ screenshots: true, snapshots: true })` early and `await page.context().tracing.stop({ path: tracePath })` later. The CDP-tracing path is additive — both traces run when opt-in is set. + +- [ ] **0.2** Free port 4173 if stale (`lsof -ti tcp:4173 | xargs -r kill -9`). + +--- + +## Task 1 — Wire CDP-tracing branch in bench.spec.ts + +- [ ] **1.1** Open `apps/bench/tests/bench.spec.ts`. 
At the top of the test fixture, read the env opt-in: + + ```ts + const perfTraceEnabled = process.env.PLAYWRIGHT_PERF_TRACE === "1"; + ``` + +- [ ] **1.2** After `await page.goto(...)` but BEFORE the bench-result wait, open a CDP session and start tracing if `perfTraceEnabled`: + + ```ts + let cdpSession: Awaited< + ReturnType + > | null = null; + const cdpEvents: unknown[] = []; + + if (perfTraceEnabled) { + cdpSession = await page.context().newCDPSession(page); + cdpSession.on("Tracing.dataCollected", (payload: { value: unknown[] }) => { + for (const event of payload.value) cdpEvents.push(event); + }); + await cdpSession.send("Tracing.start", { + categories: [ + "disabled-by-default-devtools.timeline", + "disabled-by-default-devtools.timeline.frame", + "v8", + "disabled-by-default-v8.cpu_profiler", + ].join(","), + options: "sampling-frequency=10000", + }); + } + ``` + + Note: omit `transferMode: "ReturnAsStream"` for simplicity — collect events via the `Tracing.dataCollected` listener instead. (The streaming path is more efficient for huge traces but the dataCollected path is simpler and works fine for 3-second windows.) + +- [ ] **1.3** After the bench result is captured (around the line `const result = await page.evaluate(...)`), stop CDP tracing and write the JSON: + + ```ts + if (perfTraceEnabled && cdpSession) { + const tracingComplete = new Promise((resolve) => { + cdpSession!.once("Tracing.tracingComplete", () => resolve()); + }); + await cdpSession.send("Tracing.end"); + await tracingComplete; + const cdpPath = tracePath.replace(/\.trace\.zip$/, ".cdp.json"); + await mkdir(path.dirname(cdpPath), { recursive: true }); + await writeFile( + cdpPath, + JSON.stringify({ traceEvents: cdpEvents }, null, 0) + "\n", + ); + } + ``` + + Place this BEFORE `await page.context().tracing.stop(...)` (the Playwright action trace closer) so the CDP session is cleaned up while the page is still around. 
+ +- [ ] **1.4** Wrap the CDP block in a try/catch that logs but doesn't fail the test: + + ```ts + try { + // ... CDP start/stop/write code + } catch (err) { + console.warn( + `[bench.spec] CDP tracing failed (best-effort, ignoring):`, + err, + ); + } + ``` + + Apply the try/catch to BOTH the start block (Task 1.2) and the stop+write block (Task 1.3) — independently, so a failure starting tracing doesn't blow up the stop path, and vice versa. + +- [ ] **1.5** Verify Imports. The spec already imports `mkdir`, `readFile`, `stat`, `writeFile` from `"node:fs/promises"` and `path` from `"node:path"`. No new imports needed. + +- [ ] **1.6** Typecheck: + ``` + pnpm --filter @pretable/app-bench typecheck + ``` + Expected: passes. + +- [ ] **1.7** Run the existing test with `PLAYWRIGHT_PERF_TRACE` UNSET to confirm regression-free: + ``` + pnpm bench:e2e --project=chromium + ``` + Expected: spec passes; no `.cdp.json` produced; behavior identical to current `main`. + +- [ ] **1.8** Commit: + ``` + git add apps/bench/tests/bench.spec.ts + git commit -m "feat(bench): opt-in CDP tracing in bench.spec.ts via PLAYWRIGHT_PERF_TRACE" + ``` + +--- + +## Task 2 — Manual verification + +- [ ] **2.1** Run with the env opt-in set: + ``` + PLAYWRIGHT_PERF_TRACE=1 \ + PRETABLE_BENCH_ADAPTER=pretable \ + PRETABLE_BENCH_SCENARIO=S2 \ + PRETABLE_BENCH_SCALE=hypothesis \ + PRETABLE_BENCH_SCRIPT=filter-text \ + pnpm --filter @pretable/app-bench exec playwright test --workers=1 + ``` + +- [ ] **2.2** Locate the produced JSON: + ``` + ls -lt status/traces/*.cdp.json | head -1 + ``` + Expected: non-zero size (1–30 MB). + +- [ ] **2.3** Sanity-check the JSON: + ``` + jq '.traceEvents | length' status/traces/*.cdp.json | head -1 + ``` + Expected: a number > 100 (a real trace has thousands of events; a low number would suggest the categories aren't producing what we expect). 
+ + Also check shape: + ``` + jq '.traceEvents[0:3]' status/traces/*.cdp.json + ``` + Expected: each event has `ph` (phase), `ts` (timestamp), `cat` (category), etc. — Chrome DevTools trace-event format. + +- [ ] **2.4** Open in Chrome DevTools: + - Open `chrome://devtools/` or any Chrome page → F12 → Performance tab. + - Click the "Load profile…" folder icon. + - Select the `.cdp.json` file. + - The flame graph should render with the interaction window visible. + + Screenshot the flame graph for the PR body. + +- [ ] **2.5** Don't commit the captured trace (it's in the gitignored `status/traces/`). + +--- + +## Task 3 — repo-memory entry + +- [ ] **3.1** Append a 2026-05-13 section to `docs/research/repo-memory.md`: + + Cover: + - The new opt-in env (`PLAYWRIGHT_PERF_TRACE=1`). + - Where output lands (`status/traces/.cdp.json`). + - How to open it (Chrome DevTools → Performance → Load profile). + - Categories chosen + the standard "flame-graph" rationale. + - Pointer to the wrapped-text filter memo as the next intended consumer. + - Out-of-scope notes (no matrix integration, no Speedscope export). + + Suggested heading: `### Bench-harness CDP tracing opt-in`. + +- [ ] **3.2** Commit: + ``` + git add docs/research/repo-memory.md + git commit -m "docs(research): repo-memory entry — bench-harness CDP tracing" + ``` + +--- + +## Task 4 — Gates + PR + +- [ ] **4.1** Repo-wide gates: + ``` + pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format + ``` + Expected: all pass. + +- [ ] **4.2** Push + open PR: + ``` + git push -u origin bench-harness-cdp-tracing + gh pr create --title "feat(bench): opt-in CDP tracing in bench.spec.ts (unblocks flame-graph perf diagnostics)" --body "..." + ``` + + PR body should include: + - Summary of the env opt-in. + - The categories chosen. + - Manual verification: screenshot of the DevTools flame graph for the captured CDP trace. + - What's NOT in the PR (matrix integration, Speedscope export, automated test for the new path). 
+ - Pointer to the wrapped-text filter memo (PR #142) as the next intended consumer. + +- [ ] **4.3** Auto-merge: + ``` + gh pr merge --auto --squash + ``` + This is tooling-only; no public-API impact; no measurement changes. Safe to auto-merge. + +--- + +## Self-review + +- Spec coverage: env opt-in (Task 1.1) → CDP start (1.2) → CDP stop + write (1.3) → failure handling (1.4) → manual verification (Task 2) → docs (Task 3) → gates + PR (Task 4). ✓ +- No placeholders. +- Type/value consistency: `cdpSession` is typed with `Awaited<ReturnType<…>>` so TypeScript can infer the session type from Playwright's API; `cdpEvents: unknown[]` keeps the listener loose; output JSON shape `{ traceEvents: [...] }` matches Chrome DevTools' expected import format. +- Scope: single PR, four task groups, ~3 commits-of-record, auto-mergeable. + +--- + +## Notes for the implementer + +- Keep the diff to `bench.spec.ts` minimal. The Playwright action trace stays untouched; CDP tracing is fully additive. +- The CDP `Tracing.dataCollected` listener fires multiple times per run; collect all batches into the `cdpEvents` array before writing the JSON. +- Don't put the CDP code in a separate file. Inlining keeps the test fixture readable and avoids new module imports. +- The "sampling-frequency=10000" option sets V8's CPU profiler to 10 kHz sampling. Standard for DevTools-style profiling; the resulting `.cdp.json` will be ~5–20 MB for a 3-second window. +- If the env opt-in path fails locally (e.g., CDP session error), check that the local Playwright + Chromium versions match (`pnpm exec playwright --version`); the categories list is stable across recent Chromium versions but a major version change can affect it.
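
Task 2.3's `jq` checks could also be expressed in TypeScript, for example if a later follow-up wants the event-count guard in code. The sketch below is an illustration under stated assumptions: `summarizeTrace` and `TraceSummary` are hypothetical names, the 100-event threshold simply mirrors the number quoted in step 2.3, and the `ph`/`ts`/`cat` test is a loose shape check, since some events in a real trace may not carry every field.

```typescript
// Hypothetical sanity check for a captured `.cdp.json`; the names and the
// threshold are illustrative assumptions, not part of the bench harness.
interface TraceSummary {
  eventCount: number;
  wellFormedCount: number; // events with string `ph`/`cat` and numeric `ts`
  byCategory: Record<string, number>;
  looksSparse: boolean;
}

function summarizeTrace(json: string, minEvents = 100): TraceSummary {
  const parsed: unknown = JSON.parse(json);
  // Accept only the `{ traceEvents: [...] }` envelope; anything else counts as empty.
  const events: unknown[] =
    typeof parsed === "object" &&
    parsed !== null &&
    Array.isArray((parsed as { traceEvents?: unknown }).traceEvents)
      ? (parsed as { traceEvents: unknown[] }).traceEvents
      : [];

  const byCategory: Record<string, number> = {};
  let wellFormedCount = 0;
  for (const raw of events) {
    if (typeof raw !== "object" || raw === null) continue;
    const event = raw as { ph?: unknown; ts?: unknown; cat?: unknown };
    if (typeof event.cat === "string") {
      byCategory[event.cat] = (byCategory[event.cat] ?? 0) + 1;
    }
    if (
      typeof event.ph === "string" &&
      typeof event.ts === "number" &&
      typeof event.cat === "string"
    ) {
      wellFormedCount += 1;
    }
  }

  return {
    eventCount: events.length,
    wellFormedCount,
    byCategory,
    // Step 2.3 expects "a number > 100", so anything at or below it is flagged.
    looksSparse: events.length <= minEvents,
  };
}
```

A per-category count is included because an empty category is the symptom the spec's "CDP API drift" risk describes; whether that is the most useful breakdown is a judgment call.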
From 2a4fdb7e9fdb81ddcb817bdacafbd93f37dac3ca Mon Sep 17 00:00:00 2001 From: Brian Love Date: Wed, 13 May 2026 10:55:49 -0700 Subject: [PATCH 3/5] feat(bench): opt-in CDP tracing in bench.spec.ts via PLAYWRIGHT_PERF_TRACE Co-Authored-By: Claude Opus 4.7 --- apps/bench/tests/bench.spec.ts | 62 ++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/apps/bench/tests/bench.spec.ts b/apps/bench/tests/bench.spec.ts index 1e1c7c8..893f87b 100644 --- a/apps/bench/tests/bench.spec.ts +++ b/apps/bench/tests/bench.spec.ts @@ -8,6 +8,8 @@ import { type BenchRunSummary, } from "@pretable-internal/bench-runner"; +const perfTraceEnabled = process.env.PLAYWRIGHT_PERF_TRACE === "1"; + const adapterId = process.env.PRETABLE_BENCH_ADAPTER ?? "pretable"; const scale = process.env.PRETABLE_BENCH_SCALE ?? "dev"; const scenarioId = process.env.PRETABLE_BENCH_SCENARIO ?? "S1"; @@ -39,10 +41,70 @@ test("writes benchmark artifacts for the selected Pretable run", async ({ await expect(page.getByLabel(adapterLabel).first()).toBeVisible(); + let cdpSession: Awaited< + ReturnType + > | null = null; + const cdpEvents: unknown[] = []; + + if (perfTraceEnabled) { + try { + cdpSession = await page.context().newCDPSession(page); + cdpSession.on( + "Tracing.dataCollected", + (payload: { value: unknown[] }) => { + for (const event of payload.value) cdpEvents.push(event); + }, + ); + await cdpSession.send("Tracing.start", { + categories: [ + "disabled-by-default-devtools.timeline", + "disabled-by-default-devtools.timeline.frame", + "v8", + "disabled-by-default-v8.cpu_profiler", + ].join(","), + options: "sampling-frequency=10000", + }); + } catch (err) { + console.warn( + `[bench.spec] CDP tracing start failed (best-effort, ignoring):`, + err, + ); + cdpSession = null; + } + } + await page.waitForFunction(() => Boolean(window.__PRETABLE_BENCH_RESULT__)); const result = await page.evaluate(() => window.__PRETABLE_BENCH_RESULT__); + if (perfTraceEnabled && cdpSession) { + try { 
+ const session = cdpSession; + const tracingComplete = new Promise((resolve) => { + session.once("Tracing.tracingComplete", () => resolve()); + }); + await session.send("Tracing.end"); + await tracingComplete; + const traceRelPath = + typeof result?.tracePath === "string" + ? result.tracePath + : `status/traces/${createRunArtifactFileStem(result)}.trace.zip`; + const cdpPath = path + .join(process.cwd(), traceRelPath) + .replace(/\.trace\.zip$/, ".cdp.json"); + await mkdir(path.dirname(cdpPath), { recursive: true }); + await writeFile( + cdpPath, + JSON.stringify({ traceEvents: cdpEvents }, null, 0) + "\n", + ); + } catch (err) { + console.warn( + `[bench.spec] CDP tracing stop/write failed (best-effort, ignoring):`, + err, + ); + } + } + const interactionScript = scriptName === "sort" || scriptName === "filter-metadata" || From 7cb8acb261fe546e809fa6cb75f60d924af9235e Mon Sep 17 00:00:00 2001 From: Brian Love Date: Wed, 13 May 2026 11:07:12 -0700 Subject: [PATCH 4/5] =?UTF-8?q?docs(research):=20repo-memory=20entry=20?= =?UTF-8?q?=E2=80=94=20bench-harness=20CDP=20tracing?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.7 --- docs/research/repo-memory.md | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/docs/research/repo-memory.md b/docs/research/repo-memory.md index 25daa83..cf4aedf 100644 --- a/docs/research/repo-memory.md +++ b/docs/research/repo-memory.md @@ -533,8 +533,35 @@ Header docblock updated to cite the n=20 milestone source (`status/milestones/20 No test changes required: the TanStack regex (`/headless.*slower interaction/i`) still matches the shortened label. 
+### Bench-harness CDP tracing opt-in + +Wired `apps/bench/tests/bench.spec.ts` for opt-in Chrome DevTools-format trace capture, closing the tooling gap surfaced by PR #142's wrapped-text filter perf-diag (Playwright's default `tracing.start({ screenshots, snapshots })` produces an SDK action trace, not a JS flame graph). + +**How to use:** + +``` +PLAYWRIGHT_PERF_TRACE=1 \ + PRETABLE_BENCH_ADAPTER=pretable \ + PRETABLE_BENCH_SCENARIO=S2 \ + PRETABLE_BENCH_SCALE=hypothesis \ + PRETABLE_BENCH_SCRIPT=filter-text \ + pnpm --filter @pretable/app-bench exec playwright test --workers=1 +``` + +Output: `status/traces/.cdp.json` (sibling to the Playwright `.trace.zip`). Gitignored. + +**Loading:** Chrome → DevTools → Performance tab → "Load profile…" (folder icon) → select the `.cdp.json`. Flame graph renders. + +**Categories captured:** `disabled-by-default-devtools.timeline`, `disabled-by-default-devtools.timeline.frame`, `v8`, `disabled-by-default-v8.cpu_profiler` (sampling-frequency=10000). Standard DevTools profiling set. + +**Failure handling:** Best-effort additive. CDP attach failures, end timeouts, and JSON write failures all log + continue — the bench run never fails because of CDP tracing. + +**Known limitation (consumer side, not the harness):** The bench app starts its interaction script on page-load. CDP attach happens after the adapter-label visibility check, so the very first moments of the interaction may run before tracing starts. For hypothesis-scale runs the captured window can be sparse — observed ~145 events / 30 KB for filter-text/hypothesis. For meaningful flame-graph data, prefer wider windows (larger scale, or a future triggered-start variant in the bench app). Out of scope for this harness PR; the wiring + format are correct. + +**Out of scope here:** matrix-runner integration (would multiply wall-clock 5–10×), Speedscope export, automated test for the CDP path. Documented in the spec. 
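
The failure-handling note above mentions end timeouts, which implies bounding the wait for `Tracing.tracingComplete`. One way to do that is to race the wait against a timer, as sketched below. `raceWithTimeout`, `Outcome`, and the 30-second figure in the usage comment are hypothetical; nothing here is taken from the harness itself.

```typescript
// Hypothetical helper: resolves with the promise's value, or with a timeout
// marker if `ms` elapses first. The timer never causes a rejection; a rejection
// from `work` itself still propagates to the caller.
type Outcome<T> = { timedOut: false; value: T } | { timedOut: true };

function raceWithTimeout<T>(work: Promise<T>, ms: number): Promise<Outcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ timedOut: true }), ms);
  });
  const settled = work.then((value): Outcome<T> => ({ timedOut: false, value }));
  return Promise.race([settled, timeout]).then(
    (outcome) => {
      // Clear the timer so a fast completion does not keep the process alive.
      clearTimeout(timer);
      return outcome;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Usage sketch (assumed wiring around a promise that resolves on tracingComplete):
//   const outcome = await raceWithTimeout(tracingComplete, 30_000);
//   if (outcome.timedOut) console.warn("[bench.spec] Tracing.end timed out");
```

On timeout the caller could still write whatever events were collected so far, which matches the "save whatever events were collected" behavior the spec describes.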
+ ### Open follow-ups -- **Pretable wrapped-text filter perf-fix investigation** — next item; profiling + scope. +- **Pretable wrapped-text filter perf-fix investigation** — next item; profiling + scope. Tooling now unblocked; remaining blocker is the bench-app interaction-start timing noted above. - **`/bench` page swap to read from `hypotheses.json` directly** — still deferred; aggregator scripts continue feeding the page for now. - **Matrix-runner reliability** — flakes are now well-documented across PRs #133, #134, #140, and this PR's sort re-run (which succeeded for pretable-only, but the multi-adapter runner remains fragile). From 36cc0b21a328b7ac643422e0006b58785db85c1d Mon Sep 17 00:00:00 2001 From: Brian Love Date: Wed, 13 May 2026 11:13:16 -0700 Subject: [PATCH 5/5] chore: prettier format spec + plan Co-Authored-By: Claude Opus 4.7 --- .../2026-05-13-bench-harness-cdp-tracing.md | 16 ++++++++++++++++ ...026-05-13-bench-harness-cdp-tracing-design.md | 1 + 2 files changed, 17 insertions(+) diff --git a/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md index d29aa0f..00e1102 100644 --- a/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md +++ b/docs/superpowers/plans/2026-05-13-bench-harness-cdp-tracing.md @@ -26,9 +26,11 @@ No `packages/` changes. No app source. No matrix-runner changes. ## Pre-flight - [ ] **0.1** Read the existing `apps/bench/tests/bench.spec.ts` to understand the current tracing wiring: + ``` cat apps/bench/tests/bench.spec.ts ``` + Confirm the spec already does `await page.context().tracing.start({ screenshots: true, snapshots: true })` early and `await page.context().tracing.stop({ path: tracePath })` later. The CDP-tracing path is additive — both traces run when opt-in is set. - [ ] **0.2** Free port 4173 if stale (`lsof -ti tcp:4173 | xargs -r kill -9`). @@ -108,15 +110,19 @@ No `packages/` changes. No app source. No matrix-runner changes. 
- [ ] **1.5** Verify Imports. The spec already imports `mkdir`, `readFile`, `stat`, `writeFile` from `"node:fs/promises"` and `path` from `"node:path"`. No new imports needed. - [ ] **1.6** Typecheck: + ``` pnpm --filter @pretable/app-bench typecheck ``` + Expected: passes. - [ ] **1.7** Run the existing test with `PLAYWRIGHT_PERF_TRACE` UNSET to confirm regression-free: + ``` pnpm bench:e2e --project=chromium ``` + Expected: spec passes; no `.cdp.json` produced; behavior identical to current `main`. - [ ] **1.8** Commit: @@ -130,6 +136,7 @@ No `packages/` changes. No app source. No matrix-runner changes. ## Task 2 — Manual verification - [ ] **2.1** Run with the env opt-in set: + ``` PLAYWRIGHT_PERF_TRACE=1 \ PRETABLE_BENCH_ADAPTER=pretable \ @@ -140,21 +147,27 @@ No `packages/` changes. No app source. No matrix-runner changes. ``` - [ ] **2.2** Locate the produced JSON: + ``` ls -lt status/traces/*.cdp.json | head -1 ``` + Expected: non-zero size (1–30 MB). - [ ] **2.3** Sanity-check the JSON: + ``` jq '.traceEvents | length' status/traces/*.cdp.json | head -1 ``` + Expected: a number > 100 (a real trace has thousands of events; a low number would suggest the categories aren't producing what we expect). Also check shape: + ``` jq '.traceEvents[0:3]' status/traces/*.cdp.json ``` + Expected: each event has `ph` (phase), `ts` (timestamp), `cat` (category), etc. — Chrome DevTools trace-event format. - [ ] **2.4** Open in Chrome DevTools: @@ -194,12 +207,15 @@ No `packages/` changes. No app source. No matrix-runner changes. ## Task 4 — Gates + PR - [ ] **4.1** Repo-wide gates: + ``` pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format ``` + Expected: all pass. - [ ] **4.2** Push + open PR: + ``` git push -u origin bench-harness-cdp-tracing gh pr create --title "feat(bench): opt-in CDP tracing in bench.spec.ts (unblocks flame-graph perf diagnostics)" --body "..." 
diff --git a/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md index 5577560..24ab39e 100644 --- a/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md +++ b/docs/superpowers/specs/2026-05-13-bench-harness-cdp-tracing-design.md @@ -70,6 +70,7 @@ The CDP write happens INSIDE the test fixture, so it lands in the same per-run o The CDP `Tracing` API produces an array of event objects matching the Chrome DevTools "Trace Event Format" — the same format that DevTools' "Save profile…" / "Load profile…" round-trips. Wrap in `{"traceEvents": [...]}` so Chrome DevTools accepts the file directly. Example consumption (manual): + 1. Open Chrome → DevTools → Performance tab. 2. Click "Load profile…" (folder icon). 3. Select `status/traces/.cdp.json`.