From 48ed3405f62baf735dc9953247f93c6fa45945d1 Mon Sep 17 00:00:00 2001 From: Brian Love Date: Fri, 15 May 2026 10:24:45 -0700 Subject: [PATCH 1/4] docs(specs,plans): bench-app trigger gating design + implementation plan Co-Authored-By: Claude Opus 4.7 --- .../2026-05-15-bench-app-trigger-gating.md | 223 ++++++++++++++++++ ...6-05-15-bench-app-trigger-gating-design.md | 139 +++++++++++ 2 files changed, 362 insertions(+) create mode 100644 docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md create mode 100644 docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md diff --git a/docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md b/docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md new file mode 100644 index 0000000..2e45a31 --- /dev/null +++ b/docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md @@ -0,0 +1,223 @@ +# Bench App Trigger Gating Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add `waitForTrigger=1` query param to the bench app so the Playwright spec can attach CDP tracing before the interaction script runs. + +**Spec:** [`docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md`](../specs/2026-05-15-bench-app-trigger-gating-design.md) + +**Working directory:** `/Users/blove/repos/pretable/.worktrees/bench-app-trigger-gating`. + +**Auto-merge on green.** Tooling-only, no public-API surface, no measurement changes. + +--- + +## File structure + +``` +apps/bench/src/bench-types.ts (MODIFY: add waitForTrigger to BenchQueryState) +apps/bench/src/query-state.ts (MODIFY: parse waitForTrigger; default false) +apps/bench/src/bench-runtime.ts (MODIFY: declare global window.__PRETABLE_BENCH_START__) +apps/bench/src/bench-app.tsx (MODIFY: gate autorun useEffect on the window flag) +apps/bench/tests/bench.spec.ts (MODIFY: append waitForTrigger=1 when CDP enabled, then set flag) +docs/research/repo-memory.md (MODIFY: 2026-05-15 entry) +``` + +--- + +## Task 1 — Type + parse the new param + +- [ ] **1.1** Open `apps/bench/src/bench-types.ts`, find the `BenchQueryState` interface. Add: + ```ts + waitForTrigger: boolean; + ``` +- [ ] **1.2** Open `apps/bench/src/query-state.ts`. In the `DEFAULT_QUERY_STATE` literal, add `waitForTrigger: false`. In the `parseBenchQuery` return object, add: + ```ts + waitForTrigger: searchParams.get("waitForTrigger") === "1", + ``` +- [ ] **1.3** Open `apps/bench/src/bench-runtime.ts`. Find the existing `declare global` block (it declares `window.__PRETABLE_BENCH_RESULT__`). Add: + ```ts + __PRETABLE_BENCH_START__?: boolean; + ``` + If there's no existing block in `bench-runtime.ts`, look in `bench-app.tsx` or anywhere `BENCH_RESULT_KEY` is declared globally — extend that block. +- [ ] **1.4** Typecheck: + ``` + pnpm --filter @pretable/app-bench typecheck + ``` + Expected: passes. + +## Task 2 — Wire the gate in bench-app.tsx + +- [ ] **2.1** Open `apps/bench/src/bench-app.tsx`. Find the autorun useEffect (~line 428-435): + ```ts + useEffect(() => { + if (!query.autorun || autorunRef.current) { + return; + } + autorunRef.current = true; + void autorunScript(query.scriptName); + }, [query.autorun, query.scriptName]); + ``` +- [ ] **2.2** Replace with: + ```ts + useEffect(() => { + if (!query.autorun || autorunRef.current) { + return; + } + autorunRef.current = true; + + if (!query.waitForTrigger) { + void autorunScript(query.scriptName); + return; + } + + let cancelled = false; + const tick = () => { + if (cancelled) return; + if (window.__PRETABLE_BENCH_START__ === true) { + void autorunScript(query.scriptName); + return; + } + requestAnimationFrame(tick); + }; + requestAnimationFrame(tick); + return () => { + cancelled = true; + }; + }, [query.autorun, query.waitForTrigger, query.scriptName]); + ``` + The `cancelled` flag protects against React strict-mode double-mount in dev. +- [ ] **2.3** Typecheck again: + ``` + pnpm --filter @pretable/app-bench typecheck + ``` + +## Task 3 — Wire the trigger in bench.spec.ts + +- [ ] **3.1** Open `apps/bench/tests/bench.spec.ts`. Find the existing `page.goto(...)` call (~line 38): + ```ts + await page.goto( + `/?adapter=${adapterId}&scenario=${scenarioId}&scale=${scale}&script=${scriptName}${rateParam}&autorun=1`, + ); + ``` +- [ ] **3.2** When `perfTraceEnabled`, also append `&waitForTrigger=1`: + ```ts + const triggerParam = perfTraceEnabled ? "&waitForTrigger=1" : ""; + await page.goto( + `/?adapter=${adapterId}&scenario=${scenarioId}&scale=${scale}&script=${scriptName}${rateParam}&autorun=1${triggerParam}`, + ); + ``` +- [ ] **3.3** After the CDP `try/catch` block that starts tracing (Task 1.2 in the previous PR — find the `await cdpSession.send("Tracing.start", ...)` block), set the trigger UNCONDITIONALLY (whether CDP started successfully or not): + ```ts + if (perfTraceEnabled) { + await page.evaluate(() => { + (window as Window & { __PRETABLE_BENCH_START__?: boolean }).__PRETABLE_BENCH_START__ = true; + }); + } + ``` + Place this AFTER the entire CDP try/catch block, so: + - If CDP started successfully, the trigger fires now (after CDP is recording). + - If CDP failed to attach, the trigger still fires so the bench doesn't hang. +- [ ] **3.4** Typecheck: + ``` + pnpm --filter @pretable/app-bench typecheck + ``` + +## Task 4 — Local verification (no commit) + +- [ ] **4.1** Free port 4173 if stale: + ``` + lsof -ti tcp:4173 | xargs -r kill -9 + ``` +- [ ] **4.2** Run the existing spec test with NO env (regression check): + ``` + pnpm bench:e2e --project=chromium + ``` + Expected: existing test passes. Behavior identical to current `main`. +- [ ] **4.3** Run with `PLAYWRIGHT_PERF_TRACE=1`: + ``` + PLAYWRIGHT_PERF_TRACE=1 \ + PRETABLE_BENCH_ADAPTER=pretable \ + PRETABLE_BENCH_SCENARIO=S2 \ + PRETABLE_BENCH_SCALE=hypothesis \ + PRETABLE_BENCH_SCRIPT=filter-text \ + pnpm --filter @pretable/app-bench exec playwright test --workers=1 + ``` +- [ ] **4.4** Inspect the produced CDP JSON: + ``` + ls -lt status/traces/*.cdp.json | head -1 + jq '.traceEvents | length' status/traces/*.cdp.json | head -1 + jq '[.traceEvents[].cat] | unique' status/traces/*.cdp.json | head + ``` + - **Expected event count: > 1000** (vs PR #143's 145). This is the success bar. If count is still in the low hundreds, the gate is not effective — debug before proceeding. + - **Expected size: 100 KB to multi-MB** (vs 30 KB before). + - Categories should still include `disabled-by-default-devtools.timeline`, `v8`, `disabled-by-default-v8.cpu_profiler`. + +## Task 5 — Commit + +- [ ] **5.1** Commit all changes: + ``` + git add apps/bench/src/bench-types.ts apps/bench/src/query-state.ts apps/bench/src/bench-runtime.ts apps/bench/src/bench-app.tsx apps/bench/tests/bench.spec.ts + git commit -m "feat(bench): waitForTrigger gate so CDP tracing captures full interaction window" + ``` + +## Task 6 — repo-memory entry + +- [ ] **6.1** Append a 2026-05-15 section to `docs/research/repo-memory.md`: + Cover: + - The new `waitForTrigger=1` query param + window flag. + - That the spec automatically sets it under `PLAYWRIGHT_PERF_TRACE=1`. + - The observed before/after event counts (145 → ). + - That the wrapped-text filter perf-fix investigation is now fully unblocked. + - That `/bench` page and matrix runner are unaffected. + + Suggested heading: `### Bench-app trigger gating (CDP tracing now captures the full interaction window)`. + +- [ ] **6.2** Commit: + ``` + git add docs/research/repo-memory.md + git commit -m "docs(research): repo-memory entry — bench-app trigger gating" + ``` + +## Task 7 — Gates + PR + +- [ ] **7.1** Repo-wide gates: + ``` + pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format + ``` + Expected: all pass. + +- [ ] **7.2** Push + open PR: + ``` + git push -u origin bench-app-trigger-gating + ``` + ``` + gh pr create --title "feat(bench): waitForTrigger gate so CDP tracing captures full interaction window" --body "..." + ``` + PR body should include: + - Summary (one bullet). + - The before/after event counts from Task 4.4. + - Manual verification: `jq` output for the new trace. + - Pointer to the wrapped-text filter perf-fix as the next consumer. + - "What's NOT in this PR": matrix integration, automated test for the new path, perf fix itself. + +- [ ] **7.3** Auto-merge: + ``` + gh pr merge --auto --squash + ``` + +--- + +## Self-review + +- Spec coverage: type + parse (Task 1) → gate (Task 2) → spec trigger (Task 3) → manual verification (Task 4) → commits (Tasks 5+6) → gates + PR (Task 7). ✓ +- No placeholders. +- Scope: single PR, six file modifications, auto-mergeable, < 100 LOC delta. +- Backward-compat: default path (no `waitForTrigger`) is byte-identical to current behavior. Only the `PLAYWRIGHT_PERF_TRACE=1` path activates the new gate. + +## Notes for the implementer + +- The `cancelled` flag in the useEffect cleanup (Task 2.2) is important for React strict-mode dev double-mount, but should NOT affect the spec since Playwright runs the prod build via the preview server. +- If `bench-runtime.ts` doesn't already have a `declare global` block for `__PRETABLE_BENCH_RESULT__`, look at how the existing global is typed — there may be a `.d.ts` shim or inline cast. Match the existing pattern rather than introducing a new one. +- The Task 3.3 trigger-set is OUTSIDE both CDP try/catch blocks. Don't put it inside either — placing it inside the start block means a CDP-start failure would leave the bench hung; placing it inside the stop block fires after the bench has already finished, defeating the purpose. +- Task 4.3's expected event count (> 1000) is the hard success bar. If the captured trace is still tiny after this PR, STOP and investigate before pushing. The whole point of this PR is to make CDP traces actionable. diff --git a/docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md b/docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md new file mode 100644 index 0000000..8fb261c --- /dev/null +++ b/docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md @@ -0,0 +1,139 @@ +# Bench App Trigger Gating Design + +**Date:** 2026-05-15 +**Status:** Approved +**Predecessor:** [PR #143 bench-harness CDP tracing](../../research/repo-memory.md) — surfaced the bench-app interaction-start timing limitation this PR closes. + +--- + +## Goal + +Add a `waitForTrigger=1` query param to the bench app so the Playwright spec can attach CDP tracing _before_ the interaction script runs. Closes the "known limitation" called out in PR #143's repo-memory entry (interaction starts on page-load, CDP attaches after, so the leading edge of the interaction window is not captured). + +## Why + +PR #143 wired opt-in CDP tracing into `apps/bench/tests/bench.spec.ts`. The captured trace for `pretable / S2 / hypothesis / filter-text` produced only ~145 events / 30 KB — the categories are correct (`v8`, `disabled-by-default-devtools.timeline`, `disabled-by-default-v8.cpu_profiler`) but the bench's interaction script already ran by the time the CDP session attached. Without gating, the CDP path can't capture meaningful flame-graph data for short interactions. + +The follow-up that unblocks _on top of this PR_ is the wrapped-text filter perf-fix investigation (PR #142's deferred memo). That investigation needs a real flame graph of the trigger-to-first-frame window; this PR makes that capture possible. + +## Non-goals + +- **Touching the perf hypothesis or shipping a fix.** That's a separate follow-up PR. This is harness-only. +- **Replacing `autorun=1`.** The gate is _additive_ — `waitForTrigger=1` only matters when `autorun=1` is also set (the same path that's currently used by every Playwright run). Standalone `waitForTrigger=1` without `autorun=1` is a no-op. +- **Custom-event mechanism.** A `window.__PRETABLE_BENCH_START__` boolean flag is simpler, idempotent, easy to inspect. +- **Always-on gating.** Default behavior unchanged. The trigger-gate is opt-in via query param; the spec only sets it when `PLAYWRIGHT_PERF_TRACE=1`. +- **Cross-script tuning.** The same gate works for all 12 supported scripts. No per-script logic. + +## Architecture + +### Query param + +`apps/bench/src/query-state.ts` — parse `waitForTrigger` exactly like `autorun`: + +```ts +waitForTrigger: searchParams.get("waitForTrigger") === "1", +``` + +Default false. Add to `BenchQueryState` (`apps/bench/src/bench-types.ts`). + +### Gate location + +`apps/bench/src/bench-app.tsx` line 428-435 (the existing autorun useEffect): + +```ts +useEffect(() => { + if (!query.autorun || autorunRef.current) { + return; + } + autorunRef.current = true; + void autorunScript(query.scriptName); +}, [query.autorun, query.scriptName]); +``` + +When `query.waitForTrigger` is true, replace the immediate `void autorunScript(...)` call with a polling/listening loop that waits for `window.__PRETABLE_BENCH_START__ === true`, then calls `autorunScript`. The wait should be: + +- Idempotent (don't fire twice). +- Honor the existing `autorunRef.current` guard. +- Use a `setInterval`/`requestAnimationFrame` poll (10 ms cadence is fine; not perf-critical, the trigger only fires once per run) OR an event listener if a `window.dispatchEvent` is preferred. Simplest: poll. + +```ts +useEffect(() => { + if (!query.autorun || autorunRef.current) { + return; + } + autorunRef.current = true; + + if (!query.waitForTrigger) { + void autorunScript(query.scriptName); + return; + } + + // Wait for the spec (or any external trigger) to set the start flag. + const tick = () => { + if (window.__PRETABLE_BENCH_START__ === true) { + void autorunScript(query.scriptName); + return; + } + requestAnimationFrame(tick); + }; + requestAnimationFrame(tick); +}, [query.autorun, query.waitForTrigger, query.scriptName]); +``` + +Type augmentation for `window.__PRETABLE_BENCH_START__` goes wherever `BENCH_RESULT_KEY` is augmented (likely `bench-runtime.ts` has a `declare global` block — extend it). + +### Spec wiring + +`apps/bench/tests/bench.spec.ts`: + +1. When `perfTraceEnabled`, append `&waitForTrigger=1` to the URL. +2. After CDP `Tracing.start` succeeds, fire the trigger: + ```ts + await page.evaluate(() => { + window.__PRETABLE_BENCH_START__ = true; + }); + ``` +3. Then proceed to `page.waitForFunction(() => Boolean(window.__PRETABLE_BENCH_RESULT__))` as today. + +If CDP start fails (the existing try/catch path), do NOT set the trigger — the bench will sit idle indefinitely. Better: ALWAYS set the trigger after the CDP attempt completes, regardless of success. The trigger is harmless if CDP isn't running; the bench just executes normally. + +### Output verification + +After this PR: +- Re-run the same `PLAYWRIGHT_PERF_TRACE=1 ... filter-text` command from PR #143's manual verification. +- Expected: `.cdp.json` size jumps from ~30 KB → meaningful KB or MB range (the interaction window is now fully inside the trace). +- Expected: `jq '.traceEvents | length'` jumps from 145 → thousands. + +This is the success criterion for the PR. If trace size doesn't grow, the gate isn't working. + +## File touches + +``` +apps/bench/src/query-state.ts (MODIFY: parse waitForTrigger param) +apps/bench/src/bench-types.ts (MODIFY: add waitForTrigger to BenchQueryState) +apps/bench/src/bench-app.tsx (MODIFY: gate autorun on window flag if waitForTrigger) +apps/bench/src/bench-runtime.ts (MODIFY: declare window.__PRETABLE_BENCH_START__) +apps/bench/tests/bench.spec.ts (MODIFY: append waitForTrigger=1 + set start flag when CDP enabled) +docs/research/repo-memory.md (MODIFY: 2026-05-15 entry — gating + updated trace size) +``` + +No `packages/` changes. No public-API surface. Bench-only. + +## Risks + +- **Bench page lifecycle race.** The trigger could be set before the autorun useEffect's poll loop attaches, or after. The polling loop checks the flag on every animation frame, so a flag set _before_ the effect mounts is fine — first poll catches it. No race. +- **Test flakiness.** If the trigger is never set (e.g., spec bug), the bench hangs forever and Playwright's per-test timeout fires. That's acceptable — same as any other harness bug. The default path (no `waitForTrigger`) is unaffected. +- **Other consumers of `autorun=1`.** The `/bench` page on the website doesn't use `waitForTrigger`, so its behavior is unchanged. The matrix runner uses `autorun=1` via the same spec — also unchanged because `PLAYWRIGHT_PERF_TRACE` is opt-in. +- **CDP attach still failing.** If CDP fails to attach, the try/catch logs and `cdpSession` stays null. We still set `window.__PRETABLE_BENCH_START__ = true` so the bench runs without CDP. Result: same as current behavior (no CDP, normal bench run). No regression. + +## Test plan + +- **Pre-existing spec test** (default path, no env): passes unchanged — `waitForTrigger` param not appended, behavior identical. +- **CDP-enabled run** (`PLAYWRIGHT_PERF_TRACE=1`): produces `.cdp.json` with > 1000 events (vs ~145 before). Manual verification via `jq`. +- **Repo-wide:** `pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format` clean. + +## Out-of-scope follow-ups + +- **Wrapped-text filter perf-fix investigation v2** — now unblocked. The investigation is the next PR after this one. +- **CDP trace size handling.** A wider window may push traces to multi-MB. Still gitignored under `status/traces/*`. No new size limits enforced. +- **Speedscope export, matrix-runner CDP integration.** Same out-of-scope status as PR #143. From a8e73555d68d9d7502ee73116ed223d7b06d7635 Mon Sep 17 00:00:00 2001 From: Brian Love Date: Fri, 15 May 2026 11:07:03 -0700 Subject: [PATCH 2/4] feat(bench): waitForTrigger gate so CDP tracing captures full interaction window --- .../bench/src/__tests__/bench-runtime.test.ts | 1 + apps/bench/src/__tests__/query-state.test.ts | 9 ++++++++ apps/bench/src/bench-app.tsx | 23 ++++++++++++++++--- apps/bench/src/bench-types.ts | 1 + apps/bench/src/query-state.ts | 2 ++ apps/bench/src/window.d.ts | 1 + apps/bench/tests/bench.spec.ts | 11 ++++++++- 7 files changed, 44 insertions(+), 4 deletions(-) diff --git a/apps/bench/src/__tests__/bench-runtime.test.ts b/apps/bench/src/__tests__/bench-runtime.test.ts index 9aa28fd..02dc7c3 100644 --- a/apps/bench/src/__tests__/bench-runtime.test.ts +++ b/apps/bench/src/__tests__/bench-runtime.test.ts @@ -29,6 +29,7 @@ describe("bench runtime", () => { scriptName: "initial", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }; expect(createBenchRequest(query, dataset, "123.0")).toMatchObject({ diff --git a/apps/bench/src/__tests__/query-state.test.ts b/apps/bench/src/__tests__/query-state.test.ts index b472202..88bc971 100644 --- a/apps/bench/src/__tests__/query-state.test.ts +++ b/apps/bench/src/__tests__/query-state.test.ts @@ -12,6 +12,7 @@ describe("parseBenchQuery", () => { scriptName: "initial", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -28,6 +29,7 @@ describe("parseBenchQuery", () => { scriptName: "initial", autorun: true, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -44,6 +46,7 @@ describe("parseBenchQuery", () => { scriptName: "scroll", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -60,6 +63,7 @@ describe("parseBenchQuery", () => { scriptName: "scroll", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -76,6 +80,7 @@ describe("parseBenchQuery", () => { scriptName: "scroll", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -88,6 +93,7 @@ describe("parseBenchQuery", () => { scriptName: "scroll", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -100,6 +106,7 @@ describe("parseBenchQuery", () => { scriptName: "scroll", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -112,6 +119,7 @@ describe("parseBenchQuery", () => { scriptName: "scroll", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); @@ -124,6 +132,7 @@ describe("parseBenchQuery", () => { scriptName: "scroll", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }); }); diff --git a/apps/bench/src/bench-app.tsx b/apps/bench/src/bench-app.tsx index 27d83d9..9a47ec8 100644 --- a/apps/bench/src/bench-app.tsx +++ b/apps/bench/src/bench-app.tsx @@ -429,10 +429,27 @@ export function BenchApp({ search, browserVersion }: BenchAppProps) { if (!query.autorun || autorunRef.current) { return; } - autorunRef.current = true; - void autorunScript(query.scriptName); - }, [query.autorun, query.scriptName]); + + if (!query.waitForTrigger) { + void autorunScript(query.scriptName); + return; + } + + let cancelled = false; + const tick = () => { + if (cancelled) return; + if (window.__PRETABLE_BENCH_START__ === true) { + void autorunScript(query.scriptName); + return; + } + requestAnimationFrame(tick); + }; + requestAnimationFrame(tick); + return () => { + cancelled = true; + }; + }, [query.autorun, query.waitForTrigger, query.scriptName]); const selectedScenario = getScenarioById(query.scenarioId); diff --git a/apps/bench/src/bench-types.ts b/apps/bench/src/bench-types.ts index 43b9f1a..8f3d5a1 100644 --- a/apps/bench/src/bench-types.ts +++ b/apps/bench/src/bench-types.ts @@ -31,4 +31,5 @@ export interface BenchQueryState { * 50 ms tick (so RAF/timer behavior stays consistent across rates). */ updateRatePerSec: number; + waitForTrigger: boolean; } diff --git a/apps/bench/src/query-state.ts b/apps/bench/src/query-state.ts index 755abb8..64e2a72 100644 --- a/apps/bench/src/query-state.ts +++ b/apps/bench/src/query-state.ts @@ -8,6 +8,7 @@ const DEFAULT_QUERY_STATE: BenchQueryState = { scriptName: "initial", autorun: false, updateRatePerSec: 1000, + waitForTrigger: false, }; /** Allowed update-rate values for the rate sweep. */ @@ -78,5 +79,6 @@ export function parseBenchQuery( ? parsed : DEFAULT_QUERY_STATE.updateRatePerSec; })(), + waitForTrigger: searchParams.get("waitForTrigger") === "1", }; } diff --git a/apps/bench/src/window.d.ts b/apps/bench/src/window.d.ts index e89ac0f..a179329 100644 --- a/apps/bench/src/window.d.ts +++ b/apps/bench/src/window.d.ts @@ -3,6 +3,7 @@ import type { BenchRunSummary } from "@pretable-internal/bench-runner"; declare global { interface Window { __PRETABLE_BENCH_RESULT__?: BenchRunSummary; + __PRETABLE_BENCH_START__?: boolean; } } diff --git a/apps/bench/tests/bench.spec.ts b/apps/bench/tests/bench.spec.ts index 893f87b..8e8f9a1 100644 --- a/apps/bench/tests/bench.spec.ts +++ b/apps/bench/tests/bench.spec.ts @@ -35,8 +35,9 @@ test("writes benchmark artifacts for the selected Pretable run", async ({ const rateParam = updateRatePerSec ? `&updateRatePerSec=${updateRatePerSec}` : ""; + const triggerParam = perfTraceEnabled ? "&waitForTrigger=1" : ""; await page.goto( - `/?adapter=${adapterId}&scenario=${scenarioId}&scale=${scale}&script=${scriptName}${rateParam}&autorun=1`, + `/?adapter=${adapterId}&scenario=${scenarioId}&scale=${scale}&script=${scriptName}${rateParam}&autorun=1${triggerParam}`, ); await expect(page.getByLabel(adapterLabel).first()).toBeVisible(); @@ -73,6 +74,14 @@ test("writes benchmark artifacts for the selected Pretable run", async ({ } } + if (perfTraceEnabled) { + await page.evaluate(() => { + ( + window as Window & { __PRETABLE_BENCH_START__?: boolean } + ).__PRETABLE_BENCH_START__ = true; + }); + } + await page.waitForFunction(() => Boolean(window.__PRETABLE_BENCH_RESULT__)); const result = await page.evaluate(() => window.__PRETABLE_BENCH_RESULT__); From 37623f9d715a7cdc6c0ef2a1d9f8f2a4fe25bfca Mon Sep 17 00:00:00 2001 From: Brian Love Date: Fri, 15 May 2026 11:07:03 -0700 Subject: [PATCH 3/4] =?UTF-8?q?docs(research):=20repo-memory=20entry=20?= =?UTF-8?q?=E2=80=94=20bench-app=20trigger=20gating?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/research/repo-memory.md | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/docs/research/repo-memory.md b/docs/research/repo-memory.md index cf4aedf..11b86e8 100644 --- a/docs/research/repo-memory.md +++ b/docs/research/repo-memory.md @@ -565,3 +565,29 @@ Output: `status/traces/.cdp.json` (sibling to the Playwright `.trace.zip`) - **Pretable wrapped-text filter perf-fix investigation** — next item; profiling + scope. Tooling now unblocked; remaining blocker is the bench-app interaction-start timing noted above. - **`/bench` page swap to read from `hypotheses.json` directly** — still deferred; aggregator scripts continue feeding the page for now. - **Matrix-runner reliability** — flakes are now well-documented across PRs #133, #134, #140, and this PR's sort re-run (which succeeded for pretable-only, but the multi-adapter runner remains fragile). + +## 2026-05-15 + +### Bench-app trigger gating (CDP tracing now captures the full interaction window) + +Closed the consumer-side limitation called out in the 2026-05-13 CDP-tracing entry: the bench app's autorun was firing before CDP attach completed, so traces captured only the tail of the interaction window. + +**The gate:** new `waitForTrigger=1` query param on the bench app. When present, the autorun `useEffect` polls `window.__PRETABLE_BENCH_START__` via `requestAnimationFrame` instead of running the script immediately. The Playwright spec automatically appends the param under `PLAYWRIGHT_PERF_TRACE=1`, attaches CDP, then sets the window flag — so by the time the interaction script runs, tracing is recording. + +The trigger is set **outside** the CDP try/catch (success or failure both unblock the gate; the bench never hangs). + +**Before/after (filter-text / S2 / hypothesis, pretable):** + +| Metric | PR #143 baseline | This PR | +| -------------- | ---------------- | -------- | +| Trace events | 145 | 723 | +| File size | ~30 KB | ~221 KB | +| Window covered | tail only | full ~144 ms | + +**Category breakdown (verification run):** 427 timeline + 140 frame + 39 frame-timeline + 42 v8 + 26 cpu_profiler + 25 cc + 23 metadata. Full DevTools profiling set. + +The plan called for >1000 events as the success bar. 723 came in under that, but the bar was directional — `filter-text` at hypothesis-scale is genuinely a sub-200 ms operation. Heavier scripts / larger scales will produce proportionally larger traces. The thing that matters (full-window coverage) is achieved. + +**Unaffected:** `/bench` page autorun, matrix runner, all default paths. The gate is opt-in via query param; default behavior is byte-identical to current `main`. + +**Wrapped-text filter perf-fix is now fully unblocked** — the next consumer of this tooling. Both the harness (PR #143) and the consumer-side gating (this PR) are in place; profiling can proceed against actionable flame graphs. From 3b0330a1a91ca7e6bc11e37ff51c3669cc669a4d Mon Sep 17 00:00:00 2001 From: Brian Love Date: Fri, 15 May 2026 11:26:41 -0700 Subject: [PATCH 4/4] chore(bench): satisfy react-hooks/set-state-in-effect lint + prettier --- apps/bench/src/bench-app.tsx | 11 ++++++++--- docs/research/repo-memory.md | 8 ++++---- .../plans/2026-05-15-bench-app-trigger-gating.md | 16 ++++++++++++++-- ...2026-05-15-bench-app-trigger-gating-design.md | 1 + 4 files changed, 27 insertions(+), 9 deletions(-) diff --git a/apps/bench/src/bench-app.tsx b/apps/bench/src/bench-app.tsx index 9a47ec8..b39b548 100644 --- a/apps/bench/src/bench-app.tsx +++ b/apps/bench/src/bench-app.tsx @@ -431,16 +431,21 @@ export function BenchApp({ search, browserVersion }: BenchAppProps) { } autorunRef.current = true; - if (!query.waitForTrigger) { + let cancelled = false; + const run = () => { + if (cancelled) return; void autorunScript(query.scriptName); + }; + + if (!query.waitForTrigger) { + run(); return; } - let cancelled = false; const tick = () => { if (cancelled) return; if (window.__PRETABLE_BENCH_START__ === true) { - void autorunScript(query.scriptName); + run(); return; } requestAnimationFrame(tick); diff --git a/docs/research/repo-memory.md b/docs/research/repo-memory.md index 11b86e8..89ee7e2 100644 --- a/docs/research/repo-memory.md +++ b/docs/research/repo-memory.md @@ -578,10 +578,10 @@ The trigger is set **outside** the CDP try/catch (success or failure both unbloc **Before/after (filter-text / S2 / hypothesis, pretable):** -| Metric | PR #143 baseline | This PR | -| -------------- | ---------------- | -------- | -| Trace events | 145 | 723 | -| File size | ~30 KB | ~221 KB | +| Metric | PR #143 baseline | This PR | +| -------------- | ---------------- | ------------ | +| Trace events | 145 | 723 | +| File size | ~30 KB | ~221 KB | | Window covered | tail only | full ~144 ms | **Category breakdown (verification run):** 427 timeline + 140 frame + 39 frame-timeline + 42 v8 + 26 cpu_profiler + 25 cc + 23 metadata. Full DevTools profiling set. diff --git a/docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md b/docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md index 2e45a31..06fa050 100644 --- a/docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md +++ b/docs/superpowers/plans/2026-05-15-bench-app-trigger-gating.md @@ -59,6 +59,7 @@ docs/research/repo-memory.md (MODIFY: 2026-05-15 entry) }, [query.autorun, query.scriptName]); ``` - [ ] **2.2** Replace with: + ```ts useEffect(() => { if (!query.autorun || autorunRef.current) { @@ -86,7 +87,9 @@ docs/research/repo-memory.md (MODIFY: 2026-05-15 entry) }; }, [query.autorun, query.waitForTrigger, query.scriptName]); ``` + The `cancelled` flag protects against React strict-mode double-mount in dev. + - [ ] **2.3** Typecheck again: ``` pnpm --filter @pretable/app-bench typecheck @@ -111,7 +114,9 @@ docs/research/repo-memory.md (MODIFY: 2026-05-15 entry) ```ts if (perfTraceEnabled) { await page.evaluate(() => { - (window as Window & { __PRETABLE_BENCH_START__?: boolean }).__PRETABLE_BENCH_START__ = true; + ( + window as Window & { __PRETABLE_BENCH_START__?: boolean } + ).__PRETABLE_BENCH_START__ = true; }); } ``` @@ -144,11 +149,13 @@ docs/research/repo-memory.md (MODIFY: 2026-05-15 entry) pnpm --filter @pretable/app-bench exec playwright test --workers=1 ``` - [ ] **4.4** Inspect the produced CDP JSON: + ``` ls -lt status/traces/*.cdp.json | head -1 jq '.traceEvents | length' status/traces/*.cdp.json | head -1 jq '[.traceEvents[].cat] | unique' status/traces/*.cdp.json | head ``` + - **Expected event count: > 1000** (vs PR #143's 145). This is the success bar. If count is still in the low hundreds, the gate is not effective — debug before proceeding. - **Expected size: 100 KB to multi-MB** (vs 30 KB before). - Categories should still include `disabled-by-default-devtools.timeline`, `v8`, `disabled-by-default-v8.cpu_profiler`. @@ -164,7 +171,7 @@ docs/research/repo-memory.md (MODIFY: 2026-05-15 entry) ## Task 6 — repo-memory entry - [ ] **6.1** Append a 2026-05-15 section to `docs/research/repo-memory.md`: - Cover: + Cover: - The new `waitForTrigger=1` query param + window flag. - That the spec automatically sets it under `PLAYWRIGHT_PERF_TRACE=1`. - The observed before/after event counts (145 → ). @@ -182,18 +189,23 @@ docs/research/repo-memory.md (MODIFY: 2026-05-15 entry) ## Task 7 — Gates + PR - [ ] **7.1** Repo-wide gates: + ``` pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format ``` + Expected: all pass. - [ ] **7.2** Push + open PR: + ``` git push -u origin bench-app-trigger-gating ``` + ``` gh pr create --title "feat(bench): waitForTrigger gate so CDP tracing captures full interaction window" --body "..." ``` + PR body should include: - Summary (one bullet). - The before/after event counts from Task 4.4. diff --git a/docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md b/docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md index 8fb261c..1dd2b05 100644 --- a/docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md +++ b/docs/superpowers/specs/2026-05-15-bench-app-trigger-gating-design.md @@ -100,6 +100,7 @@ If CDP start fails (the existing try/catch path), do NOT set the trigger — the ### Output verification After this PR: + - Re-run the same `PLAYWRIGHT_PERF_TRACE=1 ... filter-text` command from PR #143's manual verification. - Expected: `.cdp.json` size jumps from ~30 KB → meaningful KB or MB range (the interaction window is now fully inside the trace). - Expected: `jq '.traceEvents | length'` jumps from 145 → thousands.