
feat(examples-chat): cross-stack E2E harness — Phase 2a (infra)#309

Merged: blove merged 9 commits into main from claude/aimock-e2e-phase-2a on May 14, 2026

blove commented May 14, 2026

Summary

Stands up a cross-stack E2E test harness for examples/chat. Phase 2a is infrastructure only — the harness, one trivial smoke fixture, the per-PR CI job, and a scheduled fixture-drift workflow. Real scenario coverage lands in Phase 2b+ as small additive PRs.

Sits on top of Phase 1 (#305), which covers parser-level invariants at unit granularity. Phase 2a covers integration shapes that Phase 1 cannot reach (LangGraph SSE framing, Python emit_in_place coalescing, single-bubble invariant, surface mounting).

How it works

Playwright (real Chromium)
  → Angular dev server :4200
    → LangGraph dev server :2024  (OPENAI_BASE_URL pointed at the mock)
      → mock LLM server  (replays a committed JSON fixture)

A new Nx project at examples/chat/aimock-e2e/ owns the harness module, the Playwright config + globalSetup, the one seed fixture (hi.json), and two scripts (record + drift).
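To illustrate the "replays a committed JSON fixture" step, here is a minimal sketch of fixture matching. The fixture shape (`match.lastUserMessage`, `response.status`, `response.body`), the request shape, and the function names are assumptions made for illustration; the real hi.json format and the mock server's matching rules are not shown in this PR.

```typescript
// Hypothetical fixture shape; the real hi.json layout may differ.
interface Fixture {
  match: { lastUserMessage: string };
  response: { status: number; body: unknown };
}

// Assumed request shape: a list of role-tagged messages.
interface MockRequest {
  input: { role: string; content: string }[];
}

interface ReplayResult {
  status: number;
  body: unknown;
}

// Find the most recent user message in the request, if any.
function lastUserMessage(request: MockRequest): string | undefined {
  for (let i = request.input.length - 1; i >= 0; i--) {
    const message = request.input[i];
    if (message && message.role === "user") return message.content;
  }
  return undefined;
}

// Return the recorded response for a matching fixture. An unmatched
// request gets a 404 so a test can never silently fall through to a
// real provider.
function replay(fixtures: Fixture[], request: MockRequest): ReplayResult {
  const prompt = lastUserMessage(request);
  for (const fixture of fixtures) {
    if (fixture.match.lastUserMessage === prompt) {
      return { status: fixture.response.status, body: fixture.response.body };
    }
  }
  return { status: 404, body: { error: "no fixture for prompt: " + String(prompt) } };
}
```

Failing loudly on an unmatched request is a design choice in this sketch: it keeps a missing fixture visible as a test failure rather than a hang or a live API call.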

Local end-to-end verification

  • aimock-runner unit tests: 2/2 passed (Vitest)
  • Smoke spec: 1/1 passed in 16.4s (real Chromium + real Python LangGraph + mock LLM; LangGraph logs confirmed hit at /v1/responses with 200)
  • Phase 1 regression suites still green: chat 704/704, a2ui 54/54
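The /v1/responses hit noted above depends on the LangGraph process inheriting a base URL that points at the mock. A minimal sketch of that handoff follows, assuming the harness builds the child-process environment itself. The function name, the loopback host, the port parameter, and the placeholder key are hypothetical; only the OPENAI_BASE_URL variable comes from this PR.

```typescript
type Env = Record<string, string | undefined>;

// Build the environment for the LangGraph dev server so that OpenAI SDK
// calls go to the local mock instead of the real API.
function buildLangGraphEnv(baseEnv: Env, mockPort: number): Env {
  return {
    ...baseEnv,
    // With a base URL ending in /v1, a Responses API call lands on /v1/responses.
    OPENAI_BASE_URL: "http://127.0.0.1:" + mockPort + "/v1",
    // Assumption: the SDK needs some key to construct a client, so a
    // placeholder is supplied when none is set. The mock never validates it.
    OPENAI_API_KEY: baseEnv["OPENAI_API_KEY"] ?? "test-key-not-real",
  };
}
```

The result would be passed as the `env` option when spawning the LangGraph dev server, leaving the parent process environment untouched.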

CI integration

  • New per-PR job examples/chat — aimock e2e runs in parallel with the existing python smoke job; added to the deploy job's needs: so a broken run blocks Vercel.
  • New scheduled workflow aimock fixture drift runs daily against real OpenAI, opens an issue on drift, never auto-updates fixtures.
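As a sketch of what a drift check could look like, the code below compares the structure (key paths and value types) of a recorded response against a fresh one and returns a report, without writing anything back to the fixture. What the actual drift script treats as drift is not described in this PR, so the shape-only comparison and both function names are assumptions.

```typescript
type Shape = Record<string, string>;

// Flatten a JSON value into a map of path -> type name. Arrays are
// sampled by their first element, which assumes homogeneous arrays.
function shapeOf(value: unknown, path: string = "$", out: Shape = {}): Shape {
  if (Array.isArray(value)) {
    out[path] = "array";
    if (value.length > 0) shapeOf(value[0], path + "[]", out);
  } else if (value !== null && typeof value === "object") {
    out[path] = "object";
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) shapeOf(obj[key], path + "." + key, out);
  } else {
    out[path] = value === null ? "null" : typeof value;
  }
  return out;
}

// Report structural differences only. Differing string or number values
// are expected between a recording and a live call and are ignored.
function detectDrift(recorded: unknown, live: unknown): string[] {
  const before = shapeOf(recorded);
  const after = shapeOf(live);
  const report: string[] = [];
  for (const path of Object.keys(before)) {
    if (!(path in after)) report.push("missing in live: " + path);
    else if (after[path] !== before[path]) {
      report.push("type changed at " + path + ": " + before[path] + " -> " + after[path]);
    }
  }
  for (const path of Object.keys(after)) {
    if (!(path in before)) report.push("new in live: " + path);
  }
  return report;
}
```

A non-empty report would become the body of the opened issue; the fixture file itself stays read-only, matching the "never auto-updates fixtures" rule.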

Spec: docs/superpowers/specs/2026-05-13-aimock-e2e-phase-2a-design.md
Plan: docs/superpowers/plans/2026-05-13-aimock-e2e-phase-2a.md

Test plan

  • Smoke spec passes locally end-to-end
  • Harness unit tests pass
  • Phase 1 regression suites still green
  • No production code touched (everything is under examples/chat/aimock-e2e/, plus workflow files and a minimal package.json addition)
  • Drift workflow has a workflow_dispatch trigger so it can be manually verified before relying on it
  • CI green on this PR

blove added 9 commits May 13, 2026 16:28
Phase 2a sits between Phase 1 (input-variance tables, #305) and the
scenario-coverage phases that will follow. Lands the harness, one
trivial smoke fixture, the per-PR CI job, and the daily drift-detection
workflow. Real product-level regression coverage is deferred to
Phase 2b+ as small additive PRs.

8-task plan with Task 0 as a de-risk gate that validates the harness's
core assumptions (mock API shape, Python OpenAI SDK base-URL handoff,
LangGraph agent code compatibility) before any code lands.
vercel Bot commented May 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: cacheplane; Deployment: Ready; Actions: Preview, Comment; Updated (UTC): May 14, 2026 1:30am

@blove blove merged commit 00530a1 into main May 14, 2026
15 checks passed