feat(examples-chat): cross-stack E2E harness — Phase 2a (infra) #309
Merged
Phase 2a sits between Phase 1 (input-variance tables, #305) and the scenario-coverage phases that will follow. Lands the harness, one trivial smoke fixture, the per-PR CI job, and the daily drift-detection workflow. Real product-level regression coverage is deferred to Phase 2b+ as small additive PRs.
8-task plan with Task 0 as a de-risk gate that validates the harness's core assumptions (mock API shape, Python OpenAI SDK base-URL handoff, LangGraph agent code compatibility) before any code lands.
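One of the Task 0 assumptions, the base-URL handoff to the Python OpenAI SDK, could look roughly like the sketch below. It assumes the agent constructs its OpenAI client without an explicit `base_url`, in which case the SDK falls back to the `OPENAI_BASE_URL` environment variable. The helper name `buildAgentEnv`, the loopback address, and the placeholder key are hypothetical and not taken from this PR.

```typescript
// Hypothetical helper: none of these names come from the PR itself.
type Env = Record<string, string | undefined>;

/**
 * Build the environment for the Python agent process so that its OpenAI
 * client talks to a local mock server instead of the real API.
 */
function buildAgentEnv(baseEnv: Env, mockPort: number): Env {
  // Reject NaN, fractions, and out-of-range ports before spawning anything.
  if (mockPort % 1 !== 0 || mockPort <= 0 || mockPort > 65535) {
    throw new RangeError(`invalid mock port: ${mockPort}`);
  }
  return {
    ...baseEnv,
    // The OpenAI Python SDK reads OPENAI_BASE_URL when no base_url is passed
    // to the client. The /v1 suffix mirrors the default API root, so a
    // request to the Responses API resolves to /v1/responses on the mock.
    OPENAI_BASE_URL: `http://127.0.0.1:${mockPort}/v1`,
    // Assumption: the mock ignores the key, but the SDK requires one to be set.
    OPENAI_API_KEY: "test-key-not-real",
  };
}
```

The returned object could be passed as the `env` option when spawning the agent process. If the agent code sets `base_url` explicitly, the environment variable is not consulted and this handoff would need a different mechanism.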
Summary
Stands up a cross-stack E2E test harness for examples/chat. Phase 2a is infrastructure only: the harness, one trivial smoke fixture, the per-PR CI job, and a scheduled fixture-drift workflow. Real scenario coverage lands in Phase 2b+ as small additive PRs.
Sits on top of Phase 1 (#305), which covers parser-level invariants at unit granularity. Phase 2a covers integration shapes that Phase 1 cannot reach (LangGraph SSE framing, Python emit_in_place coalescing, single-bubble invariant, surface mounting).
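To make the "SSE framing" shape concrete: a server-sent event stream is a sequence of frames separated by blank lines, each carrying optional `event:` and one or more `data:` fields. The sketch below parses a fully buffered stream into frames. It is an illustration of the wire format only, not the harness's or LangGraph's code, and the name `parseSseFrames` is hypothetical.

```typescript
interface SseFrame {
  event: string; // event type; "message" when the frame has no event: field
  data: string;  // data: lines of the frame joined with "\n"
}

/** Parse a complete (already buffered) SSE payload into frames. */
function parseSseFrames(raw: string): SseFrame[] {
  const frames: SseFrame[] = [];
  // Normalise line endings, then split on the blank lines that end each frame.
  const blocks = raw.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      // Skip empty lines and comment lines (often used as keep-alives).
      if (line === "" || line.charAt(0) === ":") continue;
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      // A single leading space after the colon is not part of the value.
      if (value.charAt(0) === " ") value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
      // Other fields (id, retry) are ignored in this sketch.
    }
    // Frames without any data: line are not dispatched.
    if (dataLines.length > 0) {
      frames.push({ event, data: dataLines.join("\n") });
    }
  }
  return frames;
}
```

A unit test can check a parser like this in isolation, but whether the frames arrive in the order and grouping the UI expects only shows up when the real backend and frontend run together, which is the gap this phase targets.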
How it works
A new Nx project at examples/chat/aimock-e2e/ owns the harness module, the Playwright config + globalSetup, the one seed fixture (hi.json), and two scripts (record + drift).
Local end-to-end verification
- aimock-runner unit tests: 2/2 passed (Vitest)
- … /v1/responses with 200)
CI integration
- The "examples/chat — aimock e2e" job runs in parallel with the existing python smoke job; it is added to the deploy job's needs: so a broken run blocks the Vercel deploy.
- The "aimock fixture drift" workflow runs daily against real OpenAI, opens an issue on drift, and never auto-updates fixtures.
Spec: docs/superpowers/specs/2026-05-13-aimock-e2e-phase-2a-design.md
Plan: docs/superpowers/plans/2026-05-13-aimock-e2e-phase-2a.md
Test plan
- … examples/chat/aimock-e2e/ + workflow files + minimal package.json add)
- … workflow_dispatch trigger so it can be manually verified before relying on it
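The drift policy described under CI integration (report drift, never rewrite fixtures) could be sketched as a pure comparison between a recorded fixture and a fresh live response. The sketch assumes "drift" means a change in JSON structure (keys and value types) rather than in generated text, which varies between calls; that definition, along with the names `shapeDrift` and `kindOf`, is an assumption and not taken from the PR's drift script.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

/** Coarse type tag used for structural comparison. */
function kindOf(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

/**
 * List structural differences between a recorded fixture and a live response.
 * Values of the same type are never compared, so differing model text is not
 * reported. An empty result means no drift was detected.
 */
function shapeDrift(recorded: Json, live: Json, path: string = "$"): string[] {
  const recordedKind = kindOf(recorded);
  const liveKind = kindOf(live);
  if (recordedKind !== liveKind) {
    return [`${path}: ${recordedKind} -> ${liveKind}`];
  }
  if (recordedKind === "array") {
    const r = recorded as Json[];
    const l = live as Json[];
    // Simplification: only the first element is treated as representative.
    if (r.length === 0 || l.length === 0) return [];
    return shapeDrift(r[0], l[0], `${path}[0]`);
  }
  if (recordedKind === "object") {
    const r = recorded as { [key: string]: Json };
    const l = live as { [key: string]: Json };
    const has = (obj: object, key: string) =>
      Object.prototype.hasOwnProperty.call(obj, key);
    const out: string[] = [];
    for (const key of Object.keys(r)) {
      if (!has(l, key)) out.push(`${path}.${key}: missing in live`);
      else out.push(...shapeDrift(r[key], l[key], `${path}.${key}`));
    }
    for (const key of Object.keys(l)) {
      if (!has(r, key)) out.push(`${path}.${key}: added in live`);
    }
    return out;
  }
  return []; // same primitive type: not considered drift
}
```

A scheduled job could call this with the parsed fixture and the live response, then open an issue listing the returned paths when the array is non-empty. The function only reads its inputs, which matches the stated rule that fixtures are never auto-updated.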