Add mock-based E2E tests and gate live tests to main/nightly #769
adilhafeez wants to merge 7 commits into main
Conversation
Introduce a new mock-based E2E test suite that uses pytest_httpserver to simulate LLM provider responses, eliminating the need for real API keys on PR builds. The mock tests cover model alias routing, protocol transformation (OpenAI↔Anthropic), Responses API passthrough/translation, streaming, tool calls, thinking mode, and multi-turn state management.

CI changes:
- Add mock-e2e-tests job (zero secrets, runs on every PR)
- Gate all live E2E jobs to main pushes + nightly schedule
- Scope secrets to only the keys each job actually needs
- Add daily cron schedule for full live test coverage

Also relaxes exact-match assertions in live e2e tests to structural checks (non-null, non-empty) since LLM output is non-deterministic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Import HandlerType from pytest_httpserver.httpserver (not top-level)
- Apply Black formatting to all new test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OpenAI client → Claude model: gateway routes to /v1/chat/completions (not /v1/messages), so use setup_openai_chat_mock
- Responses API: gateway translates all requests to /v1/chat/completions on upstream with base_url providers, so use setup_openai_chat_mock
- Remove unused imports (json, pytest, setup_responses_api_mock)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI model Responses API requests pass through to /v1/responses on the upstream, which doesn't work with mock servers. Remove those tests from the mock suite (they're covered by live e2e tests on main/nightly).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The job was only passing 3 of the 7 required API keys. Added the missing MISTRAL_API_KEY, GROQ_API_KEY, AZURE_API_KEY, and AWS_BEARER_TOKEN_BEDROCK to match the main branch configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 6 live E2E tests passed on this PR. Restoring the if: conditions so they only run on pushes to main and the nightly schedule.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
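The restored gating could look roughly like the GitHub Actions fragment below. This is a sketch under assumptions: the job name and step are illustrative, and only the cron expression (`0 6 * * *`) comes from the PR summary.

```yaml
# Illustrative workflow fragment; job/step names are not from the PR.
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 6 * * *"   # daily live-test run, per the PR summary

jobs:
  live-e2e-tests:
    # Run live tests only on pushes to main or on the nightly schedule,
    # so PRs never need the provider API keys.
    if: github.event_name == 'schedule' || github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: echo "run live e2e suite here"
```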
salmanap
left a comment
A couple of thoughts:
1/ If the code path we are testing is specific to an upstream LLM change, then we should have a way to run the integration test in the PR. That way we can know with confidence that changes don't break the user experience when merging to main. Otherwise it means cutting another PR to fix main. Not ideal.
2/ I am not sure what value the mock tests create. If the idea is that we are testing our transformation logic, and these mock upstream LLMs help us with that, then that makes sense. But most of these tests are integration tests checking whether the client (OpenAI, Anthropic) can process the message that plano sends back. In that case, I am not sure there is a lot of value in creating a mock suite.
3/ If we go down this route (barring the discussions above), I think we don't want to duplicate each test type, like streaming, model routing, or responses_api. We just need a way to update the existing tests to use a mock LLM. Also, I don't think we need to put these tests in the old archgw folder.
Summary
- Mock-based E2E suite uses pytest_httpserver to simulate LLM provider responses — runs on every PR with zero secrets required
- Live E2E jobs gated to pushes to main and a nightly schedule (0 6 * * *), so PRs no longer depend on API keys
- Exact-match assertions relaxed to structural checks (is not None + len > 0) since LLM output is non-deterministic

New test coverage (mock-based)
- test_model_alias_routing.py
- test_responses_api.py
- test_streaming.py