
Add mock-based E2E tests and gate live tests to main/nightly #769

Open

adilhafeez wants to merge 7 commits into main from adil/mock_tests
Conversation

@adilhafeez (Contributor) commented Feb 18, 2026

Summary

  • New mock-based E2E test suite (33 tests) using pytest_httpserver to simulate LLM provider responses — runs on every PR with zero secrets required
  • Gate all live E2E jobs to main pushes + daily nightly cron schedule (0 6 * * *), so PRs no longer depend on API keys
  • Scope secrets per CI job to only the keys each job actually needs (removed unused AZURE_API_KEY, AWS_BEARER_TOKEN_BEDROCK, ARCH_API_KEY from jobs that don't need them; added missing GROQ_API_KEY to test-prompt-gateway)
  • Relax exact-match LLM assertions in live e2e tests to structural checks (is not None + len > 0) since LLM output is non-deterministic

New test coverage (mock-based)

| File | Tests | What it covers |
| --- | --- | --- |
| test_model_alias_routing.py | 13 | Alias resolution, cross-provider protocol transformation, tool calls, thinking mode, error handling |
| test_responses_api.py | 13 | Passthrough, upstream translation, tools, mixed content, multi-turn state management |
| test_streaming.py | 7 | Streaming for all API shapes (OpenAI chat, Anthropic messages, Responses API) + cross-provider |
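The suite itself uses pytest_httpserver, but the core pattern — point the gateway's upstream at a local HTTP server that replays canned provider JSON, then make structural assertions on the reply — can be sketched with only the standard library (the endpoint path and payload below are illustrative, not the PR's actual fixtures):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned OpenAI-style chat completion; field values are illustrative.
CANNED = {
    "id": "chatcmpl-mock",
    "object": "chat.completion",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "mocked reply"}}
    ],
}

class MockLLM(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body, then answer any POST with the canned JSON.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = json.dumps(CANNED).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

# Port 0 asks the OS for a free ephemeral port, so tests never collide.
server = HTTPServer(("127.0.0.1", 0), MockLLM)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/v1/chat/completions"
req = urllib.request.Request(
    url,
    data=b'{"model": "gpt-4o", "messages": []}',
    headers={"Content-Type": "application/json"},
)
resp = json.loads(urllib.request.urlopen(req).read())
server.shutdown()

# Structural checks, mirroring the relaxed live-test assertions.
content = resp["choices"][0]["message"]["content"]
assert content is not None and len(content) > 0
print(content)
```

No API keys are involved anywhere, which is what lets a suite like this run on every PR.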

adilhafeez and others added 7 commits February 18, 2026 19:33
Introduce a new mock-based E2E test suite that uses pytest_httpserver to
simulate LLM provider responses, eliminating the need for real API keys
on PR builds. The mock tests cover model alias routing, protocol
transformation (OpenAI↔Anthropic), Responses API passthrough/translation,
streaming, tool calls, thinking mode, and multi-turn state management.

CI changes:
- Add mock-e2e-tests job (zero secrets, runs on every PR)
- Gate all live E2E jobs to main pushes + nightly schedule
- Scope secrets to only the keys each job actually needs
- Add daily cron schedule for full live test coverage
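The gating described above can be sketched as a GitHub Actions workflow fragment (job names, paths, and secret names here are illustrative, not the repo's actual configuration):

```yaml
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: "0 6 * * *"   # daily nightly run at 06:00 UTC

jobs:
  mock-e2e-tests:
    # Runs on every PR; needs zero secrets.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/e2e_mock

  live-e2e-tests:
    # Gated to pushes on main and the nightly schedule.
    if: github.event_name == 'schedule' || (github.event_name == 'push' && github.ref == 'refs/heads/main')
    runs-on: ubuntu-latest
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}   # secrets scoped per job
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/e2e_live
```

Scoping each secret to the one job that needs it keeps PR builds from ever requiring credentials.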

Also relaxes exact-match assertions in live e2e tests to structural
checks (non-null, non-empty) since LLM output is non-deterministic.
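The assertion relaxation looks roughly like this (helper and variable names are illustrative, not the PR's actual code):

```python
def assert_valid_completion(content):
    # Structural check: LLM output is non-deterministic, so require only
    # a non-null, non-empty string rather than an exact transcript.
    assert content is not None
    assert isinstance(content, str)
    assert len(content) > 0

# Before: brittle exact match, e.g.
#   assert content == "The capital of France is Paris."
# After: any non-empty reply passes.
assert_valid_completion("The capital of France is Paris.")
```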

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Import HandlerType from pytest_httpserver.httpserver (not top-level)
- Apply Black formatting to all new test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OpenAI client → Claude model: gateway routes to /v1/chat/completions
  (not /v1/messages), so use setup_openai_chat_mock
- Responses API: gateway translates all requests to /v1/chat/completions
  on upstream with base_url providers, so use setup_openai_chat_mock
- Remove unused imports (json, pytest, setup_responses_api_mock)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI model Responses API requests pass through to /v1/responses on the
upstream, which doesn't work with mock servers. Remove those tests from
the mock suite (they're covered by live e2e tests on main/nightly).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The job was only passing 3 of the 7 required API keys. Added the missing
MISTRAL_API_KEY, GROQ_API_KEY, AZURE_API_KEY, and AWS_BEARER_TOKEN_BEDROCK
to match the main branch configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 6 live E2E tests passed on this PR. Restoring the if: conditions
so they only run on pushes to main and nightly schedule.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@salmanap (Contributor) left a comment
Couple of thoughts

1/ If the code path we are testing is specific to an upstream LLM change, then we should have a way to run the integration test in the PR. That way we can know with confidence that changes don't break the user experience when merging to main. Otherwise we'd have to cut another PR and fix main. Not ideal.

2/ I am not sure what value the mock tests create. If the idea is that we are testing our transformation logic, and these mock upstream LLMs help us with that, then that makes sense. But most of these tests are integration tests checking whether the client (OpenAI, Anthropic) can process the message that plano sends back. In that case, I am not sure there is a lot of value in creating a mock suite.

3/ If we go down this route (barring the discussions above), I think we don't want to duplicate each test type like streaming, model routing, or responses_api. We just need a way to update the existing tests to use a mock LLM. Also, I don't think we need to put these tests in the old archgw folder.
