
Add mock-based E2E tests and gate live tests to main/nightly #769

Open

adilhafeez wants to merge 7 commits into main from adil/mock_tests
Conversation

@adilhafeez (Contributor) commented Feb 18, 2026

Summary

  • New mock-based E2E test suite (33 tests) using pytest_httpserver to simulate LLM provider responses — runs on every PR with zero secrets required
  • Gate all live E2E jobs to main pushes + daily nightly cron schedule (0 6 * * *), so PRs no longer depend on API keys
  • Scope secrets per CI job to only the keys each job actually needs (removed unused AZURE_API_KEY, AWS_BEARER_TOKEN_BEDROCK, ARCH_API_KEY from jobs that don't need them; added missing GROQ_API_KEY to test-prompt-gateway)
  • Relax exact-match LLM assertions in live e2e tests to structural checks (is not None + len > 0) since LLM output is non-deterministic

New test coverage (mock-based)

| File | Tests | What it covers |
| --- | --- | --- |
| test_model_alias_routing.py | 13 | Alias resolution, cross-provider protocol transformation, tool calls, thinking mode, error handling |
| test_responses_api.py | 13 | Passthrough, upstream translation, tools, mixed content, multi-turn state management |
| test_streaming.py | 7 | Streaming for all API shapes (OpenAI chat, Anthropic messages, Responses API) + cross-provider |
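The suite itself uses pytest_httpserver, but the core pattern — point the gateway's upstream at a local HTTP server that replays canned provider JSON, then make structural assertions on the reply — can be sketched with only the standard library (the endpoint path and payload below are illustrative, not the PR's actual fixtures):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned OpenAI-style chat completion; field values are illustrative.
CANNED = {
    "id": "chatcmpl-mock",
    "object": "chat.completion",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "mocked reply"}}
    ],
}

class MockLLM(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body, then answer any POST with the canned JSON.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = json.dumps(CANNED).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

# Port 0 asks the OS for a free ephemeral port, so tests never collide.
server = HTTPServer(("127.0.0.1", 0), MockLLM)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/v1/chat/completions"
req = urllib.request.Request(
    url,
    data=b'{"model": "gpt-4o", "messages": []}',
    headers={"Content-Type": "application/json"},
)
resp = json.loads(urllib.request.urlopen(req).read())
server.shutdown()

# Structural checks, mirroring the relaxed live-test assertions.
content = resp["choices"][0]["message"]["content"]
assert content is not None and len(content) > 0
print(content)
```

No API keys are involved anywhere, which is what lets a suite like this run on every PR.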

adilhafeez and others added 7 commits February 18, 2026 19:33
Introduce a new mock-based E2E test suite that uses pytest_httpserver to
simulate LLM provider responses, eliminating the need for real API keys
on PR builds. The mock tests cover model alias routing, protocol
transformation (OpenAI↔Anthropic), Responses API passthrough/translation,
streaming, tool calls, thinking mode, and multi-turn state management.

CI changes:
- Add mock-e2e-tests job (zero secrets, runs on every PR)
- Gate all live E2E jobs to main pushes + nightly schedule
- Scope secrets to only the keys each job actually needs
- Add daily cron schedule for full live test coverage
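The gating described above can be sketched as a GitHub Actions workflow fragment (job names, paths, and secret names here are illustrative, not the repo's actual configuration):

```yaml
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: "0 6 * * *"   # daily nightly run at 06:00 UTC

jobs:
  mock-e2e-tests:
    # Runs on every PR; needs zero secrets.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/e2e_mock

  live-e2e-tests:
    # Gated to pushes on main and the nightly schedule.
    if: github.event_name == 'schedule' || (github.event_name == 'push' && github.ref == 'refs/heads/main')
    runs-on: ubuntu-latest
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}   # secrets scoped per job
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/e2e_live
```

Scoping each secret to the one job that needs it keeps PR builds from ever requiring credentials.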

Also relaxes exact-match assertions in live e2e tests to structural
checks (non-null, non-empty) since LLM output is non-deterministic.
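The assertion relaxation looks roughly like this (helper and variable names are illustrative, not the PR's actual code):

```python
def assert_valid_completion(content):
    # Structural check: LLM output is non-deterministic, so require only
    # a non-null, non-empty string rather than an exact transcript.
    assert content is not None
    assert isinstance(content, str)
    assert len(content) > 0

# Before: brittle exact match, e.g.
#   assert content == "The capital of France is Paris."
# After: any non-empty reply passes.
assert_valid_completion("The capital of France is Paris.")
```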

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Import HandlerType from pytest_httpserver.httpserver (not top-level)
- Apply Black formatting to all new test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OpenAI client → Claude model: gateway routes to /v1/chat/completions
  (not /v1/messages), so use setup_openai_chat_mock
- Responses API: gateway translates all requests to /v1/chat/completions
  on upstream with base_url providers, so use setup_openai_chat_mock
- Remove unused imports (json, pytest, setup_responses_api_mock)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI model Responses API requests pass through to /v1/responses on the
upstream, which doesn't work with mock servers. Remove those tests from
the mock suite (they're covered by live e2e tests on main/nightly).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The job was only passing 3 of the 7 required API keys. Added the missing
MISTRAL_API_KEY, GROQ_API_KEY, AZURE_API_KEY, and AWS_BEARER_TOKEN_BEDROCK
to match the main branch configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 6 live E2E tests passed on this PR. Restoring the if: conditions
so they only run on pushes to main and nightly schedule.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@salmanap (Contributor) left a comment
Couple of thoughts

1/ If the code path we are testing is specific to an upstream LLM change, then we should have a way to run the integration test in the PR. That way we can know with confidence that changes don't break the user experience when merging to main. Otherwise we'd have to cut another PR and fix main. Not ideal.

2/ I am not sure what value the mock tests create. If the idea is that we are testing our transformation logic, and these mock upstream LLMs help us with that, then that makes sense. But most of these tests are integration tests checking whether the client (OpenAI, Anthropic) can process the message that plano sends back. In that case, I am not sure there is a lot of value in creating a mock suite.

3/ If we go down this route (barring the discussions above), I think we don't want to duplicate each test type like streaming, model routing, or responses_api. We just need a way to update the existing tests to use a mock LLM. Also, I don't think we need to put these tests in the old archgw folder.
