feat(agent-server): harden OpenAI gateway client compatibility #3626
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
⚠️ QA Report: PASS WITH ISSUES
Gateway compatibility behavior works end-to-end with a fake OpenAI backend and real OpenAI-compatible client requests; one non-functional PR description CI check is still failing.
Does this PR achieve its stated goal?
Yes. The PR set out to harden the agent-server OpenAI gateway for streaming chat completions, stream_options.include_usage, developer messages, and common OpenAI-compatible client request shapes. I reproduced the base-branch failures (stream: true returns 400; developer role returns 422), then verified the PR branch accepts those same requests through the OpenAI Python SDK and returns OpenAI-compatible SSE chunks, including a parsed final usage chunk.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv sync --dev completed; local agent-server ran with isolated /tmp/ohqa_home profile and fake OpenAI-compatible backend. |
| CI Status | PR Description Check / Validate PR description is failing and QA Changes by OpenHands was in progress when checked. |
| Functional Verification | ✅ OpenAI SDK non-streaming, SDK streaming with developer + include_usage, and LibreChat-shaped streaming HTTP request all worked on the PR branch. |
Functional Verification
Test 1: OpenAI SDK streaming and developer-role compatibility
Step 1 — Reproduce / establish baseline (without the fix):
Checked out base branch openhands/openai-chat-completions-gateway at 88a7d457, started openhands.agent_server on 127.0.0.1:19083, and ran a real OpenAI Python SDK request with stream=True, stream_options={"include_usage": True}, and then a request containing a developer role:
models: ['openhands_qa']
stream_error_type: BadRequestError
stream_error: Error code: 400 - {'detail': 'Streaming chat completions are not supported yet'}
developer_error_type: UnprocessableEntityError
developer_error: Error code: 422 - {'detail': [{'type': 'literal_error', 'loc': ['body', 'messages', 0, 'role'], 'msg': "Input should be 'system', 'user', 'assistant' or 'tool'", 'input': 'developer', 'ctx': {'expected': "'system', 'user', 'assistant' or 'tool'"}}]}
This confirms the compatibility gap existed on the base branch: streaming requests were rejected, and modern OpenAI developer messages were rejected during request validation.
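The 422 above is what a Literal-typed role field produces when a value falls outside its options. A minimal sketch of the difference, assuming the fix amounts to widening that literal; BaseRole, PatchedRole and check_role are hypothetical names, not the gateway's actual models.py:

```python
from typing import Any, Literal, get_args

# Hypothetical role literals for illustration; the real request model may differ.
BaseRole = Literal["system", "user", "assistant", "tool"]  # before the PR
PatchedRole = Literal["system", "developer", "user", "assistant", "tool"]  # with developer


def check_role(role: str, allowed: Any) -> str:
    """Return role if it is one of the literal's options, else raise ValueError."""
    options = get_args(allowed)
    if role not in options:
        expected = ", ".join(repr(option) for option in options)
        raise ValueError(f"Input should be one of {expected}, got {role!r}")
    return role


if __name__ == "__main__":
    print(check_role("developer", PatchedRole))  # accepted
    try:
        check_role("developer", BaseRole)
    except ValueError as exc:
        print(exc)  # rejected, mirroring the 422 above
```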
Step 2 — Apply the PR's changes:
Checked out openhands/openai-gateway-client-compat at 4a4bfe5c, restarted openhands.agent_server on 127.0.0.1:19082, using the same isolated profile and fake OpenAI-compatible backend.
Step 3 — Re-run with the fix in place:
Ran the OpenAI Python SDK against /v1/chat/completions with a normal request, then a streaming request containing developer and stream_options.include_usage:
models: ['openhands_qa']
nonstream_content: non-streaming gateway request works
nonstream_usage: {'completion_tokens': 7, 'prompt_tokens': 11, 'total_tokens': 18, 'completion_tokens_details': None, 'prompt_tokens_details': None}
stream_chunk_count: 4
stream_text: developer role reached the backing LLM
stream_usage_chunks: [{'completion_tokens': 7, 'prompt_tokens': 11, 'total_tokens': 18, 'completion_tokens_details': None, 'prompt_tokens_details': None}]
This shows the OpenAI SDK streaming path now works: the SDK successfully iterated four streamed chunks, reconstructed the expected assistant content, and parsed the final usage chunk requested by stream_options.include_usage.
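The SDK-side check above (rebuild the text, pick out the usage chunk) can be sketched in plain Python. This assumes the chunks have been converted to dicts (for example via the SDK objects' model_dump()) and that the usage chunk carries an empty choices list, as OpenAI documents for include_usage; collect_stream is a hypothetical helper, not the QA script itself:

```python
from typing import Any, Iterable, Optional


def collect_stream(
    chunks: Iterable[dict[str, Any]],
) -> tuple[str, list[dict[str, Any]], list[Optional[str]]]:
    """Rebuild assistant text, usage chunks and finish reasons from chunk dicts."""
    text_parts: list[str] = []
    usage_chunks: list[dict[str, Any]] = []
    finish_reasons: list[Optional[str]] = []
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
            finish_reasons.append(choice.get("finish_reason"))
        if chunk.get("usage"):
            usage_chunks.append(chunk["usage"])
    return "".join(text_parts), usage_chunks, finish_reasons


# Illustrative chunk sequence shaped like the four chunks counted above.
example = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}}]},
    {"choices": [{"index": 0, "delta": {"content": "developer role reached the backing LLM"}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    {"choices": [], "usage": {"prompt_tokens": 11, "completion_tokens": 7, "total_tokens": 18}},
]
text, usage, reasons = collect_stream(example)
print(text, usage, reasons)
```

With include_usage, four chunks yield only three finish_reason entries because the usage chunk has no choices, which matches the [None, None, 'stop'] pattern in the later QA run.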
Test 2: LibreChat-shaped streaming request
Step 1 — Establish baseline:
The same base-branch SDK baseline above showed stream: true requests returned HTTP 400 before the PR. That is the key compatibility blocker for LibreChat-style custom OpenAI endpoints, which commonly send stream: true plus extra OpenAI parameters.
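For reference, a LibreChat-style request body of the kind described here can be assembled with the standard library. The parameter values, the user string and the helper name are illustrative assumptions (the report does not record the exact values), and any authentication header the server may require is omitted:

```python
import json
import urllib.request

# Port taken from the PR-branch server above; values below are illustrative.
GATEWAY_URL = "http://127.0.0.1:19082/v1/chat/completions"


def build_librechat_style_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a streaming chat request with extra OpenAI params."""
    body = {
        "model": "openhands_qa",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "user": "librechat-user",  # hypothetical value
        "temperature": 0.7,
        "top_p": 1.0,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )


# Usage (needs a running gateway):
# with urllib.request.urlopen(build_librechat_style_request("hi")) as resp:
#     for raw in resp:
#         print(raw.decode("utf-8").rstrip())
```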
Step 2 — Apply the PR's changes:
Used the PR branch server at 127.0.0.1:19082.
Step 3 — Re-run with the fix in place:
Sent a real HTTP request shaped like a LibreChat custom endpoint call, including stream: true, user, temperature, top_p, presence_penalty, and frequency_penalty:
HTTP/1.1 200 OK
x-openhands-serverconversation-id: f62a7e58-2bd6-4dd0-aeca-0d121a201386
content-type: text/event-stream; charset=utf-8
data: {"id":"chatcmpl-a44d0374b37442a3bb808f84351471d4","choices":[{"delta":{"role":"assistant"},"index":0}],"created":1781088371,"model":"openhands_qa","object":"chat.completion.chunk"}
data: {"id":"chatcmpl-a44d0374b37442a3bb808f84351471d4","choices":[{"delta":{"content":"LibreChat-shaped streaming request works"},"index":0}],"created":1781088371,"model":"openhands_qa","object":"chat.completion.chunk"}
data: {"id":"chatcmpl-a44d0374b37442a3bb808f84351471d4","choices":[{"delta":{},"finish_reason":"stop","index":0}],"created":1781088371,"model":"openhands_qa","object":"chat.completion.chunk"}
data: [DONE]
This shows the gateway now accepts common OpenAI-compatible extra request fields and returns a proper text/event-stream response ending in [DONE].
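Client-side, each data: line of that response carries one JSON chunk until the [DONE] sentinel. A minimal parser sketch (iter_sse_data is a hypothetical name) that assumes one JSON object per data: line, which is how the chunks above are framed, rather than the full multi-line data handling the SSE specification allows:

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the JSON payload of each 'data:' line, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel; ignore anything after it
        yield json.loads(payload)


# Abbreviated copy of the response body above (id/created/model fields dropped).
body = """data: {"choices":[{"delta":{"role":"assistant"},"index":0}],"object":"chat.completion.chunk"}

data: {"choices":[{"delta":{"content":"LibreChat-shaped streaming request works"},"index":0}],"object":"chat.completion.chunk"}

data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}],"object":"chat.completion.chunk"}

data: [DONE]
"""
events = list(iter_sse_data(body.splitlines()))
print(len(events), events[1]["choices"][0]["delta"]["content"])
```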
Test 3: Upstream fake-backend evidence
I also inspected the fake backend request log after the PR-branch probes:
{'request': 5, 'stream': None, 'model': 'qa-model', 'roles': ['system', 'user'], 'developer_sentinel_seen': True, 'librechat_seen': False}
{'request': 6, 'stream': None, 'model': 'qa-model', 'roles': ['system', 'user'], 'developer_sentinel_seen': False, 'librechat_seen': True}
This confirms the developer instruction reached the backing LLM path and the LibreChat-shaped prompt was processed. The upstream request is intentionally non-streaming while the gateway synthesizes OpenAI-compatible client-side SSE, which matches the PR's documented scope.
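The roles log (['system', 'user'] upstream, with the developer sentinel seen) is consistent with developer messages being merged into the system instructions. A sketch of that folding, assuming string message content only and joining instructions with a blank line; fold_instructions and the separator are assumptions, not the PR's code:

```python
from typing import Any

# Roles treated as instructions rather than conversation turns (assumption).
INSTRUCTION_ROLES = {"system", "developer"}


def fold_instructions(
    messages: list[dict[str, Any]],
) -> tuple[str, list[dict[str, Any]]]:
    """Merge system/developer messages into one instruction string, in order."""
    instructions: list[str] = []
    conversation: list[dict[str, Any]] = []
    for message in messages:
        if message.get("role") in INSTRUCTION_ROLES:
            content = message.get("content")
            # Only plain-string content is handled in this sketch.
            if isinstance(content, str) and content:
                instructions.append(content)
        else:
            conversation.append(message)
    return "\n\n".join(instructions), conversation


system_text, rest = fold_instructions(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "developer", "content": "Answer tersely."},
        {"role": "user", "content": "hi"},
    ]
)
print(repr(system_text), rest)
```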
Issues Found
- 🟡 Minor / non-functional:
PR Description Check / Validate PR description is failing. I did not edit the human-only PR description fields; a human should update the PR description if that validator requires human content.
This review was created by an AI agent (OpenHands) on behalf of the user.
Final verdict: PASS WITH ISSUES
@OpenHands Understand the goal of this PR. It was originally stacked on top of another PR that we merged. Please merge main into this branch and fix conflicts.
I'm on it! enyst can track my progress at all-hands.dev
✅ PR Artifacts Cleaned Up The
Python API breakage checks: ✅ PASSED
Result: ✅ PASSED
Merged Validation run:
PR is mergeable from GitHub's perspective; checks are pending.
This comment was created by an AI agent (OpenHands) on behalf of the user.
REST API breakage checks (OpenAPI): ✅ PASSED
Result: ✅ PASSED
✅ Review complete. This review was performed through OpenHands Cloud Automation.
all-hands-bot
left a comment
Code Review: OpenAI Gateway Client Compatibility
Taste Rating
🟡 Acceptable - Functional and well-tested, with minor areas for improvement.
Analysis
[CRITICAL ISSUES]
None. No breaking issues found.
[IMPROVEMENT OPPORTUNITIES]
- [openhands-agent-server/openhands/agent_server/openai/service.py, Line 311] Simplification: the if content: check before yielding the content chunk may be unnecessary. The ChoiceDelta already accepts None for content; you could always yield this chunk and let the empty string be a no-op, eliminating one conditional.
[TESTING GAPS]
- [tests/cross/test_remote_conversation_live_server.py] Edge Case Missing: The test exercises stream=True with stream_options={"include_usage": true}, but there's no test for stream=True without stream_options (the default case). Consider adding a second assertion block that omits stream_options to ensure backward compatibility and correct behavior when include_usage is False.
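The default case the reviewer asks about hinges on how include_usage resolves when stream_options is absent. A sketch using a stdlib dataclass as a stand-in for the gateway's pydantic request model (ChatCompletionRequestSketch is hypothetical), where every new field is optional so existing requests keep validating; treating include_usage as false for non-streaming requests is an assumption:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChatCompletionRequestSketch:
    """Hypothetical stand-in for the gateway request model; new fields are optional."""

    model: str
    messages: list[dict[str, Any]]
    stream: bool = False
    stream_options: Optional[dict[str, Any]] = None
    user: Optional[str] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    presence_penalty: Optional[float] = None
    frequency_penalty: Optional[float] = None

    @property
    def include_usage(self) -> bool:
        # Assumption: usage is only emitted for streaming requests that opt in.
        if not self.stream or not self.stream_options:
            return False
        return bool(self.stream_options.get("include_usage", False))


default_case = ChatCompletionRequestSketch(model="openhands_qa", messages=[], stream=True)
print(default_case.include_usage)
```

A default-case test would then assert that a stream=True request without stream_options produces no usage chunk, alongside the existing include_usage assertion.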
[RISK ASSESSMENT]
- [Overall PR] Risk Assessment: 🟢 LOW
This PR adds OpenAI-compatible streaming support and developer role handling to the gateway. The changes are backward-compatible (new fields are optional, defaults are preserved), and the security improvement to cookie lifetime (10 years -> 8 hours) is a net positive. The streaming implementation is straightforward SSE generation. The documentation in .pr/openai-gateway-client-compat.md is thorough and includes evidence of real-world testing with OpenAI SDK, Open WebUI, LibreChat, and Pipecat.
VERDICT:
✅ Worth merging: Core logic is sound. The streaming implementation follows the OpenAI SSE protocol correctly, and the test coverage is reasonable for a compatibility-focused PR.
KEY INSIGHT:
The streaming implementation is architecturally correct but currently buffered - it only emits chunks after the agent run completes, not progressively. This is documented as a known limitation and appropriately scoped for this compatibility PR.
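Buffered streaming of this kind amounts to replaying a finished completion as SSE events. A minimal sketch under that assumption; synthesize_sse is a hypothetical name and is not the PR's iter_openai_chat_completion_sse, though it emits the role, content, finish_reason, optional usage, [DONE] order described in the reviews:

```python
import json
import time
import uuid
from typing import Any, Iterator, Optional


def synthesize_sse(
    content: str, model: str, usage: Optional[dict[str, Any]] = None
) -> Iterator[str]:
    """Replay a finished completion as OpenAI-style SSE events (buffered, not progressive)."""
    base = {
        "id": f"chatcmpl-{uuid.uuid4().hex}",  # one id shared by every chunk
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
    }

    def event(choices: list, **extra: Any) -> str:
        payload = {**base, "choices": choices, **extra}
        return f"data: {json.dumps(payload)}\n\n"

    yield event([{"index": 0, "delta": {"role": "assistant"}}])
    yield event([{"index": 0, "delta": {"content": content}}])
    yield event([{"index": 0, "delta": {}, "finish_reason": "stop"}])
    if usage is not None:
        # include_usage: one extra chunk with empty choices, sent before [DONE]
        yield event([], usage=usage)
    yield "data: [DONE]\n\n"


for line in synthesize_sse("LibreChat-shaped streaming request works", "openhands_qa"):
    print(line, end="")
```

Because every event is produced after the run finishes, a client sees the whole answer arrive at once; progressive streaming would require the agent to surface partial output while it runs.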
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.
@OpenHands you have reviews inline and top level, please /iterate
I'm on it! enyst can track my progress at all-hands.dev
Addressed the review feedback in b16c44e and resolved the inline threads. Validation run locally:
This comment was created by an AI agent (OpenHands) on behalf of the user.
✅ Review complete. This review was performed through OpenHands Cloud Automation.
all-hands-bot
left a comment
Code Review Summary
🟢 Good taste — Clean, focused implementation that solves the stated problem well.
Files Reviewed
| File | Changes | Assessment |
|---|---|---|
| openhands-agent-server/openhands/agent_server/openai/models.py | +13 -2 | ✅ Well-structured type aliases and request model |
| openhands-agent-server/openhands/agent_server/openai/service.py | +81 -1 | ✅ Clean SSE generation with proper event ordering |
| openhands-agent-server/openhands/agent_server/openai/router.py | +19 -3 | ✅ Correct streaming response handling |
| tests/cross/test_remote_conversation_live_server.py | +49 -0 | ✅ Good test coverage for both usage-included and default cases |
Key Observations
Strengths:
- SSE event sequence is correct: role → content → finish_reason → (optional usage) → [DONE]
- The iter_openai_chat_completion_sse function is a clean generator that properly handles JSON serialization
- Test coverage includes both include_usage: true and default cases with proper assertions
- developer role is correctly folded into system instructions alongside system
- Previous review feedback (content guard removal) was properly addressed in commit b16c44e3
Implementation Quality:
- The streaming implementation follows OpenAI's SSE specification correctly
- Proper use of model_dump(mode="json", exclude_none=True) for clean JSON output
- stream_options.include_usage is correctly handled as an optional feature
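Dumping with exclude_none=True omits unset fields such as finish_reason from the early chunks, which is consistent with the role chunk in the LibreChat probe having no finish_reason key. A stdlib approximation of just the None-dropping part; drop_none is hypothetical, pydantic's mode="json" type coercion is not reproduced, and None elements inside lists are kept:

```python
import json
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively drop dict keys whose value is None (approximates exclude_none=True)."""
    if isinstance(value, dict):
        return {key: drop_none(item) for key, item in value.items() if item is not None}
    if isinstance(value, list):
        # None elements inside lists are kept; only dict keys are pruned
        return [drop_none(item) for item in value]
    return value


chunk = {
    "id": "chatcmpl-1",
    "object": "chat.completion.chunk",
    "choices": [{"index": 0, "delta": {"role": "assistant", "content": None}, "finish_reason": None}],
    "usage": None,
}
print(json.dumps(drop_none(chunk), separators=(",", ":")))
```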
Minor Notes
- The test assertions for token counts (prompt_tokens == 7, completion_tokens == 5, total_tokens == 12) are hardcoded values. These are fine for the test but would need updating if the prompt changes. This is acceptable for a smoke test.
- The .pr/openai-gateway-client-compat.md artifact is noted in the PR description as temporary and will be cleaned up by the PR Artifacts workflow before merge.
[RISK ASSESSMENT]
- [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
This is a focused feature addition for OpenAI-compatible streaming. The implementation:
- Adds new functionality without modifying existing behavior
- Has comprehensive test coverage
- Was validated against real clients (OpenAI SDK, Open WebUI, LibreChat, Pipecat)
- Follows established patterns from the codebase
No breaking changes, no security concerns, no dependency issues.
VERDICT:
✅ Worth merging — Implementation is sound, well-tested, and addresses the compatibility requirements.
KEY INSIGHT:
The SSE implementation correctly handles the three-chunk pattern (role, content, finish) with optional usage, matching OpenAI's streaming protocol. Previous review feedback has been addressed.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.
all-hands-bot
left a comment
✅ QA Report: PASS
The OpenAI-compatible gateway changes work end-to-end through a real local agent-server and the OpenAI Python SDK streaming client.
Does this PR achieve its stated goal?
Yes. On origin/main, the same user-facing requests failed: developer messages were rejected with 422 and stream: true requests returned 400 Streaming chat completions are not supported yet. On this PR, the gateway accepted a developer role plus common client parameters, returned OpenAI SDK-consumable streaming chunks, emitted a final usage chunk only when stream_options.include_usage was set, and preserved non-streaming behavior.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv created the environment and both base/PR agent-server instances started locally. |
| CI Status | ⏳ At review time: 19 successful checks, 8 in progress, 1 queued, 1 skipped, no failures observed. |
| Functional Verification | ✅ Real OpenAI SDK requests verified streaming, usage chunks, developer role, extra params, and non-streaming behavior. |
Functional Verification
Test 1: Baseline failures on origin/main
Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main, started the real server with:
env -u TMUX TMUX_TMPDIR=/tmp/oh-qa-tmux-base OPENHANDS_SUPPRESS_BANNER=1 LLM_MODEL=openhands/claude-haiku-4-5-20251001 uv run python -m openhands.agent_server --host 127.0.0.1 --port 12002
Then ran an OpenAI Python SDK request using stream=True, stream_options={"include_usage": True}, user="qa-user", and a developer message:
models ['openhands_qa']
developer_stream_error_type UnprocessableEntityError
developer_stream_error Error code: 422 - {'detail': [{'type': 'literal_error', 'loc': ['body', 'messages', 0, 'role'], 'msg': "Input should be 'system', 'user', 'assistant' or 'tool'", 'input': 'developer', 'ctx': {'expected': "'system', 'user', 'assistant' or 'tool'"}}]}
This confirms the old gateway rejected newer OpenAI-compatible developer messages before the agent could run.
I also sent a user-only stream=True request on the base branch:
user_stream_error_type BadRequestError
user_stream_error Error code: 400 - {'detail': 'Streaming chat completions are not supported yet'}
This confirms the baseline did not support streaming chat completions.
Step 2 — Apply the PR's changes:
Checked out openhands/openai-gateway-client-compat at b16c44e3201c2525ff20a0078099045096a1d49c, started the real server with:
env -u TMUX TMUX_TMPDIR=/tmp/oh-qa-tmux-pr OPENHANDS_SUPPRESS_BANNER=1 LLM_MODEL=openhands/claude-haiku-4-5-20251001 uv run python -m openhands.agent_server --host 127.0.0.1 --port 12000
Saved a real gateway profile through /api/profiles/qa and verified it was visible through /v1/models:
save_profile_status 201
{"name":"qa","message":"Profile 'qa' saved"}
models_status 200
{"object":"list","data":[{"id":"openhands_qa","created":0,"object":"model","owned_by":"openhands"}]}
Step 3 — Re-run with the fix in place:
Ran the same OpenAI SDK streaming path against http://127.0.0.1:12000/v1 using model="openhands_qa", a developer message, stream=True, stream_options={"include_usage": True}, user, temperature, top_p, presence_penalty, and frequency_penalty:
chunk_count 4
stream_text GATEWAY_QA_STREAM_OK
usage_chunk_count 1
usage_last {'completion_tokens': 94, 'prompt_tokens': 4956, 'total_tokens': 5050, 'completion_tokens_details': None, 'prompt_tokens_details': None}
finish_reasons [None, None, 'stop']
This shows the OpenAI SDK could consume the response as a stream, the final content arrived correctly, and include_usage produced exactly one usage-bearing chunk before completion.
Test 2: Default stream behavior and non-stream regression check
Ran one streaming request without stream_options.include_usage, then a normal non-streaming completion:
default_stream_chunk_count 3
default_stream_text GATEWAY_QA_NO_USAGE_OK
default_stream_usage_chunks 0
nonstream_text GATEWAY_QA_NONSTREAM_OK
nonstream_usage_total 5004
This confirms default streaming omits usage chunks, and the existing non-streaming gateway path still returns an ordinary OpenAI SDK chat completion with usage.
Issues Found
None.
Final verdict: PASS.
This QA review was generated by an AI agent (OpenHands) on behalf of the user.
OpenHands encountered an error: Failed to send message to agent server: HTTP 503 error: no available server. See the conversation for more information.
HUMAN:
This PR introduces streaming, a must-have for an LLM completions endpoint. It is intentionally applied to the last agent response. It also accepts a few more parameters that are expected.
AGENT:
Why
Summary
Fixes compatibility-hardening scope for #3540 on top of #3545.
Fixed #3597
- stream: true chat completions and return OpenAI-compatible SSE chunks
- stream_options.include_usage with a final usage chunk before [DONE]
- developer role messages and fold them into gateway system instructions
- .pr/openai-gateway-client-compat.md with temporary live-test notes for Open WebUI, LibreChat-shaped requests, and Pipecat
Testing
TMUX_TMPDIR=/tmp/oh-test-tmux-precommit-$RANDOM uv run pre-commit run --files openhands-agent-server/openhands/agent_server/openai_models.py openhands-agent-server/openhands/agent_server/openai_service.py openhands-agent-server/openhands/agent_server/openai_router.py tests/cross/test_remote_conversation_live_server.py .pr/openai-gateway-client-compat.md
TMUX_TMPDIR=/tmp/oh-test-tmux-$RANDOM uv run pytest tests/cross/test_remote_conversation_live_server.py::test_openai_chat_completions_gateway_over_real_server tests/cross/test_remote_conversation_live_server.py::test_openai_gateway_replays_frozen_llm_fixtures -q
Live compatibility probes
- uvx, opened in the browser, fetched gateway models, selected openhands_compat, and completed a chat through the gateway.
- stream: true plus common OpenAI params) was exercised directly and passed.
- OpenAILLMService was installed/imported and both one-shot and streaming OpenAI-compatible voice-framework paths passed.
Full live-tests
Please read the full description of the tests I performed:
#3540 (comment)
PR-only Live-test Artifacts
This PR intentionally includes .pr/openai-gateway-client-compat.md as temporary reviewer evidence. The repository's .github/workflows/pr-artifacts.yml / PR Artifacts workflow is expected to remove .pr/ through its approval-triggered cleanup path before final merge, matching the repo workflow documented on #3545.
This PR was created by an AI agent (OpenHands) on behalf of the user.
@enyst can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:f8781e7-python
Run
All tags pushed for this build
About Multi-Architecture Support
- f8781e7-python) is a multi-arch manifest supporting both amd64 and arm64
- f8781e7-python-amd64) are also available if needed