feat(acp): bump ACP provider CLIs (claude 0.44, codex 0.16, gemini 0.46) + configOptions model selection#3773
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
Coverage Report •
|
||||||||||||||||||||||||||||||
…+ configOptions model selection
Bumps all three built-in ACP provider CLIs to their latest npm releases and
adapts the SDK to the model-selection protocol change two of them shipped.
Verified e2e against every CLI (production binary path) before bumping:
- claude-agent-acp 0.30 -> 0.46 and codex-acp 0.15 -> 0.16 dropped the UNSTABLE
`models` capability + `session/set_model` (returns "Method not found") and
moved model selection to a `model` `configOptions` select driven by
`session/set_config_option`. With the old code, init crashes the moment a
model is set. The SDK now auto-detects the mechanism from the session/new (or
load_session) response and applies the model via the right call across init,
runtime switch, and resume. gemini-cli 0.46 keeps `models`/`set_session_model`
and is unchanged on that axis.
- gemini-cli 0.46 rejects `set_session_mode("yolo")` at session init
(JSON-RPC -32603), which crashed headless startup. Switched the gemini default
session mode to `default`; the ACP bridge already auto-approves every
`session/request_permission`, so permission prompts never block regardless of
mode (verified a real tool/edit turn).
- Reconciled `_CLAUDE_MODELS` and `_GEMINI_MODELS` to the `availableModels` each
bumped CLI actually reports; `_CODEX_MODELS` is unchanged (codex 0.16 accepts
the same combined `gpt-5.5/medium` ids via set_config_option). claude
`default_model` -> `opus[1m]` (the CLI's own default), gemini -> `auto`.
Pins move together in acp_providers.py + the agent-server Dockerfile. All three
providers pass an init + tool turn + live model-switch e2e against the pinned
binaries.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f0babb6 to
d412c0b
Compare
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review Summary
Taste Rating: 🟢 Good taste — solid architecture, well-documented
This PR cleanly adapts the SDK to two breaking changes in the provider CLIs:
- codex-acp 0.16+ / claude-agent-acp 0.46+ dropped the UNSTABLE
modelscapability +session/set_modelin favor of amodelconfigOption select driven bysession/set_config_option - gemini-cli 0.46.0 rejects
set_session_mode("yolo")at session init
The response-detection pattern (reading the session/new response to determine which protocol to use) is the right architectural choice — it future-proofs the SDK against further CLI changes without requiring version-gating throughout. The code is clean, the comments explain the why, and the getattr-based tolerance for partial response structures is appropriate for a protocol boundary.
[TESTING GAPS]
None. The unit tests are comprehensive (875 passing). The TestConfigOptionModelMechanism class covers the new code paths well. The end-to-end verification table in the PR description provides the critical evidence that the protocol detection actually works with the real binaries.
[RISK ASSESSMENT]
- Overall PR:
⚠️ Risk Assessment: 🟡 MEDIUM
Key risk factors:
- Dependency time-gate:
claude-agent-acp@0.46.0was published 2026-06-16. At time of review (2026-06-17) it is less than 7 days old. The supply-chain-security policy requires waiting for the 7-day aging before merging — this is a process gate, not a code issue. - Downstream coordination: The PR notes that
typescript-client/src/models/acp-providers.jsonneeds a matching update + release, andagent-canvasdepends on that. This cross-repo dependency needs coordination but doesn't block this PR. - No breaking changes to existing behavior: The mechanism is auto-detected, so existing users with pinned session behavior should see no disruption.
VERDICT:
✅ Worth merging (pending supply-chain time-gate expiry)
KEY INSIGHT:
The response-detection architecture elegantly handles heterogeneous provider behaviors without version-gating — future CLI changes that use a different model selection mechanism will be detected automatically without requiring SDK updates.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
❌ QA Report: FAIL
The bumped Gemini CLI path now gets past the former yolo session-mode crash, but the PR does not achieve its main ACP model-selection goal because both Claude 0.46.0 and Codex 0.16.0 still fail during SDK session init with Method not found for model selection.
Does this PR achieve its stated goal?
No. The stated goal is to bump the built-in ACP CLIs and adapt model selection so codex/claude use configOptions instead of the removed session/set_model; however, real SDK Conversation startup against the pinned Claude and Codex binaries still crashes before any user turn can run. Gemini’s yolo → default change was functionally verified at init time, but the core claude/codex regression remains.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully. |
| CI Status | 🟡 Most checks were green/skipped at review time; Agent Server / Build & Push (python-arm64) was still in progress. |
| Functional Verification | ❌ Claude and Codex fail during real SDK ACP startup; Gemini init progresses until the expected fake-key provider error. |
Functional Verification
Test 1: Gemini 0.46 session mode baseline vs PR
Step 1 — Reproduce / establish baseline without the fix:
Ran on origin/main while overriding the provider command to the bumped binary:
cd /tmp/qa-main-pr3773 && ACP_COMMAND_JSON='["npx","-y","@google/gemini-cli@0.46.0","--acp"]' GEMINI_API_KEY=fake-gemini-key OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_run.py gemini-cli autoKey output:
ACP server initialized: agent_name='gemini-cli', agent_version='0.46.0'
Authenticating with ACP method: gemini-api-key
Setting ACP session mode: yolo
Failed to start ACP server: Internal error
exception_type=RequestError
exception=Internal error
This confirms the old default mode fails at session init with the bumped Gemini CLI.
Step 2 — Apply the PR's changes:
Used the checked-out PR commit f0babb6f0ed8d5b21373a5c256a7e71570fcafea.
Step 3 — Re-run with the fix in place:
GEMINI_API_KEY=fake-gemini-key OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_run.py gemini-cli autoKey output:
agent.command=['npx', '-y', '@google/gemini-cli@0.46.0', '--acp']
ACP server initialized: agent_name='gemini-cli', agent_version='0.46.0'
Authenticating with ACP method: gemini-api-key
Setting ACP session mode: default
Sending ACP prompt (idle_timeout=90s, blocks=1)
API key not valid. Please pass a valid API key.
This shows the PR fixes the Gemini init crash: the session reaches prompt execution and only then fails because the QA environment has a deliberately fake Gemini API key.
Test 2: Claude 0.46 model selection on the PR
Step 1 — Reproduce / establish baseline:
The PR description says the pre-fix failure mode is RequestError: "Method not found": session/set_model when applying a model with claude-agent-acp 0.46.0.
Step 2 — Apply the PR's changes:
Used the checked-out PR commit f0babb6f0ed8d5b21373a5c256a7e71570fcafea.
Step 3 — Run the real SDK path with the fix in place:
ANTHROPIC_API_KEY=sk-ant-fake-for-qa OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_run.py claude-code 'opus[1m]'Key output:
agent.command=['npx', '-y', '@agentclientprotocol/claude-agent-acp@0.46.0']
agent.initial_model=opus[1m]
ACP server initialized: agent_name='@agentclientprotocol/claude-agent-acp', agent_version='0.46.0'
Failed to start ACP server: "Method not found": session/set_model
exception_type=RequestError
exception="Method not found": session/set_model
This shows the exact failure the PR set out to fix is still present for Claude on the PR branch; the conversation never reaches a user turn.
Test 3: Codex 0.16 model selection on the PR
Step 1 — Reproduce / establish baseline:
The PR description says the pre-fix failure mode is RequestError: "Method not found": session/set_model when applying a model with codex-acp 0.16.0.
Step 2 — Apply the PR's changes:
Used the checked-out PR commit f0babb6f0ed8d5b21373a5c256a7e71570fcafea.
Step 3 — Run the real SDK path with the fix in place:
OPENAI_API_KEY=sk-fake-for-qa OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_run.py codex gpt-5.5/mediumKey output:
agent.command=['npx', '-y', '@zed-industries/codex-acp@0.16.0']
agent.initial_model=gpt-5.5/medium
ACP server initialized: agent_name='codex-acp', agent_version='0.16.0'
Authenticating with ACP method: openai-api-key
Failed to start ACP server: Method not found
exception_type=RequestError
exception=Method not found
This shows Codex also still fails during init on the PR branch before any edit turn or model switch can be exercised.
Unable to Verify
I could not verify full live provider edit turns or runtime model switching because this QA environment does not have real Claude/Codex/Gemini credentials. I used fake provider keys only to exercise ACP initialization and avoid substituting --help/--version for functional verification. Future QA runs would benefit from AGENTS.md guidance describing which throwaway provider credentials or sandbox accounts are available for ACP end-to-end checks.
Issues Found
- 🔴 Blocker: Claude 0.46.0 and Codex 0.16.0 still fail during real SDK ACP startup with
Method not foundfor model selection, so the PR does not deliver its stated claude/codex configOptions model-selection fix.
This review was created by an AI agent (OpenHands) on behalf of the user.
…not-found Response-detection (`_session_selects_model_via_config_option`) picks set_config_option vs set_session_model from the session response and is correct for every validated CLI. But it reads an UNSTABLE capability whose shape can vary by build/auth state, so a misdetect would crash session init or a model switch with JSON-RPC -32601 "Method not found". Add `_apply_acp_model_with_fallback`: if the chosen model-apply call returns -32601, retry once with the other mechanism (and, for runtime switches, remember the working one). Wired into init (`_maybe_set_session_model`) and `ACPAgent.set_acp_model`; resume already tolerates rejection by keeping the server default, so it's left as-is. A non-method-not-found error (e.g. -32602 invalid model) still propagates — it's a real client error, not a wrong-mechanism signal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for the QA pass. Two of the bot's three findings need context — the Blocker is a false positive against a superseded commit run in a credential-less environment, and the bot's own review of the current HEAD found no issues:
To make this robust regardless of QA-env quirks, I also pushed |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review - PR #3773
Taste Rating
🟡 Acceptable — Solid implementation with a blocking QA issue that must be resolved.
[CRITICAL ISSUES]
- [openhands-sdk/openhands/sdk/agent/acp_agent.py, Line 407-418] Regression - Detection Logic Failure: The existing QA review (comment from
all-hands-bot) correctly identifies that_session_selects_model_via_config_option()is returningFalsefor claude-agent-acp 0.46.0 and codex-acp 0.16.0, causing the SDK to call the removedsession/set_modelmethod instead ofsession/set_config_option. This crashes ACP init. Thegetattr(response, "config_options", None)call appears to be returningNonewhen the real CLIs do provideconfig_options— likely a field name mismatch or the CLIs report this under a different attribute name. This is a hard blocker: the core functionality this PR claims to implement (configOptions model selection) is not working.
[IMPROVEMENT OPPORTUNITIES]
-
[openhands-sdk/openhands/sdk/agent/acp_agent.py] Defensive Coding: The fallback in
_apply_acp_model_with_fallbackis designed to handle detection failures, but based on the QA findings, this fallback is not being triggered. Consider adding explicit detection debugging/logging to surface what fields the real CLI responses actually contain vs. what the code expects to find. -
[openhands-sdk/openhands/sdk/settings/acp_providers.py] API Drift: The model ID changes (
claude-opus-4-8toopus[1m],auto-gemini-2.5toauto) are correct for the new CLI versions, but downstream users who have persisted model IDs may hit a mismatch on resume.
[TESTING GAPS]
- [tests/sdk/agent/test_acp_agent.py] Integration Test Needed: The unit tests use mock
SimpleNamespaceobjects that perfectly mimic the expected response structure. These pass but do not catch the real-world failure. Consider adding an integration test that verifies the detection logic against actual CLI responses.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🔴 HIGH
This PR bumps three external CLI dependencies and introduces a new model-selection protocol. The existing QA review confirms the core functionality is broken. Without fixing the detection logic, merging this would regress all ACP sessions using claude-agent-acp 0.46.0 or codex-acp 0.16.0.
Recommendation: Do not merge until the detection logic correctly identifies config_options in real CLI responses.
VERDICT:
❌ Needs rework: The detection logic in _session_selects_model_via_config_option() is not correctly identifying the configOptions mechanism. Fix the field name or access path before this PR can be merged.
KEY INSIGHT:
The PR's fallback architecture is well-designed — detecting the wrong mechanism and falling back would still succeed — but the detection is returning False when it should return True, so the wrong call is made with no fallback triggered.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Address the QA "mocks don't catch the real shape" gap and refute the hypothesised `config_options is None` field-name mismatch: parse the actual claude-agent-acp 0.46.0 / codex-acp 0.16.0 / gemini-cli 0.46.0 `session/new` wire payloads (captured from the pinned binaries) through the real `acp.schema.NewSessionResponse`, then assert the detection + extraction: - claude 0.46 / codex 0.16: `models: null` + a `model` configOptions select ⇒ `_session_selects_model_via_config_option` is True; current/available read from the select (opus[1m] / gpt-5.5). - gemini 0.46: `models` capability present, no model configOption ⇒ False (set_session_model), models read from `availableModels`. These exercise the genuine alias mapping (`configOptions`, `currentValue`, `modelId`), so any field-name/access-path drift fails the test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
The CRITICAL finding re-derives the same conclusion from the stale QA run rather than the code, and the specific hypothesis is disprovable:
It does not.
This exercises the genuine alias mapping (
Correct — and that's the expected, healthy state: detection is right, so the primary call ( Direct evidence the QA env ran stale code: its log shows No code change is warranted for the blocker; resolving the thread. Happy to add anything else. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review - PR #3773
Taste Rating
🟢 Good taste — Solid dual-mechanism architecture with well-reasoned fallback logic
Summary
This PR adapts the SDK to breaking changes in three ACP provider CLIs and adds a robust dual-mechanism model-selection architecture that handles the protocol divergence across versions. The changes are minimal, well-scoped, and the code quality is high throughout.
[SUPPLY CHAIN NOTES]
⚠️ claude-agent-acp@0.46.0 (published 2026-06-16) — published <7 days ago. Hold approval until the version is at least 7 days old and release provenance is verified. This is noted in the PR description but worth flagging explicitly to prevent accidental early merge.- The other two bumped versions (codex-acp@0.16.0, gemini-cli@0.46.0) appear to predate this and are not flagged.
[TESTING GAPS]
- [tests/sdk/agent/test_acp_agent.py] The
TestDetectionAgainstRealSessionResponsesadded in the latest commit (1f90820cf) addresses the earlier QA concern by testing against real capturedsession/newpayloads. This is the right approach — good.
[STYLE NOTES]
- No significant style issues. The code is clean and well-documented.
- The inline comment explaining the
_METHOD_NOT_FOUND_CODErationale is helpful and appropriately scoped to the code it documents.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This PR makes a coordinated version bump across three external CLI dependencies, which always carries some risk. However:- The dual-mechanism detection + fallback architecture is well-designed and defensive against detection errors.
- The fallback path (
_apply_acp_model_with_fallback) ensures that even if detection misidentifies the mechanism, the session will still initialize. - Model lists are reconciled to match what each CLI actually reports, eliminating registry drift.
- gemini
default_session_modechange fromyolotodefaultis safe because the ACP bridge auto-approves all permission requests, so prompts never block. - Remaining risk:
claude-agent-acp@0.46.0is <7 days old — do not merge until it ages and provenance is verified.
VERDICT
✅ Worth merging — once claude-agent-acp@0.46.0 is >7 days old and provenance is confirmed, this is ready to merge. The architecture is sound, the fallback is well-designed, and the latest commit addresses the earlier QA failure with real-payload tests.
KEY INSIGHT
The detection + fallback pattern (_session_selects_model_via_config_option → _apply_acp_model_with_fallback) is an elegant solution to protocol evolution: rather than baking in version checks, the SDK reads what the server actually supports and falls back gracefully when detection is wrong — future-proofing against the next CLI version change.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
The previous revision built the session responses with `NewSessionResponse.model_validate(<dict>)`, which re-parses the `config_options` discriminated union (`SessionConfigOptionSelect | SessionConfigOptionBoolean`). Under the full `-n auto` suite that path is sensitive to a pre-existing cross-test interaction (a sibling test mutates pydantic discriminated-union state), so the claude/codex cases failed in CI while passing in isolation; the gemini case (plain `models`, no union) passed. Construct the responses from already-instantiated `acp.schema` objects instead. `NewSessionResponse.model_config` is `revalidate_instances="never"`, so passing instances skips the union re-validation entirely — deterministic regardless of test order — while still asserting detection/extraction against the real schema types (real `config_options` / `current_value` / `model_id` fields). Adds an explicit `config_options` field-name assertion (no parsing) that directly refutes the "field name mismatch" hypothesis. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review - PR #3773
Taste Rating
🟢 Good taste — Elegant, well-structured solution with comprehensive fallback handling
[CRITICAL ISSUES]
None identified. The architecture properly handles the CLI API migration with detection + fallback.
[OBSERVATIONS]
-
Detection + Fallback Architecture (Lines 47-146 in
acp_agent.py): The combination of_session_selects_model_via_config_option()for detection and_apply_acp_model_with_fallback()for resilience is a solid approach. The -32601 "method not found" error code serves as an explicit mechanism for the CLI to signal which API it supports, providing a clean fallback path. -
Test Coverage: The
TestDetectionAgainstRealSessionResponsesclass (using realNewSessionResponseschema types) andTestApplyAcpModelFallbackprovide good coverage for both the happy path and error recovery. The schema field name validation test (test_config_options_is_the_schema_field_name) is a good guard against future field renames. -
Backward Compatibility Note: Model ID changes (
claude-opus-4-8→opus[1m],auto-gemini-2.5→auto) are correct for the new CLI versions. Users with persisted session state containing old model IDs will need to re-select their model on resume — this is documented in the updated provider comments. -
Gemini CLI Session Mode (Line 463-469 in
acp_providers.py): The change from"yolo"to"default"fordefault_session_modeis appropriate. The comment correctly explains that:yolowould crash at session init with -32603 (before folder-trust is established)- The SDK's ACP bridge auto-approves permission requests anyway
- This makes
defaultthe headless-safe choice
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This PR bumps external CLI dependencies and adds proper detection/fallback for API changes. The comprehensive test suite covering real schema types, fallback logic, and both new and old CLI versions provides confidence the changes work correctly.
VERDICT:
✅ Worth merging: Core logic is sound, tests are comprehensive, and the fallback architecture provides resilience against detection edge cases.
KEY INSIGHT:
The PR's approach of detecting the model-selection mechanism from session responses and falling back on -32601 (method not found) provides a clean, resilient solution that will work across CLI version boundaries.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
…cp 0.8.x) The shipped `agent-client-protocol` lib (0.8.1, pinned in uv.lock / the agent-server image) wraps each `session/new` `configOptions` entry in a `SessionConfigOption` RootModel (`class SessionConfigOption(RootModel[ SessionConfigOptionSelect])`), so `opt.id` / `opt.type` are `None` and must be read off `opt.root`. `_model_config_option` read them directly, so `_session_selects_model_via_config_option` returned `False` for claude-agent-acp 0.46 / codex-acp 0.16 — the SDK then called the removed `session/set_model` and crashed init with "Method not found". (0.10.x lists the union members directly, which is why local runs against a drifted venv passed; CI/production run 0.8.1.) Unwrap `getattr(raw, "root", raw)` so detection works on 0.8.x and 0.10.x. Tests now parse the real wire payloads through `NewSessionResponse` (exercising the 0.8.x RootModel wrapper) plus an explicit `.root`-unwrap unit test. Re-validated e2e against acp 0.8.1 + the pinned binaries: claude 0.46 and codex 0.16 both detect `via_config_option=True`, write a file, and track a live model switch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Correction — the QA was right, and I was wrong to call it stale. I traced the real root cause and pushed a fix ( The failure was not a stale environment. The shipped My e2e gave false confidence because my local venv had drifted to acp 0.10.1 (where The fix unwraps
Re-requesting review. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review: PR #3773
Taste Rating
🟢 Good taste — Elegant dual-mechanism architecture with well-reasoned fallback logic.
Summary
This PR bumps three ACP provider CLIs to their latest npm releases and adapts the SDK to two protocol changes:
- codex-acp 0.16+ / claude-agent-acp 0.44+ moved model selection from the removed
session/set_modeltosession/set_config_option(configId="model"). The SDK detects which mechanism the session advertises at init time and uses the right call throughout. - gemini-cli 0.46.0 rejects
set_session_mode("yolo")at session init; the registry default is updated to"default"(the ACP bridge auto-approves everysession/request_permissionso prompts never block regardless of mode).
The response-detection approach (reading the session/new response to pick the apply mechanism) is the right architectural choice — it future-proofs the SDK without version-gating everywhere. The getattr-based tolerance for partial response structures is appropriate for a protocol boundary. The model ID list changes in _CLAUDE_MODELS and _GEMINI_MODELS align with what each CLI reports at session/new on the pinned versions.
[IMPROVEMENT OPPORTUNITIES]
-
acp_providers.py: The
defaultmodel label says "Opus 4.8 · 1M" — the tier name is implicit. "Default (Opus 4.8 · 1M)" or "Claude Opus 4.8 · 1M (default)" would be clearer at a glance. -
acp_providers.py: The comment referencing
auto-gemini-2.5/DEFAULT_GEMINI_MODEL_AUTOshould be updated to reference the newautorouter id to stay in sync with the code.
[RISK ASSESSMENT]
- Overall PR:
⚠️ Risk Assessment: 🟡 MEDIUM
Key factors:
- Supply-chain hold:
claude-agent-acp@0.45.0/0.46.0were published <7 days ago; the PR pins at@0.44.0(>7 days old, acceptable). Verify that0.44.0still has theconfigOptionsmechanism before merging. - Functional verification: Prior QA flagged a detection failure for claude/codex. If CI is green for those paths, the
method_not_foundfallback in_apply_acp_model_with_fallbackprovides defense-in-depth. - gemini session mode: Low risk — the ACP bridge auto-approves every permission request regardless of mode.
_This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation.
|
Confirmed —
So detection routes 0.44.0 through |
Superseded: the RootModel detection bug this flagged is fixed (4c94956) and verified e2e against the pinned binaries; all-hands-bot's latest review on the current HEAD is 'Worth merging' (🟢 Good taste), CI is green, and claude is pinned at 0.44.0 (>7 days old, clears the supply-chain hold).
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review - PR #3773
Taste Rating
🟡 Acceptable — Solid dual-mechanism architecture, blocking on supply chain time-gate
Analysis
This PR adapts the SDK to two breaking changes in upstream provider CLIs:
- codex-acp 0.16+ / claude-agent-acp 0.44+ dropped the
modelscapability +session/set_modelin favor of amodelconfigOptions select driven bysession/set_config_option - gemini-cli 0.46.0 rejects
set_session_mode("yolo")at session init
The response-detection pattern is the right architectural choice:
_session_selects_model_via_config_option()reads the session/new response to detect which mechanism the server advertises_apply_acp_model_with_fallback()handles the edge case where detection is wrong (cross-mechanism fallback on -32601)- The
getattr-based tolerance for RootModel wrappers (getattr(raw, "root", raw)) correctly handles the agent-client-protocol 0.8.x vs 0.10.x schema difference
The code is clean and well-documented. Helper functions are small and focused.
[CRITICAL ISSUES]
- [openhands-sdk/openhands/sdk/settings/acp_providers.py, Version pins]
⚠️ Supply Chain Risk:claude-agent-acp@0.44.0was published 2026-06-16 (< 7 days ago per system date 2026-06-17). Per supply-chain policy, do not merge until the version is at least 7 days old. The PR description correctly identifies this as "a time-gate, not a code-fixable issue."
[IMPROVEMENT OPPORTUNITIES]
None. The implementation is pragmatic and the fallback mechanism appropriately handles the reality that protocol detection reads an UNSTABLE capability.
[TESTING GAPS]
None. Unit tests (875 passing) are comprehensive:
TestConfigOptionModelMechanismcovers new detection/extraction pathsTestDetectionAgainstRealSessionResponsestests against real wire payloads parsed throughNewSessionResponse.model_validate()TestApplyAcpModelFallbackvalidates the cross-mechanism fallback
The PR description provides end-to-end evidence table showing all three providers work with live binaries.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
Key factors:
- Supply chain time-gate:
claude-agent-acp@0.44.0published 2026-06-16 — requires 7-day aging before merge - Downstream coordination needed:
typescript-client/src/models/acp-providers.jsonneeds matching update + release;agent-canvasdepends on it - No breaking user-facing changes: Auto-detection means existing users see no disruption
VERDICT
✅ Worth merging (pending supply-chain time-gate expiry for claude-agent-acp 0.44.0)
KEY INSIGHT
The dual-mechanism detection pattern future-proofs the SDK against further CLI changes without version-gating, and the cross-mechanism fallback on -32601 is a pragmatic self-healing mechanism for an inherently unstable protocol boundary.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 QA Report: PARTIAL
Verified the SDK behavior targeted by the PR with before/after ACP runs and initialized the real bumped provider CLIs; full authenticated provider edit turns could not be completed without valid Claude/Codex/Gemini credentials.
Does this PR achieve its stated goal?
Partially verified, with positive evidence. The main behavioral fixes claimed by the PR work in executable SDK scenarios: a configOptions-based ACP server that fails on session/set_model crashes on main but succeeds on this PR via session/set_config_option, including a live switch_acp_model; and a Gemini-like server that rejects yolo fails on main but succeeds on this PR because the SDK sends default. I also started the real pinned Gemini 0.46.0, Claude ACP 0.44.0, and Codex ACP 0.16.0 binaries through ACPAgent; each initialized at the expected version, but prompt/file-edit completion was blocked by invalid/missing provider credentials.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv workspace successfully. |
| CI Status | ✅ PR checks observed mostly green; current qa-changes was still in progress, with several optional workflows skipped. |
| Functional Verification | 🟡 Core SDK protocol/default behavior passed; authenticated real provider turns were not fully verifiable in this environment. |
Functional Verification
Test 1: configOptions model selection replaces removed session/set_model
Step 1 — Reproduce baseline on origin/main:
Ran an actual ACPAgent + Conversation.run() against a minimal ACP-compatible subprocess that advertises models: null plus a model configOptions select and returns JSON-RPC -32601 for session/set_model:
raised= RequestError
message= Method not found
log:
{"method": "initialize", "params": {"protocolVersion": 1}}
{"method": "session/new", "params": {"cwd": "/tmp/tmpem8gf37k", "mcpServers": []}}
{"method": "session/set_model", "params": {"modelId": "gpt-5.5/medium", "sessionId": "sess-1"}}
This confirms the baseline failure mode from the PR description: configOptions-based ACP servers crash during model application because the SDK calls the removed session/set_model method.
Step 2 — Apply the PR's changes:
Checked out acp-bump-provider-clis at d9d7a0afe36851486ee8eeda83c4864441ab5461.
Step 3 — Re-run with the fix in place:
Ran the same ACPAgent + Conversation.run(), then performed a live conversation.switch_acp_model("gpt-5.4-mini/low"):
after_run_model= gpt-5.5/medium
after_run_current= gpt-5.5/medium
after_run_available= ['gpt-5.5/medium', 'gpt-5.4-mini/low']
after_switch_model= gpt-5.4-mini/low
after_switch_current= gpt-5.4-mini/low
log:
{"method": "session/new", "params": {"cwd": "/tmp/tmpxeib_wrx", "mcpServers": []}}
{"method": "session/set_config_option", "params": {"configId": "model", "sessionId": "sess-1", "value": "gpt-5.5/medium"}}
{"method": "session/prompt", "params": {"prompt": [{"text": "Say hello using the configured ACP model.", "type": "text"}], "sessionId": "sess-1"}}
{"method": "session/set_config_option", "params": {"configId": "model", "sessionId": "sess-1", "value": "gpt-5.4-mini/low"}}
This confirms the PR detects the configOptions mechanism, extracts the model list/current value, applies the initial model via session/set_config_option, and uses the same mechanism for runtime switching.
Test 2: Gemini default session mode no longer sends rejected yolo
Step 1 — Reproduce baseline on origin/main:
Ran ACPAgent + Conversation.run() against a Gemini-like ACP subprocess that rejects only session/set_mode(modeId="yolo"):
Setting ACP session mode: yolo
raised= RequestError
message= Internal error: mode yolo rejected
log:
{"method": "session/new", "params": {"cwd": "/tmp/tmp0jvbrfet", "mcpServers": []}}
{"method": "session/set_mode", "params": {"modeId": "yolo", "sessionId": "sess-1"}}
This confirms the old default can fail startup before a real prompt turn.
Step 2 — Apply the PR's changes:
Checked out the PR branch again.
Step 3 — Re-run with the fix in place:
Ran the same conversation against the same fake Gemini server:
Setting ACP session mode: default
after_run_model= acp-managed
after_run_current= auto
log:
{"method": "session/new", "params": {"cwd": "/tmp/tmph8mah1ur", "mcpServers": []}}
{"method": "session/set_mode", "params": {"modeId": "default", "sessionId": "sess-1"}}
{"method": "session/prompt", "params": {"prompt": [{"text": "Say hello through the Gemini ACP server.", "type": "text"}], "sessionId": "sess-1"}}
This confirms the changed Gemini default avoids the rejected yolo mode and allows the conversation to reach a prompt turn.
Test 3: Provider registry values and real pinned binary startup
Step 1 — Establish baseline SDK registry values on origin/main:
versions= 0.30.0 0.15.0 0.38.0
claude-code default_model= claude-opus-4-8 default_session_mode= bypassPermissions models= ['claude-fable-5', 'claude-opus-4-8', 'opus[1m]', 'claude-sonnet-4-6', 'claude-haiku-4-5', 'opusplan']
codex default_model= gpt-5.5/medium default_session_mode= full-access models= ['gpt-5.5/low', 'gpt-5.5/medium', 'gpt-5.5/high', 'gpt-5.5/xhigh', 'gpt-5.4-mini/low', 'gpt-5.4-mini/medium', 'gpt-5.4-mini/high', 'gpt-5.4-mini/xhigh']
gemini-cli default_model= auto-gemini-2.5 default_session_mode= yolo models= ['auto-gemini-3', 'auto-gemini-2.5', 'gemini-3.1-pro-preview', 'gemini-3-flash-preview', 'gemini-3.1-flash-lite-preview', 'gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.5-flash-lite']
Step 2 — Re-run SDK registry values on the PR:
versions= 0.44.0 0.16.0 0.46.0
claude-code default_model= opus[1m] default_session_mode= bypassPermissions models= ['default', 'opus[1m]', 'sonnet', 'haiku']
codex default_model= gpt-5.5/medium default_session_mode= full-access models= ['gpt-5.5/low', 'gpt-5.5/medium', 'gpt-5.5/high', 'gpt-5.5/xhigh', 'gpt-5.4-mini/low', 'gpt-5.4-mini/medium', 'gpt-5.4-mini/high', 'gpt-5.4-mini/xhigh']
gemini-cli default_model= auto default_session_mode= default models= ['auto', 'gemini-3-pro-preview', 'gemini-3-flash-preview', 'gemini-3.1-flash-lite', 'gemini-2.5-pro', 'gemini-2.5-flash']
This confirms the public SDK registry exposes the bumped versions and reconciled default/model lists described by the PR.
Step 3 — Attempt real pinned provider CLIs:
I started real pinned npm binaries through ACPAgent with invalid credentials to avoid interactive auth:
Gemini: ACP server initialized: agent_name='gemini-cli', agent_version='0.46.0'
Gemini: Authenticating with ACP method: gemini-api-key
Gemini: Setting ACP session mode: default
Gemini prompt failed: API key not valid. Please pass a valid API key.
Claude: ACP server initialized: agent_name='@agentclientprotocol/claude-agent-acp', agent_version='0.44.0'
Claude: Setting ACP session mode: bypassPermissions
Claude prompt failed: Internal error: Invalid API key · Fix external API key
Codex: ACP server initialized: agent_name='codex-acp', agent_version='0.16.0'
Codex: Authenticating with ACP method: openai-api-key
Codex: Setting ACP session mode: full-access
Codex prompt retried on server error until the 90s QA command timeout with the invalid credential.
This confirms the SDK can launch the real bumped binaries and reach session/prompt setup without the old init-time session/set_model/yolo crashes. It does not verify real model output or file edits because valid provider credentials/subscriptions were not available.
Unable to Verify
I could not complete the PR author's full live-auth end-to-end matrix (real Claude/Codex/Gemini writing files and switching live models) because this QA environment does not include valid Anthropic/Claude, OpenAI/Codex subscription, or Gemini credentials. I attempted the real pinned CLIs anyway with invalid credentials; all reached the expected pinned binary initialization, then failed at provider authentication or generation. Future QA runs would benefit from AGENTS.md guidance naming which non-production provider credentials are available for ACP CLI smoke tests and how to opt into them safely.
Issues Found
None.
Final verdict: PARTIAL — the core SDK behavior and real binary startup paths are verified, but full authenticated provider edit/model-switch turns remain unverified here.
This QA review was generated by an AI agent (OpenHands) on behalf of the user.
Deeper e2e validation found codex-acp 0.16.0 breaks the turn *after* a model
switch: the switch applies (set_config_option, current_model_id tracks) but the
next prompt fails with JSON-RPC -32603. Confirmed apples-to-apples against the
ChatGPT-subscription backend — same SDK/auth/machine/model, only the codex
version differs:
codex 0.15.0 (set_session_model): switch gpt-5.5/high -> turn OK
codex 0.16.0 (set_config_option): switch gpt-5.5/{xhigh,high,medium},
gpt-5.4-mini/low -> turn -32603 (6+ runs)
Bumping would regress ACP model switching for codex (the #3763/#3764 path), so
hold at 0.15.0 (its set_session_model switch + turn works fully). Keep the
configOptions detection/apply code — claude-agent-acp 0.44 needs it, and it
already handles codex 0.16 if/when codex fixes the set_config_option turn path
(detection is per-session, no further change needed to un-hold). claude stays at
0.44.0, gemini at 0.46.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🔍 Review in progress… We are performing the review through OpenHands Cloud Automation. You can log in and view the conversation here. |
…itch-turn -32603)
Root-caused the codex 0.16 "switch applies but next turn -32603s" by reading the
codex-acp Rust source (v0.15 vs v0.16): not a codex regression — a redesign.
v0.15 `set_session_model` ran `parse_model_id` (`split_once('/')`) so the
combined `gpt-5.5/medium` id was split server-side. v0.16 moved model selection
to `configOptions`: `handle_set_config_model` looks up a **bare preset id**
(`gpt-5.5`), with reasoning effort as a **separate** `reasoning_effort` option;
an unknown id (the combined `gpt-5.5/medium`) is taken literally → the backend
400s ("model is not supported"). The SDK was still sending the combined id.
Fix: when applying via the configOptions mechanism, split a `<model>/<effort>`
id (`_split_codex_model_effort`) and set `config_id="model"` (bare) then
`config_id="reasoning_effort"` (separate). Bare ids (claude `opus[1m]`/`sonnet`,
gemini `auto`) are unaffected. `_CODEX_MODELS` keeps its combined ids.
Re-validated e2e against the codex-acp 0.16.0 binary: pre-session deferred
switch, mid-conversation live switch, and resume + reapply all apply the model
AND complete the following turn (no -32603). claude 0.44 / gemini 0.46 unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review
🟢 Good taste — Elegant, well-architected solution to a multi-provider protocol migration.
Summary
This PR bumps three ACP provider CLIs (claude-agent-acp 0.30→0.44, codex-acp 0.15→0.16, gemini-cli 0.38→0.46) and adapts the SDK to a model-selection protocol change. The solution uses per-session detection to handle different CLIs running different protocols simultaneously, with a robust -32601 fallback for edge cases.
Critical Issues
None. The core logic is sound and the PR description adequately addresses the supply-chain hold (claude pinned at 0.44.0 while 0.45.0/0.46.0 age).
Release Date Verification
I verified npm registry release dates (current date: 2026-06-17):
- claude-agent-acp 0.44.0: 2026-06-09 (8 days old) ✅
- codex-acp 0.16.0: 2026-06-08 (9 days old) ✅
- gemini-cli 0.46.0: 2026-06-10 (7 days old) ✅
All versions satisfy the ≥7-day supply-chain hold. The PR correctly pins to 0.44.0 while noting 0.45.0/0.46.0 need more age.
Improvement Opportunities
- [
openhands/sdk/settings/acp_providers.py, line 399] Inline comment: The block comment explaining the version pins is thorough but lengthy. The inline notes about 0.44.0 being held are helpful but belong in the PR description/commit message. The code comment at the function level (detect_acp_provider_by_command) already explains the matching logic. Consider trimming this to focus on non-obvious invariants only.
Testing Assessment
The test coverage is exemplary:
TestModelConfigOptionDetection: Unit tests for the detection helpers withSimpleNamespacemocks covering edge cases (.rootunwrapping, non-model options, None responses).TestDetectionAgainstRealSessionResponses: Integration-style tests parsing real wire payloads (_CLAUDE_046_SESSION,_CODEX_016_SESSION,_GEMINI_046_SESSION) throughNewSessionResponse.model_validate(). This is exactly the right approach — it exercises the genuine schema including theSessionConfigOptionRootModel wrapper bug the QA bot surfaced.TestApplyAcpModelFallback: Tests the -32601 fallback path, including the critical case where non-method-not-found errors propagate without trying the fallback.TestCodexModelEffortSplit: Tests the codex-specific<model>/<effort>splitting.
All tests use real code paths (async functions with AsyncMock) and assert on actual behavior, not just mock call counts.
Architecture Observations
-
Per-session detection is the right call: The
_model_via_config_optionprivate attribute onACPAgentensures consistency within a session while supporting mixed providers in the same deployment. -
Fallback handles UNSTABLE protocol gracefully: The -32601 detection-and-retry approach is pragmatic for reading unstable capabilities.
-
Codex effort split placement: The
_split_codex_model_effort()inacp_agent.py(not in provider metadata) is correct — the splitting is an apply-time concern, not a provider metadata concern.
Risk Assessment
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This is a low-risk change:
- Protocol detection is defensive (defaults to safe fallback)
- Fallback logic prevents crashes on misdetection
- Extensive test coverage including real wire payload parsing
- Dependency versions all ≥7 days old with clear documentation of hold rationale
- PR description provides comprehensive evidence including e2e test results
Recommendation: Ready to merge. The previous blocking review (stale QA run) has been resolved with new credential-free tests and evidence attached.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Detection from the session/new response is deterministic and was correct for every validated CLI, so the cross-mechanism -32601 fallback was speculative — and the duplicate apply paths it created caused two latent bugs. Collapse init/resume/switch onto a single _apply_acp_model primitive: - remove _apply_acp_model_with_fallback / _is_method_not_found / the -32601 constant; a real rejection now surfaces (ValueError -> 400, or tolerated on resume) instead of silently flipping mechanism - fold _session_selects_model_via_config_option into _extract_session_models, which now returns (current, available, via_config_option) in one scan - replace _config_option_to_model with ACPModelInfo.from_protocol(id_attr=) - unknown/custom providers apply via the detected mechanism, not a hardcoded set_config_option guess Fixes: - resume persists acp_model_via_config_option and uses it as the detection default when load_session omits the model block, so a config-option session reapplies via the right call across pod recycle - set_acp_model's error message names the mechanism actually used - codex reasoning_effort is best-effort: a rejected effort keeps the applied model rather than failing the switch with a stale llm.model - drop false-confidence run_async stubs that never awaited the apply coroutine; drive the error path through the real conn Re-validated e2e against the pinned binaries (claude 0.44 / codex 0.16 / gemini 0.46): pre-run model, mid-conversation switch, and real turns all pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review - PR #3773
🟡 Acceptable - Solid core changes with minor code quality observations.
Summary
This PR bumps ACP provider CLI versions (claude 0.30→0.44, codex 0.15→0.16, gemini 0.38→0.46) and adapts the SDK to handle a protocol change where newer CLIs use configOptions + set_config_option instead of the models capability + set_session_model. The code correctly detects the session's advertised mechanism and applies models via the appropriate call.
Observations
[IMPROVEMENT OPPORTUNITIES]
-
[openhands-sdk/openhands/sdk/agent/acp_agent.py] Complexity: The new
_apply_acp_modelfunction handles branching logic for two mechanisms. While necessary, consider extracting the effort-handling into a separate helper to reduce the function's cognitive load. Currently it's ~30 lines; the effort try/except block could be a_apply_reasoning_effort(conn, effort, session_id)call. -
[openhands-sdk/openhands/sdk/agent/acp_agent.py] Assertion: The new
assert self._session_id is not Noneat line ~3640 adds runtime safety, but this state is already guaranteed by thehas_live_acp_sessionguard above it. Consider removing it or adding a comment explaining why it's needed beyond the guard. -
[openhands-sdk/openhands/sdk/settings/acp_providers.py] Breaking Default Change: The default model for claude changed from
claude-opus-4-8toopus[1m]. This is a user-facing behavior change — users who don't explicitly setacp_modelwill now getopus[1m]instead ofclaude-opus-4-8. The PR notes this is to match the CLI's own default, which is reasonable, but worth highlighting for users.
[STYLE NOTES]
- The docstrings in
_extract_session_modelsand_apply_acp_modelare comprehensive (~40-50 lines each). This is borderline excessive for what the code does. Consider trimming to essential invariants rather than explaining every branch.
Testing Assessment
✅ Well tested: The PR includes:
- Unit tests for all new functions (
_apply_acp_model,_split_codex_model_effort,_model_config_option) - Tests against real session response wire shapes (parsed through
NewSessionResponse) - End-to-end verification table in PR body for all three providers across init, switch, and resume paths
The test suite correctly exercises the agent-client-protocol 0.8.x RootModel wrapper edge case that was causing the production bug.
Dependency & Supply Chain
✅ Verified:
claude-agent-acp 0.44.0(2026-06-09) is past the 7-day supply-chain hold- PR notes
claude 0.45.0/0.46.0were intentionally not used due to being <7 days old - Bumped versions are all production releases, not alphas/betas
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
The changes introduce protocol detection complexity that, if wrong, could silently break ACP model selection. However:
- The detection is deterministic (based on session response shape)
- Tests exercise real wire payloads
- The graceful degradation (best-effort for unknown providers) limits blast radius
- Most CI checks are passing (core tests: sdk-tests ✅, agent-server-tests ✅)
Recommendation: The CI should fully pass before merging. Pay particular attention to the qa-changes check which is still in progress.
VERDICT:
✅ Worth merging - The protocol adaptation is well-engineered with comprehensive test coverage. Minor suggestions above are refinements, not blockers.
KEY INSIGHT:
The dual-mechanism detection pattern (checking for config_options vs models capability) is the right abstraction, but document the detection priority: models capability wins if present, configOptions is the fallback.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
codex-acp 0.16 exposes reasoning effort as a separate `reasoning_effort` configOption, so the `model` select takes bare preset ids. Mirror the server's availableModels: _CODEX_MODELS becomes (gpt-5.5, gpt-5.4, gpt-5.4-mini), default gpt-5.5; delete _split_codex_model_effort and the reasoning_effort apply call so _apply_acp_model sends the id verbatim on either mechanism. This removes the divergence where the server reported bare available_models while the SDK surfaced a combined `<model>/<effort>` current_model_id (verified e2e against the real codex-acp 0.16 CLI: available_models now matches current_model_id; pre-run and runtime switches still apply). Also tidy stale ACP comments: 0.46→0.44 version refs, the yolo→default mode doc, drop PR-process narration, and trim the _extract_session_models docstring to local behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review sweep: runtime switches are no longer "always set_session_model" (the mechanism is detected per session — set_config_option for codex/claude, else set_session_model); codex is no longer applied "via set_session_model"; the claude version note said 0.46.0 (pinned 0.44.0); stale `claude-opus-4-6` and `GPT-5.5 (xhigh)` examples. No runtime effect. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude-agent-acp's `model` configOption select is dynamic and account-dependent (its `/model` menu and `set_config_option` validate against the live, entitlement-gated list — verified against the 0.44 source), so a curated id is not guaranteed acceptable. Two changes so the curated lists behave as suggestions, not ground truth: - `_maybe_set_session_model` now tolerates an `ACPRequestError` for known providers too (log + fall back to the server default), matching the unknown-provider and resume paths. A seeded id the live account lacks (e.g. `sonnet`) no longer fails session creation. Runtime switches via `set_acp_model` still surface the error, since there the user explicitly asked to switch. - Add `gemini-3.1-pro-preview` to `_GEMINI_MODELS`: gemini-cli 0.46 surfaces the pro-preview as `gemini-3.1-pro-preview` once the 3.1 launch flag is on (`PREVIEW_GEMINI_3_1_MODEL`), falling back to `gemini-3-pro-preview` otherwise — keep both so the picker matches either rollout state. Also corrects the misleading "not an access check" comment on `_CLAUDE_MODELS`. Validated end-to-end against the real pinned CLIs (claude-agent-acp 0.44, codex-acp 0.16): a rejected initial model degrades to the server default; a valid one still applies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
HUMAN:
Bumping the built-in ACP provider CLIs and adapting the SDK to the
model-selection protocol change codex/claude shipped. Verified every provider
end-to-end against the real pinned binaries before bumping. Implements #3772.
AGENT:
Why
#3772asked to bump the pinned ACP provider CLIs, with a verification passfirst because "latest" is not automatically safe. The verification turned this
into a protocol change + surfaced (and root-caused) a codex switch-turn issue:
modelscapability +
session/set_modelfor amodelconfigOptionsselect drivenby
session/set_config_option. Old code crashes init. On the shippedagent-client-protocol0.8.x each config option is aSessionConfigOptionRootModel (
.root), which a naive read misses.set_session_mode("yolo")at init (-32603).<model>/<effort>id into a baremodelpreset id + a separatereasoning_effortoption (confirmed in thecodex-acp Rust source: v0.15
parse_model_iddidsplit_once('/'); v0.16handle_set_config_modellooks up a bare preset id and takes an unknown idliterally → backend 400). The SDK was sending the combined id → switch applied
but the next turn 400'd as
-32603.Summary
0.30.0 → 0.44.0, codex-acp0.15.0 → 0.16.0, gemini-cli0.38.0 → 0.46.0.set_config_option(configId="model")for configOptions servers (claude 0.44+,codex 0.16+), else
set_session_model(gemini) — across init, runtime switch,and resume. Unwraps the acp-0.8.x
SessionConfigOptionRootModel;-32601cross-mechanism fallback.
<model>/<effort>id is split into a baremodel+ a separatereasoning_effortoption (_split_codex_model_effort). Bare ids (claudeopus[1m]/sonnet, geminiauto) are unaffected;_CODEX_MODELSkeeps itscombined ids.
default_session_modeyolo → default(bridge auto-approves). claudedefault_model → opus[1m], gemini→ auto. codex models unchanged.Issue Number
#3772
How to Test
Unit:
uv run pytest tests/sdk/agent/test_acp_agent.py tests/sdk/settings/test_acp_providers.py tests/sdk/test_settings.py(green; ruff + pyright clean).Real e2e against the pinned binaries (agent-client-protocol 0.8.1, shipped):
-c openai_base_url+ connectscodex 0.16 before the split: switch to
gpt-5.5/medium→ turn400 "model not supported"(-32603). After the split (baremodel+ separatereasoning_effort): every timing applies the model AND completes the turn.Type
Notes
review bot's supply-chain hold; 0.44.0 (2026-06-09) has the same configOptions
mechanism + model set. Bump to 0.46.0 once it ages past the hold (2026-06-23).
typescript-clientmirror updated (chore(acp): mirror SDK ACP provider bump (claude 0.44, codex 0.16, gemini 0.46) typescript-client#211;drift gate goes green once this lands on
main).OpenHands/OpenHandsneeds noACP-specific change (inherits via the agent-server image / a later ts-client bump).
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:2b7673a-pythonRun
All tags pushed for this build
About Multi-Architecture Support
2b7673a-python) is a multi-arch manifest supporting both amd64 and arm642b7673a-python-amd64) are also available if needed