ACP: bump provider CLIs to latest releases (claude 0.30→0.46, codex 0.15→0.16, gemini 0.38→0.46) + verification plan #3772

@simonrosenberg

Summary

Bump the pinned ACP provider CLIs to their latest official npm releases, but run a structured verification pass first. "Latest" is not automatically safe here: at least two of the three have reported or observed behavior changes at or near the target versions that directly touch the model-switching path.

| Provider | Pinned now | Latest npm | Gap | Risk |
|---|---|---|---|---|
| @agentclientprotocol/claude-agent-acp | 0.30.0 | 0.46.0 | 16 minors | medium: big jump; protocol/_meta behavior may have moved |
| @zed-industries/codex-acp | 0.15.0 | 0.16.0 | 1 minor | HIGH: set_session_model may have regressed |
| @google/gemini-cli | 0.38.0 | 0.46.0 | 8 minors | HIGH: set_session_mode("yolo") + model-list drift |

The CLI version is pinned in two SDK files that must move together (acp_providers.py constants + the agent-server Dockerfile), and is mirrored downstream in typescript-client (a hard CI drift gate) and consumed by agent-canvas via that mirror. OpenHands/OpenHands inherits transitively through the agent-server image.
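Because the constant and the Dockerfile pin must move together, a pre-merge consistency check is cheap insurance. The following is a minimal sketch, assuming both pin sets can be reduced to {package: version}; the values are illustrative, and `dockerfile_pins` / `pin_mismatches` are hypothetical helpers, not existing SDK functions.

```python
import re
from typing import Dict, Optional, Tuple

# Illustrative copies of the two pin locations. In the repo these would be read
# from acp_providers.py (the *_VERSION constants) and the agent-server Dockerfile.
SDK_PINS: Dict[str, str] = {
    "@agentclientprotocol/claude-agent-acp": "0.46.0",
    "@zed-industries/codex-acp": "0.15.0",
    "@google/gemini-cli": "0.46.0",
}

# Hypothetical Dockerfile excerpt with a forgotten gemini bump.
DOCKERFILE = """
RUN npm install -g @agentclientprotocol/claude-agent-acp@0.46.0 \\
    @zed-industries/codex-acp@0.15.0 \\
    @google/gemini-cli@0.38.0
"""

# Matches scoped npm specs such as @scope/name@1.2.3.
PIN_RE = re.compile(r"(@[\w.-]+/[\w.-]+)@(\d+\.\d+\.\d+)")


def dockerfile_pins(text: str) -> Dict[str, str]:
    """Collect every @scope/name@x.y.z spec found in the Dockerfile text."""
    return dict(PIN_RE.findall(text))


def pin_mismatches(
    sdk: Dict[str, str], docker: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (sdk_version, dockerfile_version)} for each disagreement."""
    return {
        pkg: (ver, docker.get(pkg))
        for pkg, ver in sdk.items()
        if docker.get(pkg) != ver
    }


print(pin_mismatches(SDK_PINS, dockerfile_pins(DOCKERFILE)))
# {'@google/gemini-cli': ('0.46.0', '0.38.0')}
```

A package missing from the Dockerfile shows up with `None` as its Dockerfile version, so a dropped install line is caught as well as a stale one.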


Gating risks (verify BEFORE bumping)

1. codex-acp 0.16.0 — does set_session_model still work? (blocker)

0.15.0 was pinned deliberately; there's a prior report that npm-latest 0.16.0 dropped set_session_model. Static inspection is inconclusive (the package is bundled — grep finds the method literal in neither 0.15.0 nor 0.16.0). If 0.16.0 truly lacks set_session_model, all ACP model switching for codex breaks (both the pre-run persist→apply path and live switching). Must be settled with a real e2e turn against 0.16.0 before bumping codex. If it regressed, hold codex at 0.15.0 (per-provider bumps are fine).

2. gemini-cli 0.46.0 — set_session_mode("yolo") (blocker)

Registry default is default_session_mode="yolo" (acp_providers.py:446), applied unconditionally at session init (acp_agent.py:~2496) with no version guard. gemini-cli 0.43.0 was observed to error on set_session_mode("yolo"), which breaks headless init. Verify 0.46.0 accepts yolo; if not, switch the default to "default" or add a guard.
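One shape the "add a guard" option could take is to check the requested mode against what the CLI advertises instead of sniffing versions. This is a minimal sketch, assuming the session/new response exposes the CLI's available mode ids (ACP's session-mode state is expected to carry them, but confirm against the protocol version in use). `choose_session_mode` is a hypothetical name, and the mode lists below are illustrative, not taken from any gemini-cli release.

```python
from typing import Optional, Sequence


def choose_session_mode(
    preferred: str,
    available_mode_ids: Sequence[str],
    fallback: str = "default",
) -> Optional[str]:
    """Pick the mode to request with set_session_mode, or None to skip the call.

    preferred: the registry's default_session_mode (e.g. "yolo" for gemini).
    available_mode_ids: mode ids the CLI advertised for this session
        (assumed to be read from the session/new response).
    """
    if preferred in available_mode_ids:
        return preferred
    if fallback in available_mode_ids:
        # Degrade instead of erroring out of headless init.
        return fallback
    # Nothing usable advertised: skip set_session_mode rather than fail.
    return None


# Illustrative mode lists only.
print(choose_session_mode("yolo", ["default", "yolo"]))  # yolo
print(choose_session_mode("yolo", ["default"]))          # default
print(choose_session_mode("yolo", []))                   # None
```

Falling back silently changes permission behavior, so a real guard would probably also log the downgrade.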

3. gemini model-id drift (correctness, not crash)

The picker options in canvas/settings come from the static registry mirror, not the live session, so registry model ids must be ones the CLI actually accepts. Otherwise the picker offers selectable-but-invalid models (clicking one fails with a set_session_model 400). Drift is already confirmed at gemini-cli 0.45.2: the registry has gemini-3.1-pro-preview but the CLI surfaces gemini-3-pro-preview. Reconcile _GEMINI_MODELS (and likely _CODEX_MODELS/_CLAUDE_MODELS) against each bumped CLI's real availableModels.
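The reconciliation is a two-way set difference between a registry list and the ids the CLI reports. A minimal sketch with hypothetical names (`ModelDrift`, `reconcile_models`); only the gemini-3.1-pro-preview vs gemini-3-pro-preview pair comes from the observation above, and the rest of each list is illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ModelDrift:
    invalid_in_registry: List[str]    # offered by the picker, rejected by the CLI
    missing_from_registry: List[str]  # accepted by the CLI, not offered

    @property
    def ok(self) -> bool:
        return not self.invalid_in_registry and not self.missing_from_registry


def reconcile_models(registry_ids: Iterable[str], available_ids: Iterable[str]) -> ModelDrift:
    """Compare a registry _*_MODELS list with the CLI's live availableModels ids."""
    registry, available = set(registry_ids), set(available_ids)
    return ModelDrift(
        invalid_in_registry=sorted(registry - available),
        missing_from_registry=sorted(available - registry),
    )


drift = reconcile_models(
    ["gemini-3.1-pro-preview", "gemini-2.5-pro"],  # registry side
    ["gemini-3-pro-preview", "gemini-2.5-pro"],    # what the CLI surfaces
)
print(drift.invalid_in_registry)    # ['gemini-3.1-pro-preview']
print(drift.missing_from_registry)  # ['gemini-3-pro-preview']
```

Only `invalid_in_registry` produces the selectable-but-invalid picker entries; `missing_from_registry` may be intentional if the registry curates a subset, so it is better treated as a review prompt than a hard failure.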


SDK changes (required — CI/tests go red otherwise)

  1. openhands-sdk/openhands/sdk/settings/acp_providers.py:381-383 — bump CLAUDE_AGENT_ACP_VERSION / CODEX_ACP_VERSION / GEMINI_CLI_VERSION.
  2. openhands-agent-server/openhands/agent_server/docker/Dockerfile:174-176 — bump the three npm install -g pins to match. This file governs production: resolve_acp_command_prefer_pinned_binary rewrites the npx @pkg@ver command to the bare Dockerfile-installed binary (version-agnostic match), so the constant only governs the local/native-fallback path while the Dockerfile governs the container. They must stay in sync.
  3. tests/sdk/test_settings.py:886 — asserts …claude-agent-acp@0.30.0; update.
  4. tests/sdk/test_settings.py:1057-1061 — asserts …codex-acp@0.15.0; update.
  5. Reconcile model lists _CLAUDE_MODELS/_CODEX_MODELS/_GEMINI_MODELS (acp_providers.py:301-340) to the bumped CLIs' availableModels; if any default_model (:410/430/457) changes, update asserts at tests/sdk/settings/test_acp_providers.py:44,62,79.
  6. Optional: refresh version-mentioning comments/docstrings (acp_providers.py:181,198,400-405; acp_agent.py:339,542,867).
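Item 2 is the subtle one: because the pinned-binary rewrite ignores the version, the container runs whatever the Dockerfile installed, regardless of the constant. The sketch below illustrates that behavior. It is a hypothetical reimplementation (`prefer_installed_binary`, `split_pkg_spec`), not the SDK's resolve_acp_command_prefer_pinned_binary, and the package-to-binary mapping is an assumption.

```python
import shutil
from typing import Callable, List, Mapping, Optional, Sequence, Tuple


def split_pkg_spec(spec: str) -> Tuple[str, Optional[str]]:
    """'@scope/name@1.2.3' -> ('@scope/name', '1.2.3'); no version -> (spec, None)."""
    idx = spec.rfind("@")
    if idx <= 0:  # no '@', or only the leading scope '@'
        return spec, None
    return spec[:idx], spec[idx + 1:]


def prefer_installed_binary(
    command: Sequence[str],
    binaries: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Rewrite `npx [flags] <pkg>@<ver> args...` to `<binary> args...` if installed."""
    if not command or command[0] != "npx":
        return list(command)
    i = 1
    while i < len(command) and command[i].startswith("-"):
        i += 1  # skip npx flags such as -y
    if i == len(command):
        return list(command)
    pkg, _pinned = split_pkg_spec(command[i])  # the version is deliberately ignored
    binary = binaries.get(pkg)
    if binary is None or which(binary) is None:
        return list(command)  # native fallback: keep the versioned npx command
    return [binary, *command[i + 1:]]


BINARIES = {"@zed-industries/codex-acp": "codex-acp"}  # hypothetical mapping
cmd = ["npx", "-y", "@zed-industries/codex-acp@0.16.0"]
print(prefer_installed_binary(cmd, BINARIES, which=lambda b: f"/usr/local/bin/{b}"))
# ['codex-acp']
print(prefer_installed_binary(cmd, BINARIES, which=lambda b: None))
# ['npx', '-y', '@zed-industries/codex-acp@0.16.0']
```

The same `['codex-acp']` comes back whether the constant says 0.15.0 or 0.16.0, which is why bumping only one of the two files silently diverges local and container behavior.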

Not affected: the ACP_VERSION_CHECK CI gate tracks the Python agent-client-protocol lib (>=0.8.1), not the npm CLIs — it won't fire (unless a new CLI requires a newer protocol lib, a separate gated change). Examples use unpinned package names (no change).


Verification matrix (the round to run)

Run for each provider × each timing, against the bumped CLI (the /tmp/acp_full_matrix.py-style harness from the #3763 validation can be reused — drive LocalConversation.switch_acp_model against the real CLI):

| Check | claude 0.46 | codex 0.16 | gemini 0.46 |
|---|---|---|---|
| Session creation + auth (env/file/keychain) | ☐ | ☐ | ☐ |
| default_session_mode accepted (bypassPermissions / full-access / yolo) | ☐ | ☐ | ☐ (yolo risk) |
| Initial model applied at session/new (current_model_id == requested) | ☐ | ☐ | ☐ |
| Pre-run switch persists + first run boots on it (#3763 path) | ☐ | ☐ | ☐ |
| Post-run live set_session_model honored (current_model_id tracks) | ☐ | ☐ (blocker) | ☐ |
| availableModels match the registry _*_MODELS list | ☐ | ☐ | ☐ (drift) |
| A real turn completes + cost/usage recorded | ☐ | ☐ | ☐ (usage often None) |
| Provider-specific: codex OPENAI_BASE_URL/proxy routing; gemini flash-remap reconciliation | | ☐ | ☐ |
| Session resume (load_session) reapplies model | ☐ | ☐ | ☐ |
| Tool-call ordering (drain session_update before prompt response; verified vs claude 0.29) | ☐ | ☐ | ☐ |
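A minimal skeleton for running the matrix so that one crashing cell (for example a rejected set_session_mode at init) does not abort the rest. The names are hypothetical, and the stub checks below are invented solely to exercise the loop; they say nothing about the real CLIs. Real check bodies would drive LocalConversation.switch_acp_model against each bumped CLI, which is not shown here.

```python
from typing import Callable, Dict, List, Tuple

PROVIDERS = ["claude", "codex", "gemini"]
Check = Callable[[str], bool]  # takes a provider name; True means the cell passes


def run_matrix(checks: Dict[str, Check], providers: List[str]) -> Dict[Tuple[str, str], str]:
    """Run every check against every provider; a raised error fails only its cell."""
    results: Dict[Tuple[str, str], str] = {}
    for name, check in checks.items():
        for provider in providers:
            try:
                results[(name, provider)] = "pass" if check(provider) else "FAIL"
            except Exception as exc:  # e.g. set_session_mode rejected at init
                results[(name, provider)] = f"ERROR: {type(exc).__name__}: {exc}"
    return results


def failing_cells(results: Dict[Tuple[str, str], str]) -> List[Tuple[str, str]]:
    return sorted(cell for cell, status in results.items() if status != "pass")


# Stubs only: invented outcomes to demonstrate the harness.
def stub_live_switch(provider: str) -> bool:
    # A real check would switch models and compare current_model_id.
    return provider != "codex"


def stub_mode_accepted(provider: str) -> bool:
    if provider == "gemini":
        raise RuntimeError("set_session_mode('yolo') rejected")
    return True


results = run_matrix(
    {"live set_session_model": stub_live_switch, "default_session_mode": stub_mode_accepted},
    PROVIDERS,
)
print(failing_cells(results))
# [('default_session_mode', 'gemini'), ('live set_session_model', 'codex')]
```

Keying results by (check, provider) keeps the per-provider bump decision direct: a provider ships only if none of its cells appear in `failing_cells`.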

Also run the full SDK suite + a real in-container agent-server smoke (the Dockerfile path is what production uses).


Downstream coordination

  • typescript-client — REQUIRED, in lockstep. src/models/acp-providers.json is a hand-maintained mirror of ACP_PROVIDERS (includes default_command versions and model lists). The validate-acp-providers CI job (non-optional, ci.yml) runs scripts/check-acp-drift.py against SDK main and full-compares the record — so the moment the SDK bump merges, ts-client main CI goes red until the JSON is updated. Update the three default_command versions + any model-list changes, then publish a new ts-client release.
  • agent-canvas — dependency bump only. Hard-codes no versions/models; builds ACP_PROVIDERS at runtime from the pinned @openhands/typescript-client registry. Once ts-client republishes: bump package.json @openhands/typescript-client + npm install. Watch-item: ACP_VERTEX_SAFE_MODEL = "gemini-2.5-pro" (src/constants/acp-providers.ts:317) is a hard-coded Gemini default; review it if the gemini bump renames or drops gemini-2.5-pro. (Note: the picker lists the static registry, not live availableModels, which reinforces gating risk #3 above.)
  • OpenHands/OpenHands — no ACP-specific change. No independent CLI pins; consumes the ghcr.io/openhands/agent-server image and gets the new CLIs when it bumps the SDK/agent-server image version (its normal flow, gated by check-version-consistency.yml).
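The full-record comparison that makes ts-client CI go red can be pictured as a recursive diff between the SDK registry and the JSON mirror. This is not scripts/check-acp-drift.py itself, whose internals are not described here. It is a minimal sketch assuming the mirror is a JSON object keyed like ACP_PROVIDERS; `diff_records` is hypothetical, and the "models" field and command shape are illustrative guesses at the record layout.

```python
import json
from typing import Any, List


def diff_records(sdk: Any, mirror: Any, path: str = "") -> List[str]:
    """Return the dotted paths where the JSON mirror disagrees with the SDK record."""
    if isinstance(sdk, dict) and isinstance(mirror, dict):
        diffs: List[str] = []
        for key in sorted(set(sdk) | set(mirror)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in sdk or key not in mirror:
                diffs.append(sub)  # key present on only one side
            else:
                diffs.extend(diff_records(sdk[key], mirror[key], sub))
        return diffs
    # Leaves and lists (e.g. model lists) are compared whole.
    return [] if sdk == mirror else [path]


# Hypothetical record shape; the real ACP_PROVIDERS / acp-providers.json fields may differ.
SDK_RECORD = {
    "gemini": {
        "default_command": ["npx", "-y", "@google/gemini-cli@0.46.0"],
        "models": ["gemini-2.5-pro", "gemini-3-pro-preview"],
    }
}
MIRROR_JSON = """
{"gemini": {"default_command": ["npx", "-y", "@google/gemini-cli@0.38.0"],
            "models": ["gemini-2.5-pro", "gemini-3-pro-preview"]}}
"""

print(diff_records(SDK_RECORD, json.loads(MIRROR_JSON)))
# ['gemini.default_command']
```

Running something like this locally against the bumped SDK before merging would list exactly which mirror fields the ts-client update has to touch.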

Order of operations

  1. SDK: verify per-provider (matrix above) → bump constants + Dockerfile + tests + reconcile model lists → merge.
  2. Cut an SDK release (new agent-server image carries the bumped CLIs).
  3. typescript-client: update acp-providers.json mirror to match → publish.
  4. agent-canvas: bump @openhands/typescript-client.
  5. OpenHands: bump SDK / agent-server image version.

Recommendation

Treat the three bumps independently. claude 0.46 and gemini 0.46 are likely fine once the yolo-mode + model-list items are validated. Hold codex at 0.15.0 unless 0.16.0's set_session_model is confirmed working — losing live model switching for codex would regress the work in #3763/#3764.
