
Release v1.29.0 #3787

Merged
enyst merged 5 commits into main from rel-1.29.0
Jun 18, 2026

@all-hands-bot

@all-hands-bot all-hands-bot commented Jun 18, 2026

Collaborator

HUMAN:

Cutting the v1.29.0 release. Before pushing, I verified locally that the deprecation-deadline, Python/REST API breakage, and persisted-settings-compat gates pass, along with the LLM/ACP/settings test suites.


AGENT:

Why

This is the v1.29.0 release PR. The Deprecation deadlines release gate fails once the project version reaches a feature's removed_in, so cutting 1.29.0 requires deleting every API surface scheduled for removal in 1.29.0. Two deprecated features were due:

  • The no-op _return_metrics / return_metrics parameter (deprecated in 1.24.0).
  • The acp_env field on the ACP agent/settings (deprecated in 1.24.0).
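The gate behaviour described above can be pictured with a minimal sketch. It is not the contents of check_deprecations.py (that script is not shown in this PR); the `Deprecation` record and the `overdue` helper are hypothetical names, and plain dotted numeric versions are assumed.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.29.0' into (1, 29, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Deprecation:
    # Hypothetical record; the real registry format is not shown in the PR.
    feature: str
    deprecated_in: str
    removed_in: str


def overdue(current: str, entries: list[Deprecation]) -> list[Deprecation]:
    """Return entries whose removal deadline the current version has reached."""
    now = parse_version(current)
    return [e for e in entries if now >= parse_version(e.removed_in)]


entries = [
    Deprecation("ACPAgentSettings.acp_env", "1.24.0", "1.29.0"),
    Deprecation("LLM.completion(_return_metrics=...)", "1.24.0", "1.29.0"),
]
print([e.feature for e in overdue("1.28.1", entries)])  # nothing overdue yet
print([e.feature for e in overdue("1.29.0", entries)])  # both entries are overdue
```

Comparing integer tuples rather than strings matters here: as strings, "1.9.0" would sort after "1.29.0".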

Summary

  • Bump all packages to 1.29.0 (via the Prepare Release workflow).
  • Remove _return_metrics / return_metrics from LLM.{completion,acompletion,responses,aresponses}, RouterLLM.completion, and the TestLLM doubles. Metrics are always available via LLMResponse.metrics.
  • Remove acp_env end-to-end: drop the field from ACPAgentSettings and ACPAgent (field + validators + serializers), delete ACPAgentSettings.resolve_acp_env(), simplify the ACP spawn-time env build (registry → os.environ precedence; file-secret materialisation and data-dir isolation no longer honour an acp_env pin), and drop acp_env from REDACT_ALL_VALUES_KEYS. Provide arbitrary ACP subprocess env vars via the conversation secrets channel instead.
  • Update tests, the v1 ACP persisted-settings golden fixture, the persisted-settings-compat generator, and docstrings / AGENTS.md accordingly.
  • No removed_in == 1.29.0 deprecations remain. The acp_env removal is sanctioned by both breakage gates (deprecated in the 1.28.1 baseline with removed_in reached).
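For callers affected by the return_metrics removal in the summary above, the migration can be sketched roughly as follows. `SketchLLM`, `SketchResponse`, and `SketchMetrics` are stand-ins invented for illustration, not the SDK's classes, and the `accumulated_cost` field is an assumption; only the idea that metrics live on the response's `metrics` attribute comes from the PR description.

```python
from dataclasses import dataclass, field


@dataclass
class SketchMetrics:
    # Hypothetical field; the real metrics object is not shown in the PR.
    accumulated_cost: float = 0.0


@dataclass
class SketchResponse:
    text: str
    metrics: SketchMetrics = field(default_factory=SketchMetrics)


class SketchLLM:
    """Stand-in whose completion() no longer accepts return_metrics."""

    def completion(self, messages: list[str]) -> SketchResponse:
        return SketchResponse(text="ok", metrics=SketchMetrics(accumulated_cost=0.01))


llm = SketchLLM()

# In this stand-in, the removed keyword is rejected like any unknown argument.
try:
    llm.completion(["hi"], return_metrics=True)  # type: ignore[call-arg]
except TypeError as exc:
    print("rejected:", exc)

# Metrics are read from the response instead of being requested via a flag.
response = llm.completion(["hi"])
print(response.metrics.accumulated_cost)
```

Whether the real methods raise on the old keyword or ignore it is not stated here, so downstream code is safest simply dropping the argument.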

Issue Number

N/A — routine release.

How to Test

From the rel-1.29.0 checkout:

uv run python .github/scripts/check_deprecations.py                                  # no overdue deadlines
uv run --with packaging python .github/scripts/check_sdk_api_breakage.py             # exit 0
uv run --with packaging python .github/scripts/check_agent_server_rest_api_breakage.py  # exit 0
uv run python .github/scripts/check_persisted_settings_compat.py                     # exit 0
uv run pytest tests/sdk tests/agent_server --ignore=tests/agent_server/stress -q     # green

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

Downstream consumers (OpenHands, agent-canvas, typescript-client) that still reference acp_env will be updated separately; ACP acp_env storage was never used in production.
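For downstream code moving off acp_env, the simplified spawn-time precedence from the summary can be sketched as below. This assumes "registry → os.environ precedence" means values from the conversation secrets registry win and the process environment is the fallback. `build_spawn_env` and its parameters are hypothetical names, not the SDK's implementation.

```python
import os
from collections.abc import Mapping
from typing import Optional


def build_spawn_env(
    registry_secrets: Mapping[str, str],
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Merge env sources; secrets-registry values win over the process env."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(registry_secrets)  # applied last, so registry entries take precedence
    return env


env = build_spawn_env(
    registry_secrets={"EXAMPLE_TOKEN": "from-secrets"},
    base_env={"EXAMPLE_TOKEN": "from-os", "PATH": "/usr/bin"},
)
print(env["EXAMPLE_TOKEN"])  # from-secrets
print(env["PATH"])           # /usr/bin
```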


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:579ad28-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-579ad28-python \
  ghcr.io/openhands/agent-server:579ad28-python

All tags pushed for this build

ghcr.io/openhands/agent-server:579ad28-golang-amd64
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-golang-amd64
ghcr.io/openhands/agent-server:rel-1.29.0-golang-amd64
ghcr.io/openhands/agent-server:579ad28-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:579ad28-golang-arm64
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-golang-arm64
ghcr.io/openhands/agent-server:rel-1.29.0-golang-arm64
ghcr.io/openhands/agent-server:579ad28-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:579ad28-java-amd64
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-java-amd64
ghcr.io/openhands/agent-server:rel-1.29.0-java-amd64
ghcr.io/openhands/agent-server:579ad28-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:579ad28-java-arm64
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-java-arm64
ghcr.io/openhands/agent-server:rel-1.29.0-java-arm64
ghcr.io/openhands/agent-server:579ad28-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:579ad28-python-amd64
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-python-amd64
ghcr.io/openhands/agent-server:rel-1.29.0-python-amd64
ghcr.io/openhands/agent-server:579ad28-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:579ad28-python-arm64
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-python-arm64
ghcr.io/openhands/agent-server:rel-1.29.0-python-arm64
ghcr.io/openhands/agent-server:579ad28-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:579ad28-golang
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-golang
ghcr.io/openhands/agent-server:rel-1.29.0-golang
ghcr.io/openhands/agent-server:579ad28-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:579ad28-java
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-java
ghcr.io/openhands/agent-server:rel-1.29.0-java
ghcr.io/openhands/agent-server:579ad28-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:579ad28-python
ghcr.io/openhands/agent-server:579ad28ee95f84fda5af1579a79cdd653b3b016d-python
ghcr.io/openhands/agent-server:rel-1.29.0-python
ghcr.io/openhands/agent-server:579ad28-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 579ad28-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 579ad28-python-amd64) are also available if needed

@all-hands-bot all-hands-bot added the integration-test (runs the integration tests and comments the results), test-examples (runs all applicable "examples/" files; expensive operation), and behavior-test labels Jun 18, 2026
@github-actions

Contributor

Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.

@github-actions

Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@all-hands-bot all-hands-bot left a comment

Collaborator Author


⚠️ QA Report: PASS WITH ISSUES

Release-version behavior works in the installed packages, agent-server endpoint, workflow default, and built artifacts, but the PR is not release-ready because the required deprecation-deadline CI gate is failing for v1.29.0.

Does this PR achieve its stated goal?

Partially. The PR successfully moves the user-visible package/runtime version surfaces from 1.28.0 to 1.29.0: editable installs report 1.29.0, the SDK banner imports cleanly, /server_info reports all OpenHands component versions as 1.29.0, run-eval defaults to v1.29.0, and uv build --all-packages creates 1.29.0 artifacts. However, the PR description’s release checklist includes fixing deprecation deadlines, and CI currently fails that gate with multiple APIs whose removal target is 1.29.0, so this PR does not fully prepare a mergeable release yet.

Phase Result
Environment Setup make build completed successfully and installed the editable 1.29.0 packages.
CI Status ⚠️ Refreshed status: 1 failing (Deprecation deadlines/check), 18 successful, 16 pending, 14 skipped; I did not rerun CI.
Functional Verification ✅ Versioned package/runtime behavior works; release readiness has one blocking CI issue.

Functional Verification

Test 1: Installed package metadata and SDK import banner

Step 1 — Establish baseline on origin/main:
Ran cd /tmp/qa-sdk-main && uv sync --dev && uv run python - <<'PY' ...:

openhands-sdk=1.28.0
openhands-tools=1.28.0
openhands-workspace=1.28.0
openhands-agent-server=1.28.0
OpenHands SDK v1.28.0
sdk_imports=ok Agent LLM Tool

This shows the currently released branch exposes 1.28.0 through installed distribution metadata and the SDK runtime banner.

Step 2 — Apply the PR's changes:
Used the PR checkout at commit a3acac5b5b3f889f976f1759ac0a6915d2351655 on rel-1.29.0 and ran the same import/metadata check after make build.

Step 3 — Re-run with the PR in place:

openhands-sdk=1.29.0
openhands-tools=1.29.0
openhands-workspace=1.29.0
openhands-agent-server=1.29.0
OpenHands SDK v1.29.0
sdk_imports=ok Agent LLM Tool

This confirms a real SDK user importing the package sees 1.29.0 consistently across all four distributions.
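The script behind this check is elided in the report, so the following is only a sketch of how such a metadata check could look using the standard library. The four distribution names come from the output above; `installed_versions` and its `lookup` parameter are hypothetical.

```python
from importlib import metadata

DISTRIBUTIONS = (
    "openhands-sdk",
    "openhands-tools",
    "openhands-workspace",
    "openhands-agent-server",
)


def installed_versions(names=DISTRIBUTIONS, lookup=metadata.version):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions().items():
    print(f"{name}={version}")
```

Run inside the project environment, this prints one `name=version` line per distribution; outside it, missing distributions show up as `None` instead of raising.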

Test 2: Agent server /server_info runtime versions

Step 1 — Establish baseline on origin/main:
Ran uv run python -m openhands.agent_server --host 127.0.0.1 --port 18081 and queried curl http://127.0.0.1:18081/server_info:

{
  "version": "1.28.0",
  "sdk_version": "1.28.0",
  "tools_version": "1.28.0",
  "workspace_version": "1.28.0"
}

This confirms the old server runtime reports 1.28.0 to API clients.

Step 2 — Apply the PR's changes:
Started the PR checkout’s server on a separate local port.

Step 3 — Re-run with the PR in place:
Ran uv run python -m openhands.agent_server --host 127.0.0.1 --port 18082 and queried curl http://127.0.0.1:18082/server_info:

{
  "version": "1.29.0",
  "sdk_version": "1.29.0",
  "tools_version": "1.29.0",
  "workspace_version": "1.29.0"
}

This confirms a real API client sees the agent-server and component versions as 1.29.0.
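A client-side consistency check on that payload could be sketched like this. The four keys are taken from the response shown above; `mismatched_versions` is a hypothetical helper, and the JSON is parsed from a literal so that the sketch runs without a server.

```python
import json

EXPECTED_KEYS = ("version", "sdk_version", "tools_version", "workspace_version")


def mismatched_versions(payload: dict, expected: str) -> dict:
    """Return the keys whose reported version differs from the expected one."""
    return {
        key: payload.get(key)
        for key in EXPECTED_KEYS
        if payload.get(key) != expected
    }


body = json.loads(
    '{"version": "1.29.0", "sdk_version": "1.29.0",'
    ' "tools_version": "1.29.0", "workspace_version": "1.29.0"}'
)
print(mismatched_versions(body, "1.29.0"))  # {}
```

An empty result means every component reports the expected release; a missing key is reported with the value `None`.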

Test 3: run-eval workflow dispatch default

Step 1 — Establish baseline on origin/main:
Ran a YAML parse/extract check for .github/workflows/run-eval.yml:

workflow_yaml_parse=ok
run_eval_sdk_ref_default=v1.28.0

This shows the previous dispatch default pointed at the old release tag.

Step 2 — Apply the PR's changes:
Repeated the same YAML parse/extract check on the PR checkout.

Step 3 — Re-run with the PR in place:

workflow_yaml_parse=ok
run_eval_sdk_ref_default=v1.29.0

This confirms the workflow remains parseable and now defaults evaluations to v1.29.0.

Test 4: Release artifact build

Step 1 — Establish baseline:
The metadata checks above establish the pre-PR package version as 1.28.0.

Step 2 — Apply the PR's changes:
Built the PR artifacts with uv build --all-packages.

Step 3 — Verify built artifacts:

Successfully built dist/openhands_agent_server-1.29.0.tar.gz
Successfully built dist/openhands_agent_server-1.29.0-py3-none-any.whl
Successfully built dist/openhands_sdk-1.29.0.tar.gz
Successfully built dist/openhands_sdk-1.29.0-py3-none-any.whl
Successfully built dist/openhands_tools-1.29.0.tar.gz
Successfully built dist/openhands_tools-1.29.0-py3-none-any.whl
Successfully built dist/openhands_workspace-1.29.0.tar.gz
Successfully built dist/openhands_workspace-1.29.0-py3-none-any.whl

This confirms the release packaging path produces 1.29.0 wheels and sdists for all four packages.

CI evidence for the release-readiness issue

Fetched the failed Deprecation deadlines/check log with gh run view 27760514092 --repo OpenHands/software-agent-sdk --log-failed:

The following deprecated features have passed their removal deadline:

- [openhands-sdk] 'ACPAgentSettings.acp_env' (warn_call)
  deprecated in: 1.24.0
  removed in:    1.29.0

- [openhands-sdk] 'ACPAgent.acp_env' (warn_call)
  deprecated in: 1.24.0
  removed in:    1.29.0

- [openhands-sdk] 'LLM.completion(_return_metrics=...)' (warn_call)
  deprecated in: 1.24.0
  removed in:    1.29.0

- [openhands-sdk] 'LLM.acompletion(_return_metrics=...)' (warn_call)
  deprecated in: 1.24.0
  removed in:    1.29.0

- [openhands-sdk] 'LLM.responses(_return_metrics=...)' (warn_call)
  deprecated in: 1.24.0
  removed in:    1.29.0

- [openhands-sdk] 'LLM.aresponses(_return_metrics=...)' (warn_call)
  deprecated in: 1.24.0
  removed in:    1.29.0

- [openhands-sdk] 'RouterLLM.completion(return_metrics=...)' (warn_call)
  deprecated in: 1.24.0
  removed in:    1.29.0

Update or remove the listed features before publishing a version that meets or exceeds their removal deadline.

Issues Found

  • 🟠 Issue: The release metadata/runtime behavior verifies correctly, but Deprecation deadlines/check is failing because several APIs are still present with a removal target of 1.29.0 ("removed in: 1.29.0"). This blocks the PR from fully achieving “prepare the release for version 1.29.0” until those APIs are removed or their deadlines are retargeted.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Comment thread openhands-sdk/pyproject.toml
@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Coverage

Coverage Report
File Stmts Miss Cover Missing
openhands-agent-server/openhands/agent_server
   api.py 276 26 90% 109, 111–116, 118, 120, 122, 157, 169, 184, 190, 243, 248, 257–259, 503, 506, 510–512, 514, 521
   settings_router.py 125 8 93% 262, 264–265, 366, 368–369, 407, 412
openhands-agent-server/openhands/agent_server/persistence
   models.py 199 26 86% 281, 286, 324, 372, 407–413, 415, 417, 420, 424, 450, 467–468, 513, 517, 519, 548–551, 554
openhands-sdk/openhands/sdk/agent
   acp_agent.py 1233 102 91% 553, 686, 901–903, 946–947, 1249–1250, 1293, 1295, 1299, 1303, 1329, 1392–1393, 1398, 1465, 1756, 1759–1760, 1777–1778, 1814, 1819, 1927, 1932, 2506–2509, 2513–2515, 2518–2522, 2524, 2772, 2786–2787, 2790–2792, 2800, 2804, 2808–2809, 2815–2816, 2828–2829, 2832, 2880, 2884–2886, 2890–2891, 2923, 3007, 3194–3196, 3199–3200, 3240, 3386, 3394–3396, 3434–3435, 3438, 3446–3448, 3450, 3452, 3456, 3459, 3468–3470, 3472, 3508–3509, 3527–3530, 3533, 3537–3539, 3541, 3545–3546, 3776–3777
openhands-sdk/openhands/sdk/context
   agent_context.py 162 6 96% 337, 409–410, 478, 501, 507
openhands-sdk/openhands/sdk/llm
   llm.py 943 116 87% 555, 571, 610–611, 616, 702, 718, 885, 919–920, 923–927, 929, 937–939, 943, 960–961, 965, 967–968, 970–972, 1102, 1225, 1408–1410, 1510, 1551, 1563–1565, 1568–1571, 1577, 1638, 1686, 1699–1701, 1704–1707, 1713, 1810, 1812, 1814, 1840, 1842, 1851–1852, 1902, 1965–1970, 2040, 2182–2183, 2524–2525, 2534, 2552, 2579–2580, 2582, 2584, 2586, 2594, 2597, 2599, 2601, 2612–2613, 2621, 2624, 2627–2628, 2639–2641, 2645, 2649–2650, 2655, 2665, 2670, 2734, 2736, 2738–2741, 2743–2746, 2751–2754, 2769, 2780, 2837, 2839
openhands-sdk/openhands/sdk/llm/router
   base.py 42 7 83% 45, 74–75, 77, 80, 112, 119
openhands-sdk/openhands/sdk/settings
   model.py 691 48 93% 101, 400, 418, 598, 608–611, 614, 627, 631, 637, 647, 653, 658, 881, 906, 908, 910, 912, 914, 916, 918, 920, 922, 1187, 1189, 1533, 1553, 1714, 1843, 1882, 1908, 2044–2046, 2048, 2102, 2134, 2144, 2146, 2151, 2169, 2182, 2184, 2186, 2188, 2195
openhands-sdk/openhands/sdk/testing
   test_llm.py 67 4 94% 180, 190, 250, 324
openhands-sdk/openhands/sdk/utils
   redact.py 88 14 84% 87, 226–227, 250–256, 272–275
TOTAL 33437 6814 79%

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-06-18 13:21:52 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 24.1s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 23.0s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 8.8s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 35.1s $0.03
01_standalone_sdk/09_pause_example.py ✅ PASS 10.9s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 23.9s $0.02
01_standalone_sdk/11_async.py ✅ PASS 31.9s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 13.3s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 36.9s $0.05
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 17s $0.16
01_standalone_sdk/17_image_input.py ✅ PASS 19.9s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 25.8s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 16.3s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 12.7s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 9.8s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 13.0s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 6s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 4m 51s $0.34
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 3s $0.07
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 17.8s $0.03
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 36.7s $0.04
01_standalone_sdk/29_llm_streaming.py ✅ PASS 38.7s $0.02
01_standalone_sdk/30_tom_agent.py ✅ PASS 8.9s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 4m 57s $0.35
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 24.9s $0.02
01_standalone_sdk/33_hooks/main.py ✅ PASS 39.0s $0.04
01_standalone_sdk/34_critic_example.py ✅ PASS 9m 16s $0.74
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 13.0s $0.01
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 9.5s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 42.3s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 9.7s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 34.5s $0.31
01_standalone_sdk/41_task_tool_set.py ✅ PASS 32.4s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 52.3s $0.05
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 3.5s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 9.0s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 8m 4s $0.62
01_standalone_sdk/46_agent_settings.py ✅ PASS 9.7s $0.00
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.3s $0.00
01_standalone_sdk/48_conversation_fork.py ✅ PASS 13.9s $0.00
01_standalone_sdk/49_switch_llm_tool.py ✅ PASS 8.4s $0.03
01_standalone_sdk/50_async_cancellation.py ✅ PASS 12.3s $0.00
01_standalone_sdk/51_agent_hooks/main.py ✅ PASS 44.2s $0.06
01_standalone_sdk/52_dynamic_workflow.py ✅ PASS 4m 15s $0.15
01_standalone_sdk/53_client_defined_tools.py ✅ PASS 11.1s $0.01
01_standalone_sdk/54_goal_completion_loop.py ✅ PASS 27.4s $0.03
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 37.8s $0.02
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 47s $0.05
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 1m 25s $0.10
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 21s $0.04
02_remote_agent_server/06_custom_tool/main.py ✅ PASS 5m 42s $0.06
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 38.4s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 4m 10s $0.03
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 1m 2s $0.32
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 40.3s $0.06
02_remote_agent_server/11_conversation_fork.py ✅ PASS 51.3s $0.00
02_remote_agent_server/12_settings_and_secrets_api.py ✅ PASS 2m 51s $0.04
02_remote_agent_server/13_workspace_get_llm.py ✅ PASS 50.1s $0.03
02_remote_agent_server/14_client_defined_tools.py ✅ PASS 1m 4s $0.04
02_remote_agent_server/15_openai_compatible_gateway.py ✅ PASS 17.7s $0.01
02_remote_agent_server/16_deferred_init.py ❌ FAIL (Exit code 1) 3m 9s --
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 28.8s $0.02
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 1m 51s $0.06
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 12.7s $0.02
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 14.9s $0.02

❌ Some tests failed

Total: 65 | Passed: 64 | Failed: 1 | Total Cost: $4.45

Failed examples:

  • examples/02_remote_agent_server/16_deferred_init.py: Exit code 1

View full workflow run

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-06-18 13:23:08 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 34.6s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 21.4s $0.02
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 9.7s $0.00
01_standalone_sdk/07_mcp_integration.py ✅ PASS 33.5s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 11.2s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 26.2s $0.02
01_standalone_sdk/11_async.py ✅ PASS 31.7s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 14.3s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 30.5s $0.04
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 8s $0.13
01_standalone_sdk/17_image_input.py ✅ PASS 26.5s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 18.4s $0.01
01_standalone_sdk/19_llm_routing.py ✅ PASS 21.6s $0.01
01_standalone_sdk/20_stuck_detector.py ✅ PASS 13.5s $0.01
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.7s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 14.8s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 38.9s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 6m 14s $0.47
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 13s $0.06
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 20.0s $0.03
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 41.6s $0.04
01_standalone_sdk/29_llm_streaming.py ✅ PASS 43.3s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 11.1s $0.00
01_standalone_sdk/31_iterative_refinement.py ❌ FAIL (Timed out after 600 seconds) 10m 0s --
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 20.6s $0.01
01_standalone_sdk/33_hooks/main.py ✅ PASS 40.7s $0.04
01_standalone_sdk/34_critic_example.py ✅ PASS 3m 29s $0.18
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 10.7s $0.00
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 5.8s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 40.8s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 14.3s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 32.6s $0.31
01_standalone_sdk/41_task_tool_set.py ✅ PASS 32.0s $0.02
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 36.3s $0.04
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 7.4s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 7.8s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 7m 39s $0.65
01_standalone_sdk/46_agent_settings.py ✅ PASS 10.1s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.3s $0.00
01_standalone_sdk/48_conversation_fork.py ✅ PASS 14.1s $0.00
01_standalone_sdk/49_switch_llm_tool.py ✅ PASS 7.2s $0.03
01_standalone_sdk/50_async_cancellation.py ✅ PASS 12.7s $0.00
01_standalone_sdk/51_agent_hooks/main.py ✅ PASS 54.7s $0.06
01_standalone_sdk/52_dynamic_workflow.py ✅ PASS 6m 57s $0.22
01_standalone_sdk/53_client_defined_tools.py ✅ PASS 13.6s $0.01
01_standalone_sdk/54_goal_completion_loop.py ✅ PASS 54.0s $0.04
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 38.5s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 44s $0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 1m 1s --
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 45s $0.03
02_remote_agent_server/06_custom_tool/main.py ✅ PASS 5m 25s $0.04
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 53.7s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 3m 55s $0.02
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 1m 10s $0.18
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 36.4s $0.03
02_remote_agent_server/11_conversation_fork.py ✅ PASS 56.9s $0.00
02_remote_agent_server/12_settings_and_secrets_api.py ✅ PASS 2m 23s $0.02
02_remote_agent_server/13_workspace_get_llm.py ✅ PASS 41.0s $0.01
02_remote_agent_server/14_client_defined_tools.py ✅ PASS 42.9s $0.02
02_remote_agent_server/15_openai_compatible_gateway.py ✅ PASS 13.9s $0.00
02_remote_agent_server/16_deferred_init.py ❌ FAIL (Exit code 1) 2m 30s --
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 37.0s $0.03
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 1m 43s $0.07
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 16.2s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 15.3s $0.02

❌ Some tests failed

Total: 65 | Passed: 63 | Failed: 2 | Total Cost: $3.25

Failed examples:

  • examples/01_standalone_sdk/31_iterative_refinement.py: Timed out after 600 seconds
  • examples/02_remote_agent_server/16_deferred_init.py: Exit code 1

View full workflow run

github-actions Bot and others added 2 commits June 18, 2026 15:41
Co-authored-by: openhands <openhands@all-hands.dev>
Both features reached their scheduled removal version in 1.29.0:

- Drop the no-op _return_metrics/return_metrics parameter from
  LLM.{completion,acompletion,responses,aresponses}, RouterLLM.completion,
  and the TestLLM doubles. Metrics are always on LLMResponse.metrics.
- Remove the acp_env field end-to-end: ACPAgentSettings and ACPAgent
  (field/validators/serializers), ACPAgentSettings.resolve_acp_env(), the
  ACP spawn-time env-injection/precedence logic, and the
  REDACT_ALL_VALUES_KEYS entry. Arbitrary ACP subprocess env vars now ride
  the conversation secrets channel.

Updates tests, the v1 ACP persisted-settings golden fixture, the
persisted-settings-compat generator, and docstrings/AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

enyst commented Jun 18, 2026

Member

@enyst @simonrosenberg @VascoSch92 — manual REST API contract summary for the PRs discussed.

This is the same kind of concise public /api/** OpenAPI contract diff that PR #3789 is adding to PR descriptions. Issue #3790 tracks redesigning that automation as a safe two-stage workflow so fork PR descriptions can be updated too.

This comment was created by an AI agent (OpenHands) on behalf of the user.

Status at a glance

#3784 — agent profile at conversation start

--- PR #3784 base public OpenAPI
+++ PR #3784 head public OpenAPI
@@ -978,0 +979 @@
+schema ConversationInfo property launched_profile optional schema=anyOf=[LaunchedProfile,type="null"]
@@ -1469,0 +1471,3 @@
+schema LaunchedProfile property profile_id required schema=type="string" format="uuid"
+schema LaunchedProfile property revision required schema=type="integer" minimum=0.0
+schema LaunchedProfile type="object"
@@ -1863,0 +1868 @@
+schema StartConversationRequest property agent_profile_id optional schema=anyOf=[type="string" format="uuid",type="null"]

#3788 — clarify launched agent profile provenance names

--- PR #3788 base public OpenAPI
+++ PR #3788 head public OpenAPI
@@ -979 +979 @@
-schema ConversationInfo property launched_profile optional schema=anyOf=[LaunchedProfile,type="null"]
+schema ConversationInfo property launched_agent_profile optional schema=anyOf=[LaunchedAgentProfile,type="null"]
@@ -1471,3 +1471,3 @@
-schema LaunchedProfile property profile_id required schema=type="string" format="uuid"
-schema LaunchedProfile property revision required schema=type="integer" minimum=0.0
-schema LaunchedProfile type="object"
+schema LaunchedAgentProfile property agent_profile_id required schema=type="string" format="uuid"
+schema LaunchedAgentProfile property revision required schema=type="integer" minimum=0.0
+schema LaunchedAgentProfile type="object"

#3770 — goal conversation endpoints

--- PR #3770 base public OpenAPI
+++ PR #3770 head public OpenAPI
@@ -69,0 +70,3 @@
+operation POST /api/conversations/{conversation_id}/goal operationId=start_goal_conversation_api_conversations__conversation_id__goal_post
+operation POST /api/conversations/{conversation_id}/goal/resume operationId=resume_goal_conversation_api_conversations__conversation_id__goal_resume_post
+operation POST /api/conversations/{conversation_id}/goal/stop operationId=stop_goal_conversation_api_conversations__conversation_id__goal_stop_post
@@ -173,0 +177,3 @@
+parameter POST /api/conversations/{conversation_id}/goal path:conversation_id required=true schema=type="string" format="uuid"
+parameter POST /api/conversations/{conversation_id}/goal/resume path:conversation_id required=true schema=type="string" format="uuid"
+parameter POST /api/conversations/{conversation_id}/goal/stop path:conversation_id required=true schema=type="string" format="uuid"
@@ -201,0 +208 @@
+requestBody POST /api/conversations/{conversation_id}/goal application/json required=true schema=StartGoalRequest
@@ -356,0 +364,11 @@
+response POST /api/conversations/{conversation_id}/goal 200 application/json schema=Success
+response POST /api/conversations/{conversation_id}/goal 404 no-content
+response POST /api/conversations/{conversation_id}/goal 409 no-content
+response POST /api/conversations/{conversation_id}/goal 422 application/json schema=HTTPValidationError
+response POST /api/conversations/{conversation_id}/goal/resume 200 application/json schema=Success
+response POST /api/conversations/{conversation_id}/goal/resume 404 no-content
+response POST /api/conversations/{conversation_id}/goal/resume 409 no-content
+response POST /api/conversations/{conversation_id}/goal/resume 422 application/json schema=HTTPValidationError
+response POST /api/conversations/{conversation_id}/goal/stop 200 application/json schema=Success
+response POST /api/conversations/{conversation_id}/goal/stop 404 no-content
+response POST /api/conversations/{conversation_id}/goal/stop 422 application/json schema=HTTPValidationError
@@ -1890,0 +1909,3 @@
+schema StartGoalRequest property max_iterations optional schema=type="integer" default=10 minimum=1.0
+schema StartGoalRequest property objective required schema=type="string"
+schema StartGoalRequest type="object"

#3787 current release PR diff vs latest main

--- PR #3787 base public OpenAPI
+++ PR #3787 head public OpenAPI
@@ -421 +420,0 @@
-schema ACPAgent-Input property acp_env optional schema=type="object" additionalProperties=type="string"
@@ -446 +444,0 @@
-schema ACPAgent-Output property acp_env optional schema=type="object" additionalProperties=type="string"
@@ -979 +977 @@
-schema ConversationInfo property launched_agent_profile optional schema=anyOf=[LaunchedAgentProfile,type="null"]
+schema ConversationInfo property launched_profile optional schema=anyOf=[LaunchedProfile,type="null"]
@@ -1471,3 +1469,3 @@
-schema LaunchedAgentProfile property agent_profile_id required schema=type="string" format="uuid"
-schema LaunchedAgentProfile property revision required schema=type="integer" minimum=0.0
-schema LaunchedAgentProfile type="object"
+schema LaunchedProfile property profile_id required schema=type="string" format="uuid"
+schema LaunchedProfile property revision required schema=type="integer" minimum=0.0
+schema LaunchedProfile type="object"
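A contract diff in the style shown above can be approximated with a short sketch: flatten each OpenAPI document into one line per schema property, then take a zero-context unified diff. This is an illustration under assumptions, not the tooling from PR #3789; `contract_lines` and `contract_diff` are hypothetical names, and only component schemas are covered (operations, parameters, and responses are left out).

```python
import difflib


def contract_lines(spec: dict) -> list[str]:
    """Flatten component schemas into one sorted line per schema property."""
    lines = []
    schemas = spec.get("components", {}).get("schemas", {})
    for schema_name, schema in schemas.items():
        required = set(schema.get("required", []))
        for prop in schema.get("properties", {}):
            flag = "required" if prop in required else "optional"
            lines.append(f"schema {schema_name} property {prop} {flag}")
    return sorted(lines)


def contract_diff(base: dict, head: dict) -> list[str]:
    """Unified diff with zero context, so only changed contract lines appear."""
    return list(
        difflib.unified_diff(
            contract_lines(base),
            contract_lines(head),
            fromfile="base public OpenAPI",
            tofile="head public OpenAPI",
            n=0,
            lineterm="",
        )
    )


# Toy specs: the head drops acp_env from one schema, as this release does.
base = {"components": {"schemas": {"ACPAgent-Input": {"properties": {"acp_env": {}, "kind": {}}}}}}
head = {"components": {"schemas": {"ACPAgent-Input": {"properties": {"kind": {}}}}}}
print("\n".join(contract_diff(base, head)))
```

Sorting the flattened lines keeps the diff stable when the spec generator reorders keys, so only genuine contract changes show up.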

@enyst

enyst commented Jun 18, 2026

Member

Can we also pick d7392de? Please note, though, that my agent is still looking into a failure on GitHub Actions.

enyst added the test-examples and integration-test labels, and removed the integration-test and test-examples labels, on Jun 18, 2026
@github-actions

Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Contributor

🧪 Integration Tests Results

Overall Success Rate: 97.1%
Total Cost: $1.52
Models Tested: 4
Timestamp: 2026-06-18 14:57:14 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens |
|---|---|---|---|---|---|---|
| litellm_proxy_minimax_MiniMax_M2.7 | 100.0% | 8/8 | 1 | 9 | $0.00 | 304,554 |
| litellm_proxy_openai_gpt_5.5 | 100.0% | 9/9 | 0 | 9 | $0.94 | 263,057 |
| litellm_proxy_gemini_3.1_pro_preview | 88.9% | 8/9 | 0 | 9 | $0.56 | 325,638 |
| litellm_proxy_deepseek_deepseek_v4_flash | 100.0% | 8/8 | 1 | 9 | $0.02 | 415,253 |
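
The headline figures can be cross-checked from the per-model rows. The sketch below assumes the overall rate pools passed tests over executed (non-skipped) tests across all models; the workflow's actual formula is not shown in this thread, but that reading reproduces both 97.1% and $1.52. The shortened model keys are only labels for this illustration.

```python
# (passed, executed, cost_usd) per model, copied from the summary rows.
results = {
    "minimax_MiniMax_M2.7": (8, 8, 0.00),
    "openai_gpt_5.5": (9, 9, 0.94),
    "gemini_3.1_pro_preview": (8, 9, 0.56),
    "deepseek_v4_flash": (8, 8, 0.02),
}

passed = sum(p for p, _, _ in results.values())
executed = sum(n for _, n, _ in results.values())

# Pooled rate: skipped tests are excluded from the denominator (assumption).
overall_rate = round(100 * passed / executed, 1)
total_cost = round(sum(c for _, _, c in results.values()), 2)

print(f"{passed}/{executed} passed = {overall_rate}%, total ${total_cost:.2f}")
# prints: 33/34 passed = 97.1%, total $1.52
```

Averaging the four per-model percentages instead would give 97.2%, so the pooled reading is the one consistent with the reported number.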

📋 Detailed Results

litellm_proxy_minimax_MiniMax_M2.7

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.00
  • Token Usage: prompt: 300,618, completion: 3,936, cache_read: 222,479, reasoning: 817
  • Run Suffix: litellm_proxy_minimax_MiniMax_M2.7_884c996_minimax_m2_7_run_N9_20260618_145313
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_openai_gpt_5.5

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.94
  • Token Usage: prompt: 258,788, completion: 4,269, cache_read: 107,520, reasoning: 1,313
  • Run Suffix: litellm_proxy_openai_gpt_5.5_884c996_gpt_5_5_run_N9_20260618_145408

litellm_proxy_gemini_3.1_pro_preview

  • Success Rate: 88.9% (8/9)
  • Total Cost: $0.56
  • Token Usage: prompt: 321,455, completion: 4,183, cache_read: 73,476, reasoning: 2,580
  • Run Suffix: litellm_proxy_gemini_3.1_pro_preview_884c996_gemini_3_1_pro_run_N9_20260618_145331

Failed Tests:

  • t08_image_file_viewing: Agent did not identify yellow color in the logo. Response: . (Cost: $0.03)

litellm_proxy_deepseek_deepseek_v4_flash

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.02
  • Token Usage: prompt: 410,303, completion: 4,950, cache_read: 228,224, reasoning: 1,631
  • Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_884c996_deepseek_v4_flash_run_N9_20260618_145322
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-06-18 15:19:40 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 24.3s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 20.7s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 9.5s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 42.2s $0.03
01_standalone_sdk/09_pause_example.py ✅ PASS 14.1s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 24.9s $0.02
01_standalone_sdk/11_async.py ✅ PASS 33.9s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 12.3s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 24.9s $0.03
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 15s $0.15
01_standalone_sdk/17_image_input.py ✅ PASS 22.4s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 29.5s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 18.2s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 15.3s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 9.6s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 20.8s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 29s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 3m 18s $0.25
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 12s $0.08
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 17.3s $0.02
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 40.8s $0.04
01_standalone_sdk/29_llm_streaming.py ✅ PASS 47.4s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 9.0s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 5m 39s $0.40
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 17.5s $0.02
01_standalone_sdk/33_hooks/main.py ✅ PASS 39.2s $0.04
01_standalone_sdk/34_critic_example.py ✅ PASS 8m 11s $0.87
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 10.7s $0.00
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 5.7s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 37.7s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 10.1s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 38.4s $0.30
01_standalone_sdk/41_task_tool_set.py ✅ PASS 27.6s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 32.3s $0.04
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 3.7s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 8.9s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 4m 5s $0.30
01_standalone_sdk/46_agent_settings.py ✅ PASS 10.7s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.4s $0.00
01_standalone_sdk/48_conversation_fork.py ✅ PASS 11.9s $0.00
01_standalone_sdk/49_switch_llm_tool.py ✅ PASS 7.9s $0.03
01_standalone_sdk/50_async_cancellation.py ✅ PASS 12.8s $0.00
01_standalone_sdk/51_agent_hooks/main.py ✅ PASS 41.0s $0.05
01_standalone_sdk/52_dynamic_workflow.py ✅ PASS 6m 38s $0.17
01_standalone_sdk/53_client_defined_tools.py ✅ PASS 13.5s $0.01
01_standalone_sdk/54_goal_completion_loop.py ✅ PASS 31.3s $0.03
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 36.5s $0.02
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 39s $0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 56.4s $0.05
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 40s $0.03
02_remote_agent_server/06_custom_tool/main.py ✅ PASS 5m 4s $0.04
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 36.1s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 4m 14s $0.03
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 1m 6s $0.35
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 45.2s $0.06
02_remote_agent_server/11_conversation_fork.py ✅ PASS 51.3s $0.00
02_remote_agent_server/12_settings_and_secrets_api.py ✅ PASS 2m 50s $0.04
02_remote_agent_server/13_workspace_get_llm.py ✅ PASS 34.8s $0.02
02_remote_agent_server/14_client_defined_tools.py ✅ PASS 1m 7s $0.04
02_remote_agent_server/15_openai_compatible_gateway.py ✅ PASS 17.9s $0.01
02_remote_agent_server/16_deferred_init.py ❌ FAIL (Exit code 1) 2m 27s --
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 21.3s $0.02
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 1m 6s $0.09
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 14.0s $0.02
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 17.4s $0.02

❌ Some tests failed

Total: 65 | Passed: 64 | Failed: 1 | Total Cost: $4.19

Failed examples:

  • examples/02_remote_agent_server/16_deferred_init.py: Exit code 1

View full workflow run

@enyst

enyst commented Jun 18, 2026

Member

@OpenHands look at the last comment on running test examples workflow and how it works. run the failed example yourself now, same llm, you can use eval proxy and eval key though. does it work now? on latest version of the branch. then /codereview this release pr. reply on github.

@openhands-development


@enyst it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again.

@openhands-ai

openhands-ai Bot commented Jun 18, 2026


I'm on it! enyst can track my progress at all-hands.dev

@enyst enyst left a comment

Member

@enyst I checked the latest Run Examples comment and reproduced the failed example locally on the latest branch head (b86ba42cab5150c0a979f35b0d63486dc0f3da19).

Local reproduction command (same workflow model, eval proxy/key):

env -u TMUX_TMPDIR -u TMUX -u SESSION_API_KEY -u OH_SESSION_API_KEYS_0 \
  LLM_API_KEY="$LITELLM_API_KEY" \
  LLM_MODEL="openhands/claude-haiku-4-5-20251001" \
  LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" \
  ANTHROPIC_BASE_URL="https://llm-proxy.eval.all-hands.dev" \
  ANTHROPIC_API_KEY="$LITELLM_API_KEY" \
  OPENAI_BASE_URL="https://llm-proxy.eval.all-hands.dev" \
  OPENAI_API_KEY="$LITELLM_API_KEY" \
  uv run pytest tests/examples/test_examples.py \
    --run-examples \
    --examples-results-dir .example-test-results-local \
    -k '16_deferred_init' -s

Result: still fails. The deferred-init flow itself reaches the ready state and runs the conversation, but the example's polling loop times out because the server reports execution_status: "finished", while the loop only breaks on ("stopped", "paused", "error") and the final assertion only accepts ("stopped", "paused"):

AssertionError: Unexpected final status: finished

That matches the workflow log for Run Examples Scripts run 27768040721, which also failed with Unexpected final status: finished.
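
To make the failure mode concrete, here is a minimal sketch of a polling loop that treats "finished" as a terminal status. It is not the code from 16_deferred_init.py and not the fix that later landed; the function name, the status sets, and the injected get_status callable are assumptions for illustration.

```python
import time
from typing import Callable

# Assumption: statuses at which polling should stop. "finished" is the value
# the server reported in the failing run; the rest are the ones the example
# already handled.
TERMINAL_STATUSES = frozenset({"finished", "stopped", "paused", "error"})
SUCCESS_STATUSES = TERMINAL_STATUSES - {"error"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with a fake status source standing in for the server.
fake_statuses = iter(["running", "running", "finished"])
final = wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda _: None)
assert final in SUCCESS_STATUSES, f"Unexpected final status: {final}"
```

Keeping the break condition and the final assertion derived from the same set avoids the mismatch described above, where a status can be neither a loop exit nor an accepted result.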

/codereview

🟡 Taste Rating: Acceptable - The release cleanup is mostly straightforward and removes deadline-expired compatibility code, but the PR is not release-ready because required release validation is failing/stale.

[CRITICAL ISSUES]

  • None found in the release cleanup itself during this pass.

[TESTING GAPS]

  • [PR validation] Release workflow validation is not green/current: per this repo's release review policy, release PRs need current PR-specific validation for Run tests, Run Examples Scripts, and Run Integration Tests before approval. Current state:
    • Run tests: ✅ current on b86ba42cab5150c0a979f35b0d63486dc0f3da19.
    • Run Examples Scripts: ❌ latest PR comment/run fails examples/02_remote_agent_server/16_deferred_init.py; I reproduced the same failure locally on current HEAD with the requested model/proxy.
    • Run Integration Tests: ambiguous/stale for current HEAD. Latest visible result comment is for run 27768049468 on older head 884c996f9ad515c66a7c9b9440ea9cf0a17938e6 and reports 97.1% with a Gemini t08_image_file_viewing failure, even though the GitHub workflow conclusion is success. I do not see a current green integration result for b86ba42 in the PR comments.
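
The "current and green" rule applied in the list above can be expressed as a small check. This is a sketch of the idea only: the WorkflowResult type, the blocking_reasons function, and the placeholder SHAs in the usage are hypothetical, and the repository's real release gate may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowResult:
    name: str
    head_sha: str  # commit the run executed against (full or abbreviated)
    passed: bool


def blocking_reasons(
    results: list[WorkflowResult],
    current_head: str,
    required: tuple[str, ...],
) -> list[str]:
    """List why a release PR is not yet validated on its current head."""
    by_name = {r.name: r for r in results}
    reasons = []
    for name in required:
        result = by_name.get(name)
        if result is None:
            reasons.append(f"{name}: no result")
        elif not result.head_sha or not current_head.startswith(result.head_sha):
            # A green run on an older commit proves nothing about the new head.
            reasons.append(f"{name}: stale (ran on {result.head_sha or 'unknown'})")
        elif not result.passed:
            reasons.append(f"{name}: failed")
    return reasons


# Usage with made-up SHAs ("aaa111..." is the current head, "bbb222" an older one).
REQUIRED = ("Run tests", "Run Examples Scripts", "Run Integration Tests")
sample = [
    WorkflowResult("Run tests", "aaa111", True),
    WorkflowResult("Run Examples Scripts", "aaa111", False),
    WorkflowResult("Run Integration Tests", "bbb222", True),
]
print(blocking_reasons(sample, "aaa111fffeee", REQUIRED))
```

Checking staleness before pass/fail mirrors the review's reasoning: a result for an older head is reported as stale regardless of its outcome.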

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM
    This is an intentional breaking release PR that removes deprecated public API surface (_return_metrics/return_metrics, acp_env). The code direction is reasonable for a release deadline cleanup, and the breakage gates are reported green, but failed/stale release workflow evidence makes auto-approval unsafe.

VERDICT:
Needs rework / human maintainer decision before merge: fix or intentionally waive the failing 16_deferred_init.py example, then rerun Run Examples Scripts; also provide a current unambiguous Run Integration Tests result for the latest head.

KEY INSIGHT:
The blocking issue is not the release cleanup mechanics; it is that the release validation evidence does not currently prove the latest branch is shippable.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this review helpful? React with 👍 or 👎 to give feedback.

This PR review was created by an AI agent (OpenHands) on behalf of the requester.

…3793)

(cherry picked from commit 9e29079)

Co-authored-by: openhands <openhands@all-hands.dev>

enyst commented Jun 18, 2026

Member

@enyst confirmed — I cherry-picked the squashed main commit for #3793 onto rel-1.29.0, and nothing else.

Details:

  • Source commit from main: 9e290793addcf7397bce315b8c85803a8b70f804
  • New release-branch commit: 579ad28ee95f84fda5af1579a79cdd653b3b016d
  • Only changed file:
    • examples/02_remote_agent_server/16_deferred_init.py

Verification run after cherry-pick:

uv run pre-commit run --files examples/02_remote_agent_server/16_deferred_init.py

Passed.

I also reran the previously failing example with the same workflow model and eval proxy/key:

env -u TMUX_TMPDIR -u TMUX -u SESSION_API_KEY -u OH_SESSION_API_KEYS_0 \
  LLM_API_KEY="$LITELLM_API_KEY" \
  LLM_MODEL="openhands/claude-haiku-4-5-20251001" \
  LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" \
  ANTHROPIC_BASE_URL="https://llm-proxy.eval.all-hands.dev" \
  ANTHROPIC_API_KEY="$LITELLM_API_KEY" \
  OPENAI_BASE_URL="https://llm-proxy.eval.all-hands.dev" \
  OPENAI_API_KEY="$LITELLM_API_KEY" \
  uv run pytest tests/examples/test_examples.py \
    --run-examples \
    --examples-results-dir .example-test-results-local \
    -k '16_deferred_init' -q

Result:

1 passed, 65 deselected, 5 warnings in 46.08s

This update/comment was created by an AI agent (OpenHands) on behalf of the requester.

@enyst enyst left a comment

Member

Thank you all!

@enyst enyst enabled auto-merge (squash) June 18, 2026 16:08
@enyst enyst merged commit f4feb8f into main Jun 18, 2026
32 of 33 checks passed
@enyst enyst deleted the rel-1.29.0 branch June 18, 2026 16:10

Labels

behavior-test
integration-test: Runs the integration tests and comments the results
test-examples: Run all applicable "examples/" files. Expensive operation.

3 participants