fix(task): surface non-FINISHED sub-agent runs as errors, not empty success by ak684 · Pull Request #3742 · OpenHands/software-agent-sdk

ak684 · 2026-06-16T00:20:29Z

HUMAN:

bug fix: a non-FINISHED sub-agent (stuck/paused/run-limit) was reported to the parent as an empty success.

A human has tested these changes.

AGENT:

Narrowly scoped bug fix for sub-agent (Task tool) result surfacing. Running uv run pytest tests/tools/task/test_task_manager.py gives 53 passed; pre-commit (ruff + pyright) is clean.

Why

A sub-agent whose run ends in any non-FINISHED terminal status — stuck detection, paused, or a run-limit — was reported to the parent task as an empty success (Task completed with no result, is_error=False). _run_task only special-cased ERROR; every other non-FINISHED status fell through to the success branch, so the parent agent couldn't distinguish a real result from a silently-aborted sub-agent.

Summary

_run_task now treats only FINISHED as success; any other terminal status (stuck, paused, run-limit) is surfaced to the parent as an error.
The error detail preserves any partial output (_run_stop_detail) so the parent can use or retry it instead of getting nothing.
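
The rule in the summary can be sketched as an allowlist on the terminal status. This is a minimal illustration, not the SDK's code: `RunStatus`, `TaskOutcome`, and `classify_run` are hypothetical names, and the enum only mirrors the statuses named in this description. The two message strings are the ones quoted in this PR.

```python
from dataclasses import dataclass
from enum import Enum


class RunStatus(Enum):
    """Hypothetical stand-in for the sub-agent's terminal status."""

    FINISHED = "finished"
    ERROR = "error"
    STUCK = "stuck"
    PAUSED = "paused"


@dataclass
class TaskOutcome:
    """Hypothetical shape of what the parent task receives."""

    text: str
    is_error: bool


def classify_run(status: RunStatus, final_text: str) -> TaskOutcome:
    # Allowlist: only FINISHED counts as success.
    if status is RunStatus.FINISHED:
        text = final_text or "Task completed with no result."
        return TaskOutcome(text=text, is_error=False)
    # Every other status (stuck, paused, error, ...) is surfaced as an error.
    detail = f"Sub-agent stopped without finishing (status: {status.value})."
    return TaskOutcome(text=detail, is_error=True)
```

Checking for the one success status, rather than special-casing each failure status, means any status added later defaults to an error instead of falling through to the success branch.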

Issue Number

(none)

How to Test

uv run pytest tests/tools/task/test_task_manager.py

Covers: a non-FINISHED (stuck) status surfaced as an error; _run_stop_detail returns the last error event / falls back to a status message.

Type

Bug fix

Notes

This PR was narrowed to the status-surfacing fix only; the per-run budget plumbing it originally carried was dropped so the budget can be reworked separately. The change here is independent of the budget feature.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:61de89c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-61de89c-python \
  ghcr.io/openhands/agent-server:61de89c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:61de89c-golang-amd64
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-golang-amd64
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-golang-amd64
ghcr.io/openhands/agent-server:61de89c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:61de89c-golang-arm64
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-golang-arm64
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-golang-arm64
ghcr.io/openhands/agent-server:61de89c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:61de89c-java-amd64
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-java-amd64
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-java-amd64
ghcr.io/openhands/agent-server:61de89c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:61de89c-java-arm64
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-java-arm64
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-java-arm64
ghcr.io/openhands/agent-server:61de89c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:61de89c-python-amd64
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-python-amd64
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-python-amd64
ghcr.io/openhands/agent-server:61de89c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:61de89c-python-arm64
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-python-arm64
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-python-arm64
ghcr.io/openhands/agent-server:61de89c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:61de89c-golang
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-golang
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-golang
ghcr.io/openhands/agent-server:61de89c-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:61de89c-java
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-java
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-java
ghcr.io/openhands/agent-server:61de89c-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:61de89c-python
ghcr.io/openhands/agent-server:61de89cc55a99f675aaa1b3da427a9d6ce823968-python
ghcr.io/openhands/agent-server:alona-fix-subagent-budget-python
ghcr.io/openhands/agent-server:61de89c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., 61de89c-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 61de89c-python-amd64) are also available if needed

github-actions · 2026-06-16T00:21:09Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED


github-actions · 2026-06-16T00:21:10Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED


github-actions · 2026-06-16T00:27:36Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-tools/openhands/tools/task
manager.py	180	121	32%	82–84, 88–90, 100–101, 103–104, 108, 118, 122, 125, 128–133, 135, 141–142, 146, 150–153, 156–160, 182–183, 185–186, 191, 196, 203–205, 210–213, 222, 227, 234, 247–248, 250, 256, 261–262, 264, 274, 279, 285, 297–298, 300–303, 305, 323–324, 328–329, 331–332, 335, 337, 340, 343, 347–348, 350–353, 355–362, 367–371, 373–374, 376, 385, 390, 395–396, 402–403, 407–409, 411, 414, 416–417, 428–429, 433, 440–441, 449, 453, 458, 460–461
TOTAL	31270	15761	49%

all-hands-bot

✅ QA Report: PASS

The PR achieves its stated goal: I reproduced the reported base-branch failures, then exercised the same SDK/task flows on the PR branch and observed the corrected behavior.

Does this PR achieve its stated goal?

Yes. The PR fixes the public local Conversation(..., max_budget_per_run=...) plumbing, makes the remote path fail loudly, surfaces a non-FINISHED sub-agent as an error instead of an empty success, and preserves inherited caps across task resume. I also verified the configured LiteLLM proxy returns non-zero cost metrics and that the local budget-stop path produces an ERROR run-limit event when accumulated cost exceeds the ceiling.

Phase	Result
Environment Setup	✅ `uv sync --dev` completed and repo remained clean after QA runs
CI Status	🟡 34 successful, 1 skipped, 1 pending (`QA Changes by OpenHands/qa-changes`) at review time
Functional Verification	✅ Base failures reproduced; PR behavior verified through SDK APIs and a live LLM completion

Functional Verification

Test 1: Public `Conversation` budget argument and remote handling

Step 1 — Reproduce / establish baseline (without the fix):
Ran against detached base 9750af138dd67255d24526a7e438650deac83476:

cd /tmp/oh-sdk-qa-base && \
OPENHANDS_SUPPRESS_BANNER=1 \
PYTHONPATH="/tmp/oh-sdk-qa-base/openhands-sdk:/tmp/oh-sdk-qa-base/openhands-tools:/tmp/oh-sdk-qa-base/openhands-workspace:/tmp/oh-sdk-qa-base/openhands-agent-server" \
/home/runner/work/software-agent-sdk/software-agent-sdk/pr-repo/.venv/bin/python /tmp/qa_subagent_budget.py 2>&1 | grep '^QA:'

Output excerpt:

QA: local_factory_budget=ERROR type=TypeError message="Conversation.__new__() got an unexpected keyword argument 'max_budget_per_run'"
QA: remote_factory_budget=ERROR type=TypeError message="Conversation.__new__() got an unexpected keyword argument 'max_budget_per_run'"

This confirms the reported public-factory bug: a user could not pass max_budget_per_run through Conversation(...) at all.

Step 2 — Apply the PR's changes:
Checked out PR branch alona/fix-subagent-budget at 610f54a1325cd98e1c10e9f83a521b4ff3e7d52d.

Step 3 — Re-run with the fix in place:
Ran the same SDK exercise on the PR branch:

cd /home/runner/work/software-agent-sdk/software-agent-sdk/pr-repo && \
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_subagent_budget.py 2>&1 | grep '^QA:'

Output excerpt:

QA: local_factory_budget=OK type=LocalConversation budget=2.5
QA: remote_factory_budget=ERROR type=NotImplementedError message='max_budget_per_run is not yet enforced for remote conversations (server-side budget tracking is pending).'

This shows the local user entry point now accepts and forwards the budget, while remote conversations fail explicitly instead of silently ignoring unsupported budget enforcement.
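
The "fail loudly" behavior observed in this test can be sketched as below. This is only an illustration under stated assumptions: `make_conversation` is a hypothetical factory that returns a plain dict, not the SDK's `Conversation`, and only the error message is taken from the output above.

```python
from typing import Optional


def make_conversation(remote: bool, max_budget_per_run: Optional[float] = None) -> dict:
    """Hypothetical factory; returns a dict instead of a real conversation."""
    if remote and max_budget_per_run is not None:
        # Reject the option outright rather than silently ignoring it.
        raise NotImplementedError(
            "max_budget_per_run is not yet enforced for remote conversations "
            "(server-side budget tracking is pending)."
        )
    kind = "remote" if remote else "local"
    return {"kind": kind, "budget": max_budget_per_run}
```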

Test 2: Sub-agent non-`FINISHED` result and resume cap inheritance

Step 1 — Reproduce / establish baseline (without the fix):
Same base run output:

QA: task_stuck_result status=completed result='' error=None
QA: task_resume_caps first='caps iter=42 budget=7.0' second='caps iter=500 budget=None'

This confirms both reported bugs: a STUCK sub-agent was reported as an empty completed task, and resuming a task dropped inherited caps back to defaults.

Step 2 — Apply the PR's changes:
Checked out PR branch alona/fix-subagent-budget at 610f54a1325cd98e1c10e9f83a521b4ff3e7d52d.

Step 3 — Re-run with the fix in place:
Same PR run output:

QA: task_stuck_result status=error result=None error='Sub-agent stopped without finishing (status: stuck).'
QA: task_resume_caps first='caps iter=42 budget=7.0' second='caps iter=42 budget=7.0'

This verifies the sub-agent stop is now surfaced to the parent as an error, and resumed task conversations retain the parent-provided iteration and budget caps.
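
One way to picture the cap inheritance checked here is a registry that stores the caps a task started with and reuses them on resume. `Caps` and `TaskRegistry` are hypothetical names for this sketch; the defaults of 500 iterations and no budget are taken from the baseline output above, not from the SDK source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Caps:
    """Hypothetical per-task limits handed down by the parent."""

    max_iterations: int = 500
    max_budget: Optional[float] = None


class TaskRegistry:
    """Hypothetical registry that remembers the caps each task started with."""

    def __init__(self) -> None:
        self._caps: dict[str, Caps] = {}

    def start(self, task_id: str, caps: Caps) -> Caps:
        self._caps[task_id] = caps
        return caps

    def resume(self, task_id: str) -> Caps:
        # Reuse the stored caps instead of constructing fresh defaults.
        return self._caps.get(task_id, Caps())
```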

Test 3: Cost signal and local budget stop behavior

Step 1 — Reproduce / establish baseline (without the fix):
The base public Conversation(..., max_budget_per_run=...) entry point failed with TypeError in Test 1, so a real user could not configure this through the public factory before the PR.

Step 2 — Apply the PR's changes:
Used the PR branch with the configured LLM_MODEL, LLM_BASE_URL, and LLM_API_KEY environment.

Step 3 — Re-run with the fix in place:
Ran a live, tiny LLM completion through the configured LiteLLM proxy:

OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY'
import os
from openhands.sdk import LLM
from openhands.sdk.llm.message import Message, TextContent
llm = LLM(model=os.environ['LLM_MODEL'], api_key=os.environ['LLM_API_KEY'], base_url=os.environ['LLM_BASE_URL'], max_output_tokens=5, num_retries=0)
response = llm.completion([Message(role='user', content=[TextContent(text='Reply with exactly: OK')])], _return_metrics=True)
print(response.model_dump(exclude_none=True, exclude={'raw_response'}))
PY

Output excerpt:

'metrics': {'model_name': 'litellm_proxy/openai/gpt-5.5', 'accumulated_cost': 0.00020500000000000002, ...}

Then exercised a deterministic local budget-stop run:

OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY'
# Custom agent adds $0.01 to conversation metrics and stays RUNNING; budget is $0.001.
# Full command run in QA environment.
PY

Output:

QA: deterministic_budget_status error
QA: deterministic_budget_errors ['Agent reached maximum budget limit ($0.0010); accumulated cost $0.0100.']

This shows the environment provides a non-zero LiteLLM cost signal for priced models, and the local budget ceiling can stop a still-running conversation with the expected ERROR detail.
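
The budget-stop check described above amounts to comparing accumulated cost against a ceiling. The sketch below assumes a hypothetical helper named `budget_error`; the message format is copied from the QA output, but how the SDK actually builds it is not shown in this PR.

```python
from typing import Optional


def budget_error(accumulated_cost: float, max_budget: Optional[float]) -> Optional[str]:
    """Return an error message once cost exceeds the ceiling, else None."""
    if max_budget is None or accumulated_cost <= max_budget:
        return None
    return (
        f"Agent reached maximum budget limit (${max_budget:.4f}); "
        f"accumulated cost ${accumulated_cost:.4f}."
    )
```

With the values from the deterministic run (cost $0.01, budget $0.001) this yields the same text as the error reported above.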

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

…uccess A sub-agent whose run ends in any non-FINISHED terminal status (stuck detection, paused, a run-limit) was reported to the parent task as an empty "completed" (result="", is_error=False). _run_task now treats only FINISHED as success; every other status is surfaced as an error, and the detail preserves any partial output so the parent can use or retry it.

all-hands-bot

✅ QA Report: PASS

Verified the Task tool now surfaces stopped/non-FINISHED sub-agent runs as errors, while preserving normal FINISHED success behavior and partial output on stopped runs.

Does this PR achieve its stated goal?

Yes. The stated goal was to stop reporting non-FINISHED sub-agent runs as empty successes to the parent. Exercising the public TaskExecutor path for a STUCK sub-agent, the base branch returned status completed, is_error=False, and the text "Task completed with no result."; the PR returns status error, is_error=True, and the text "Sub-agent stopped without finishing (status: stuck)." The PR also preserves partial output for ERROR/run-limit-style stops and leaves FINISHED runs returning successful results.

Phase	Result
Environment Setup	✅ `uv sync --dev` completed successfully before verification
CI Status	✅ Latest completed checks observed green; `qa-changes` was still in progress during this review
Functional Verification	✅ Public TaskExecutor behavior and stop-detail behavior verified with deterministic SDK-style execution

Functional Verification

Test 1: Public TaskExecutor no longer reports STUCK sub-agent as empty success

Step 1 — Reproduce / establish baseline (without the fix):
Ran a deterministic SDK-style script through TaskExecutor(TaskAction(...), conversation=parent) with a demo sub-conversation ending in ConversationExecutionStatus.STUCK on origin/main:

HEAD is now at 0d77efc4 fix(subagent): freshen explicit condensers per spawn (#3743)
Task 'task_executor_demo' completed.
observation_status= completed
observation_is_error= False
observation_text= Task completed with no result.

This confirms the bug exists on the base branch: a stopped sub-agent is surfaced to the parent as a successful empty result.

Step 2 — Apply the PR's changes:
Checked out commit 61de89cc55a99f675aaa1b3da427a9d6ce823968.

Step 3 — Re-run with the fix in place:
Ran the same TaskExecutor scenario:

HEAD is now at 61de89cc fix(task): surface non-FINISHED sub-agent runs as errors, not empty success
Task 'task_executor_demo' stopped: status 'stuck'.
observation_status= error
observation_is_error= True
observation_text= Sub-agent stopped without finishing (status: stuck).

This shows the parent-facing Task observation now marks the stopped sub-agent as an error with an explanatory status, which is the PR's primary goal.

Test 2: Partial output is preserved for stopped ERROR/run-limit-style tasks

Step 1 — Reproduce / establish baseline (without the fix):
Ran a TaskManager scenario on origin/main where the sub-conversation ended in ERROR, had a ConversationErrorEvent, and had prior agent output:

Task 'task_partial' ended with an error.
task_status= error
is_error= True
error= Agent reached maximum iterations.

This shows the old behavior surfaced the error reason but discarded the available partial agent output.

Step 2 — Apply the PR's changes:
Checked out commit 61de89cc55a99f675aaa1b3da427a9d6ce823968.

Step 3 — Re-run with the fix in place:
Ran the same stopped task scenario:

Task 'task_partial' stopped: status 'error'.
task_status= error
is_error= True
error= Agent reached maximum iterations.
Partial result:
partial answer before stop

This confirms the PR keeps the error semantics while adding the partial result detail promised in the PR description.
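
The stop-detail behavior verified in this test (last error event if there is one, otherwise a status message, plus any partial output) might look roughly like the sketch below. `stop_detail` and its parameters are hypothetical; this is not the SDK's `_run_stop_detail`, and the newline layout is inferred from the output shown above.

```python
from typing import Optional, Sequence


def stop_detail(
    status: str,
    error_messages: Sequence[str],
    partial_output: Optional[str],
) -> str:
    """Hypothetical helper: build the error text for a non-FINISHED run."""
    # Prefer the last recorded error; otherwise fall back to a status message.
    if error_messages:
        detail = error_messages[-1]
    else:
        detail = f"Sub-agent stopped without finishing (status: {status})."
    # Append whatever the sub-agent produced before it stopped.
    if partial_output:
        detail += f"\nPartial result:\n{partial_output}"
    return detail
```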

Test 3: FINISHED sub-agent success path still works

Step 1 — Establish expected behavior:
A normal finished sub-agent should remain a successful completed task.

Step 2 — Apply the PR's changes:
Checked out commit 61de89cc55a99f675aaa1b3da427a9d6ce823968.

Step 3 — Run with the fix in place:
Ran a deterministic TaskManager scenario with ConversationExecutionStatus.FINISHED and final assistant output:

Task 'task_finished' completed.
task_status= completed
is_error= False
result= finished answer
error= None

This shows the success path still returns the final result and is not incorrectly marked as an error.

Issues Found

None.

This review was generated by an AI agent (OpenHands) on behalf of the user.

ak684 marked this pull request as ready for review June 16, 2026 00:34

all-hands-bot reviewed Jun 16, 2026


ak684 force-pushed the alona/fix-subagent-budget branch from 610f54a to 61de89c June 16, 2026 19:43

ak684 changed the title ~~fix(task): correct sub-agent budget plumbing and non-FINISHED handling~~ fix(task): surface non-FINISHED sub-agent runs as errors, not empty success Jun 16, 2026

ak684 marked this pull request as draft June 16, 2026 19:45

ak684 marked this pull request as ready for review June 16, 2026 19:52

all-hands-bot reviewed Jun 16, 2026


VascoSch92 approved these changes Jun 18, 2026


ak684 merged commit e75b9c2 into main Jun 18, 2026
44 of 45 checks passed

ak684 deleted the alona/fix-subagent-budget branch June 18, 2026 16:34
