feat(agent-server): add /goal agent-server endpoint, background loop, and stop/resume#3770
Conversation
Add openhands.sdk.conversation.goal: a conversation-level "/goal" driver that pursues an objective by running the agent, judging completion with a second LLM, and re-prompting until the goal is done or a cap is reached. - judge_goal + GoalVerdict: the reusable objective+transcript -> verdict kernel (renders the transcript, excluding the system prompt, and asks a judge LLM for a strict-JSON verdict). - GoalController: transport-agnostic continue-vs-stop decision logic and the iteration cap. - run_goal: a thin synchronous driver over the controller that composes with any existing critic (the critic governs each inner run(); this loop governs the overall objective). Self-contained, with no agent-server dependency. Includes a runnable demo under .pr/ proving the goal work lands in the same conversation history. Relates to #3569.
Wire the SDK /goal loop into the agent server so a UI can run, watch,
stop, and resume goals on the live conversation (no fork).
- EventService.start_goal/_run_goal: a background driver that drives the
shared conversation and publishes ConversationStateUpdateEvent(key="goal")
lifecycle updates (running/complete/capped/interrupted) for a UI chip.
- stop_goal/resume_goal/_last_goal_status: cancel a running goal (a normal
user message also interrupts it) and resume from the persisted iteration,
including across a server restart.
- POST /conversations/{id}/goal, /goal/stop, /goal/resume.
Stacked on the SDK /goal core. Relates to #3569.
Coverage Report •
|
|||||||||||||||||||||||||
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review: /goal agent-server endpoints
Taste Rating
🟡 Acceptable — Solid implementation with minor test gap
Analysis
The implementation adds a background /goal driver loop that drives a shared conversation (not a fork), with proper lifecycle events for UI integration. The architecture is sound: GoalController from the SDK owns decisions, EventService owns I/O and event persistence.
[IMPROVEMENT OPPORTUNITIES]
-
[test_goal_loop.py,
test_user_message_stops_running_goal] Test Coverage Gap: The test creates a dummyasyncio.sleep(30)task instead of a real goal loop, then assertssend_messagecancels it. This tests the cancellation path, but not the actual integration: the goal task never runssend_messagewith_from_goal=True. A real goal task would pass_from_goal=Trueon subsequent iterations, so this test should be updated to either (a) use a real goal that waits for a gate, or (b) add a comment explaining this tests the "early stop" path only. -
[event_service.py,
start_goal] Minor: The checkif self._goal_task is not None and not self._goal_task.done()has a small race window. Concurrent calls could both pass the check. For this use case the window is negligible, but if strict serialization is needed, consider an asyncio Lock.
[TESTING GAPS]
- [test_goal_loop.py] The test suite covers happy paths well (
test_goal_outcomes,test_resume_from_interrupted_status) and error paths (test_start_goal_rejects_empty_objective,test_resume_without_resumable_goal_raises). Thetest_goal_halts_on_run_error_as_interruptedcase is a good addition for the "out of credits" scenario. Overall, the testing is thorough with scripted LLMs.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This PR adds new endpoints behind the existing routing infrastructure, modifiessend_messagewith a backward-compatible parameter, and introduces well-isolated async task management. No breaking changes to existing APIs. The testing uses real EventService with scripted LLMs, not mocks of the unit under test.
VERDICT
✅ Worth merging — The core implementation is solid, with appropriate separation of concerns and comprehensive end-to-end testing. The test coverage gap in test_user_message_stops_running_goal is a minor concern but doesn't block merge.
KEY INSIGHT
The design correctly isolates decisions (SDK's GoalController) from transport (EventService), making the goal loop reusable across different runtime contexts.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The PR achieves its stated goal: I exercised the live agent-server over HTTP and verified /goal, /goal/stop, and /goal/resume drive the shared conversation, emit/persist goal state updates, and resume after interruption/restart.
Does this PR achieve its stated goal?
Yes. On the base branch, /api/conversations/{id}/goal was not present and returned 404; on the PR branch, the same real server flow returned 200 and produced persisted ConversationStateUpdateEvent(key="goal") lifecycle updates. Stop/resume also worked: an in-flight goal emitted interrupted, /goal/resume completed it, and the same resume flow worked after restarting the server.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv sync --dev completed; live FastAPI/uvicorn server started successfully |
| CI Status | 🟡 13 successful, 0 failing, 1 pending (QA Changes by OpenHands) at time of review |
| Functional Verification | ✅ Live HTTP scenarios passed for baseline absence, start/complete, stop/resume, and resume after restart |
Functional Verification
Test 1: Baseline branch has no /goal endpoint
Step 1 — Reproduce / establish baseline (without the fix):
Checked out vasco/goal-sdk and ran uv run python /tmp/qa_goal_live.py baseline against a live uvicorn agent-server, creating a real conversation and POSTing to /api/conversations/{id}/goal:
{
"goal_paths_present": [],
"goal_post_body": "{"detail":"Not Found"}",
"goal_post_status": 404,
"mode": "baseline"
}This confirms the base branch did not expose the claimed goal-control API.
Step 2 — Apply the PR's changes:
Checked out vasco/goal-agent-server at e384cd5c5a1039e2c429d412b6020f0244f58d6d.
Test 2: Start /goal and drive it to completion
Step 3 — Re-run with the fix in place:
Ran uv run python /tmp/qa_goal_live.py complete against the PR branch live server. The script created a conversation through POST /api/conversations, injected a scripted TestLLM into that live conversation, then called POST /api/conversations/{id}/goal with a real objective:
{
"start_status": 200,
"start_body": {"success": true},
"goal_statuses": [
{"active": true, "iteration": 0, "status": "running"},
{"active": false, "iteration": 1, "status": "complete", "verdict": {"complete": true, "score": 1.0}}
]
}This shows the new endpoint accepts the request, starts the background loop, and persists the UI-consumable goal state updates through completion.
Test 3: Stop and resume a running goal
Ran uv run python /tmp/qa_goal_live.py stop-resume. The goal was started with a gated LLM call to simulate a user stopping while work was in flight, then /goal/stop and /goal/resume were called over HTTP:
{
"start_status": 200,
"llm_entered": true,
"stop_status": 200,
"stop_was_pending_until_llm_released": true,
"status_after_stop": {"active": false, "iteration": 0, "status": "interrupted"},
"resume_status": 200,
"final_goal_status": {"active": false, "iteration": 1, "status": "complete", "verdict": {"complete": true, "score": 1.0}}
}This confirms the stop endpoint records a resumable interrupted state and resume continues the shared conversation to completion. The observed pending stop call matches the PR note that stop is graceful and waits for the in-flight LLM call to finish.
Test 4: Resume after server restart
Ran a live two-server scenario: start goal → stop/interrupted → shut down uvicorn → start a new uvicorn server → call /goal/resume for the same persisted conversation:
{
"start_status": 200,
"stop_status": 200,
"status_before_restart": {"active": false, "iteration": 0, "objective": "survive restart", "status": "interrupted"},
"resume_after_restart_status": 200,
"final_goal_status_after_restart": {"active": false, "iteration": 1, "objective": "survive restart", "status": "complete", "verdict": {"complete": true, "score": 1.0}}
}This verifies the persisted status events are sufficient for the new server instance to resume the goal.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
|
@OpenHands resolve the conflicts here |
|
@VascoSch92 it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again. |
|
I'm on it! VascoSch92 can track my progress at all-hands.dev |
# Conflicts: # openhands-sdk/openhands/sdk/conversation/goal/controller.py # tests/sdk/conversation/goal/test_controller.py
|
✅ PR Artifacts Cleaned Up The |
|
Resolved merge conflicts with Conflicts were in two files ( Verified locally: This comment was posted by an AI agent (OpenHands) on behalf of @VascoSch92. |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
|
OpenHands encountered an error: **Failed to send message to agent server: HTTP 503 error: no available server See the conversation for more information. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable — the architecture is clean and the test suite is comprehensive; one meaningful correctness gap in the exception path needs fixing before merge.
Summary: This PR wires the SDK /goal loop into the agent server with a tidy three-endpoint design (/goal, /goal/stop, /goal/resume). The persistence-via-ConversationStateUpdateEvent approach for cross-restart resumability is elegant. Tests cover the happy path, stop/resume lifecycle, user-message interruption, and the run-error path with real EventService + scripted LLMs — solid test hygiene.
[CRITICAL ISSUES]
The except Exception handler in _run_goal swallows unexpected failures (e.g. a judge LLM network error, a GoalController internal bug) without recording a terminal status event. The last persisted event will still have active=True, status="running", so the UI will display the goal as running when it has actually died. The CancelledError path two lines above already handles this correctly — the exception path should mirror it (see inline comment).
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
New background-task logic on a live conversation with asyncio cancel/resume semantics. The logic is sound and well-tested, but the exception-path status gap means a judge LLM failure leaves persistent incorrect UI state (active=Truewhen the loop is dead). Low probability in practice; trivially easy to fix.
VERDICT:
❌ Needs rework: Fix the exception-path status emission before merging.
KEY INSIGHT:
The except Exception handler must emit an interrupted status just like the CancelledError path does, or any unexpected failure leaves the goal permanently "running" in the UI.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing. See the customization docs for the required frontmatter format.- Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.Was this review helpful? React with 👍 or 👎 to give feedback.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the new /goal agent-server API and goal-loop behavior by running the server and exercising real EventService flows; no functional issues found.
Does this PR achieve its stated goal?
Yes. The PR sets out to make /goal usable from agent-server/UI by adding HTTP endpoints, background goal execution, goal lifecycle events, and stop/resume behavior. I verified main lacks the endpoint, the PR exposes /goal, /goal/stop, and /goal/resume, real HTTP requests succeed, goal status events are persisted, and direct EventService execution completes, stops, resumes, and resumes after service recreation from persisted state.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv sync --dev completed and the agent server started locally. |
| CI Status | ✅ Current checks are passing; qa-changes is still in progress for this review, with artifact cleanup checks skipped/success as expected. |
| Functional Verification | ✅ HTTP endpoint behavior and goal-loop lifecycle behavior worked as described. |
Functional Verification
Test 1: Baseline vs PR endpoint exposure
Step 1 — Establish baseline on main:
Started python -m openhands.agent_server on 127.0.0.1:8010 and queried the new route/OpenAPI.
POST /api/conversations/{id}/goal -> HTTP 404
OpenAPI goal paths on main: []
This shows the /goal API is not present before the PR.
Step 2 — Apply the PRs changes:
Checked out vasco/goal-agent-server at 610bbcd1abcd4f593f1afc5383429a7370e5c066 and started the agent server.
Step 3 — Re-run with the PR:
Queried server metadata and OpenAPI.
GET / -> version 1.28.0
OpenAPI goal paths:
- /api/conversations/{conversation_id}/goal
- /api/conversations/{conversation_id}/goal/stop
- /api/conversations/{conversation_id}/goal/resume
This confirms the PR exposes the intended backend entry points.
Test 2: HTTP start/stop/resume behavior
Step 1 — Baseline:
On main, the same POST /goal request returned 404, so there was no route to start a goal.
Step 2 — Apply the PRs changes:
Created a real conversation through the PR server with a local workspace and an agent payload.
POST /api/conversations -> HTTP 201
conversation id: c4d50c95-8cad-4977-9347-4da0c987770e
execution_status: idle
Step 3 — Exercise the new API:
Started, stopped, and resumed a goal via HTTP.
POST /api/conversations/c4d50c95-8cad-4977-9347-4da0c987770e/goal -> HTTP 200
{"success":true}
POST /api/conversations/c4d50c95-8cad-4977-9347-4da0c987770e/goal/stop -> HTTP 200
{"success":true}
POST /api/conversations/c4d50c95-8cad-4977-9347-4da0c987770e/goal/resume -> HTTP 200
{"success":true}
Because the HTTP conversation used a placeholder LLM instead of external credentials, the run halted as interrupted, which is the resumable failure behavior described by the PR. The persisted event stream showed the goal lifecycle updates:
[(ConversationStateUpdateEvent, goal, {active: True, status: running, iteration: 0, max_iterations: 1, ...}),
(ConversationStateUpdateEvent, goal, {active: False, status: interrupted, iteration: 0, max_iterations: 1, ...}),
(ConversationStateUpdateEvent, goal, {active: True, status: running, iteration: 0, max_iterations: 1, ...}),
(ConversationStateUpdateEvent, goal, {active: False, status: interrupted, iteration: 0, max_iterations: 1, ...})]
I also verified the user-facing error case:
POST /api/conversations/{new_conversation_id}/goal/resume -> HTTP 400
{"detail":"no_resumable_goal"}
Test 3: Goal loop completion, verdict events, stop, and resume
Step 1 — Baseline:
On main, there is no agent-server goal-loop API to drive; only the PR branch includes the EventService goal driver.
Step 2 — Apply the PRs changes:
Ran a small user script against a real EventService + LocalConversation, using deterministic TestLLM responses for the agent and judge so no external credentials were required.
Step 3 — Exercise the behavior:
The goal completed after two judge rounds and emitted per-round status/verdict data:
COMPLETE_SCENARIO
event_count 1 -> 16
goal_statuses [running, running, complete]
running_verdict_missing [tests not passing]
final {active: False, status: complete, iteration: 2, max_iterations: 5, objective: Make the QA proof goal complete, verdict: {score: 1.0, complete: True, missing: }}
A running goal could be stopped and then resumed to completion:
STOP_SCENARIO
judge_entered True
stop_returned True
after_stop {active: False, status: interrupted, iteration: 1, max_iterations: 4, objective: Resume this QA goal after interruption, verdict: None}
after_resume {active: False, status: complete, iteration: 2, max_iterations: 4, objective: Resume this QA goal after interruption, verdict: {score: 1.0, complete: True, missing: }}
This confirms the background loop, per-round UI chip data, graceful stop, and same-session resume behavior.
Test 4: Resume after EventService recreation
Step 1 — Baseline:
Without the PR goal-status persistence/resume logic, there is no resumable agent-server goal state to reconstruct.
Step 2 — Apply the PRs changes:
Started a goal, stopped it mid-audit, closed the first EventService, recreated another EventService with the same persisted conversation directory and conversation id, then called resume_goal(...).
Step 3 — Re-run with recreated service:
RESTART_SCENARIO
same_conversation_id 2eccccdb-c50a-48a6-ba14-47d9a1c3bbe6
judge_entered_before_stop True
stop_returned True
before_close {active: False, status: interrupted, iteration: 1, max_iterations: 3, objective: Persist and resume this goal, verdict: None}
after_recreated_service_resume {active: False, status: complete, iteration: 2, max_iterations: 3, objective: Persist and resume this goal, verdict: {score: 1.0, complete: True, missing: }}
This verifies the PRs claim that persisted goal status can be used to resume after a server/service restart.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
… unexpected error
|
@OpenHands Describe to me what does it mean to start a goal, to stop a goal, and to resume. I mean, the REST API endpoints, what do they do, in essence. Keep it concise, the high level execution path |
|
@enyst it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again. |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
@enyst High-level execution path:
A normal user message also calls This comment was created by an AI agent (OpenHands) on behalf of the requester. |
|
OpenHands encountered an error: **Failed to send message to agent server: HTTP 503 error: no available server See the conversation for more information. |
|
@enyst it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again. |
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands in this Pr rename every method: *_goal_conversation -> *_goal_in_conversation. Moreover, update the docstring and the comments with the description given here: #3770 (comment). Make coincise and clear docstring. |
|
@VascoSch92 it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again. |
|
I'm on it! VascoSch92 can track my progress at all-hands.dev |
|
@enyst I blocked my agent. I think we are good to go if for you is good |
enyst
left a comment
There was a problem hiding this comment.
Thank you, and sorry for the confusion! My slow brain is… yeah, slow
The `_return_metrics` parameter was removed from LLM/TestLLM.completion in 2bedbd8 (removed_in 1.29.0), but tests/agent_server/test_goal_loop.py's _GatedLLM override (added later in #3770) still declared it as a positional param and forwarded it to super().completion(), which pyright flagged as an incompatible override and a 5-positional call against a 4-positional signature. Drop _return_metrics to match the current signature.
The `_return_metrics` parameter was removed from LLM/TestLLM.completion in 2bedbd8 (removed_in 1.29.0, on this release branch), but tests/agent_server/test_goal_loop.py's _GatedLLM override (added in #3770) still declared it as a positional param and forwarded it to super().completion(), which pyright flagged as an incompatible override and a 5-positional call against the new 4-positional signature. Drop _return_metrics to match.
HUMAN:
Added the agent-server part to create /goal command in the agent-canvas UI.
AGENT:
Why
Builds on the SDK
/goalcore (#3769) to make/goalusable from the agentserver / UI: start a goal on the live conversation, stream its work inline, and
control it (stop / resume / interject) — per #3569.
Summary
EventService.start_goal/_run_goal: a background driver that drives theshared conversation (no fork) and publishes
ConversationStateUpdateEvent(key="goal")lifecycle updates(running / complete / capped / interrupted) for a UI chip.
stop_goal/resume_goal/_last_goal_status: cancel a running goal (anormal user message also interrupts it via the
_from_goalsend_messagehook) and resume from the persisted iteration — works within a session and
across a server restart (the status events are persisted).
POST /conversations/{id}/goal,/goal/stop,/goal/resume.Issue Number
#3569
How to Test
test_goal_loop.pyis end-to-end against a realEventService+LocalConversation(scriptedTestLLMagent + judge, no mocks of the unit undertest): it starts a goal and drives it to
complete/capped, stops a runninggoal (asserting the
interruptedstatus is persisted to the shared eventlog), resumes from an interrupted state, verifies a user message interrupts a
running goal, and checks the out-of-credits path (agent run error →
interrupted,resumable).
test_conversation_router.pyexercises the HTTP endpoints(200 / 404 / 409 / 400).
Video/Screenshots
N/A — backend. UI wiring consumes the new event: filter the stream for
kind == "ConversationStateUpdateEvent" && key == "goal"and rendervalue.{status, iteration, max_iterations}as a chip (see Notes).Type
Notes
key="goal"state-update events,and call the stop/resume endpoints from chip controls.
stopis a graceful cancel — an in-flight LLM call finishes before theconversation goes quiescent; hard-abort via the server's
interrupt()is apossible follow-up.
GET /conversations/{id}/goalso the UI can fetch thecurrent goal status on load without scanning events.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:6e9e355-pythonRun
All tags pushed for this build
About Multi-Architecture Support
6e9e355-python) is a multi-arch manifest supporting both amd64 and arm646e9e355-python-amd64) are also available if needed