
feat(agent-server): add /goal agent-server endpoint, background loop, and stop/resume #3770

Merged
VascoSch92 merged 16 commits into main from vasco/goal-agent-server
Jun 18, 2026

Conversation

@VascoSch92

@VascoSch92 VascoSch92 commented Jun 17, 2026

Member

HUMAN:

Added the agent-server part needed to create the /goal command in the agent-canvas UI.

  • A human has tested these changes.

AGENT:

Stacked on #3769 (SDK /goal core). Base branch is vasco/goal-sdk; review/merge
that first. Kept as a draft until #3769 lands.

Why

Builds on the SDK /goal core (#3769) to make /goal usable from the agent
server / UI: start a goal on the live conversation, stream its work inline, and
control it (stop / resume / interject) — per #3569.

Summary

  • EventService.start_goal / _run_goal: a background driver that drives the
    shared conversation (no fork) and publishes
    ConversationStateUpdateEvent(key="goal") lifecycle updates
    (running / complete / capped / interrupted) for a UI chip.
  • stop_goal / resume_goal / _last_goal_status: cancel a running goal (a
    normal user message also interrupts it via the _from_goal send_message
    hook) and resume from the persisted iteration — works within a session and
    across a server restart (the status events are persisted).
  • Endpoints: POST /conversations/{id}/goal, /goal/stop, /goal/resume.
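A minimal standard-library client sketch for the three endpoints. It assumes the `/api` prefix that appears in the QA runs on this PR and, for the start call, a JSON body with `objective` and `max_iterations` fields. Those field names are a guess based on the `GoalController(objective, max_iterations)` parameters, not the confirmed request schema, and all helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: a local server started with the `docker run -p 8000:8000 ...` example.
BASE_URL = "http://localhost:8000"


def goal_request(conversation_id: str, action: str = "", body: dict | None = None) -> urllib.request.Request:
    """Build a POST to /goal, /goal/stop or /goal/resume for one conversation."""
    suffix = f"/{action}" if action else ""
    url = f"{BASE_URL}/api/conversations/{conversation_id}/goal{suffix}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def start_goal(conversation_id: str, objective: str, max_iterations: int = 5) -> urllib.request.Request:
    # Hypothetical body schema and default: the field names are assumptions.
    return goal_request(conversation_id, body={"objective": objective, "max_iterations": max_iterations})


def stop_goal(conversation_id: str) -> urllib.request.Request:
    return goal_request(conversation_id, "stop")


def resume_goal(conversation_id: str) -> urllib.request.Request:
    return goal_request(conversation_id, "resume")


def send(request: urllib.request.Request) -> dict:
    """Perform the request; 4xx/5xx responses raise urllib.error.HTTPError."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Judging by the QA transcripts, `send(start_goal(conv_id, "Make the tests pass"))` against a running server should return `{"success": true}`; the 404 / 409 / 400 cases exercised by the router tests would surface as `urllib.error.HTTPError`.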

Issue Number

#3569

How to Test

uv run pytest tests/agent_server/test_goal_loop.py tests/agent_server/test_conversation_router.py -q

test_goal_loop.py is end-to-end against a real EventService +
LocalConversation (scripted TestLLM agent + judge, no mocks of the unit under
test): it starts a goal and drives it to complete/capped, stops a running
goal (asserting the interrupted status is persisted to the shared event
log), resumes from an interrupted state, verifies a user message interrupts a
running goal, and checks the out-of-credits path (agent run error → interrupted,
resumable). test_conversation_router.py exercises the HTTP endpoints
(200 / 404 / 409 / 400).

Video/Screenshots

N/A — backend. UI wiring consumes the new event: filter the stream for
kind == "ConversationStateUpdateEvent" && key == "goal" and render
value.{status, iteration, max_iterations} as a chip (see Notes).
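A minimal sketch of that filter-and-render step, written in Python for consistency with the rest of the PR even though the UI side will likely be TypeScript. The event dictionaries are an assumed serialization of `ConversationStateUpdateEvent`, and `goal_chip` / `chip_action` are hypothetical helpers; the action mapping follows the Notes (stop/resume from chip controls) and the rule that complete/capped goals are not resumable.

```python
from __future__ import annotations


def goal_chip(event: dict) -> str | None:
    """Chip text for a key == "goal" state update; None for every other event."""
    if event.get("kind") != "ConversationStateUpdateEvent" or event.get("key") != "goal":
        return None
    value = event.get("value") or {}
    status = value.get("status", "unknown")
    iteration = value.get("iteration", 0)
    max_iterations = value.get("max_iterations")
    progress = f"{iteration}/{max_iterations}" if max_iterations is not None else str(iteration)
    return f"goal: {status} ({progress})"


def chip_action(status: str) -> str | None:
    """Which control the chip offers: stop a running goal, resume an interrupted one."""
    if status == "running":
        return "stop"  # POST .../goal/stop
    if status == "interrupted":
        return "resume"  # POST .../goal/resume
    return None  # complete / capped are terminal


# Illustrative stream; the exact serialized event shape is an assumption.
stream = [
    {"kind": "MessageEvent", "source": "user"},
    {"kind": "ConversationStateUpdateEvent", "key": "goal",
     "value": {"status": "running", "iteration": 0, "max_iterations": 5}},
    {"kind": "ConversationStateUpdateEvent", "key": "execution_status", "value": "running"},
    {"kind": "ConversationStateUpdateEvent", "key": "goal",
     "value": {"status": "complete", "iteration": 2, "max_iterations": 5}},
]
chips = [chip for chip in map(goal_chip, stream) if chip is not None]
# chips == ["goal: running (0/5)", "goal: complete (2/5)"]
```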

Type

  • Feature

Notes

  • UI integration: render the goal chip from the key="goal" state-update events,
    and call the stop/resume endpoints from chip controls.
  • stop is a graceful cancel — an in-flight LLM call finishes before the
    conversation goes quiescent; hard-abort via the server's interrupt() is a
    possible follow-up.
  • Optional follow-up: a GET /conversations/{id}/goal so the UI can fetch the
    current goal status on load without scanning events.

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:6e9e355-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-6e9e355-python \
  ghcr.io/openhands/agent-server:6e9e355-python

All tags pushed for this build

ghcr.io/openhands/agent-server:6e9e355-golang-amd64
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-golang-amd64
ghcr.io/openhands/agent-server:vasco-goal-agent-server-golang-amd64
ghcr.io/openhands/agent-server:6e9e355-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:6e9e355-golang-arm64
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-golang-arm64
ghcr.io/openhands/agent-server:vasco-goal-agent-server-golang-arm64
ghcr.io/openhands/agent-server:6e9e355-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:6e9e355-java-amd64
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-java-amd64
ghcr.io/openhands/agent-server:vasco-goal-agent-server-java-amd64
ghcr.io/openhands/agent-server:6e9e355-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:6e9e355-java-arm64
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-java-arm64
ghcr.io/openhands/agent-server:vasco-goal-agent-server-java-arm64
ghcr.io/openhands/agent-server:6e9e355-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:6e9e355-python-amd64
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-python-amd64
ghcr.io/openhands/agent-server:vasco-goal-agent-server-python-amd64
ghcr.io/openhands/agent-server:6e9e355-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:6e9e355-python-arm64
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-python-arm64
ghcr.io/openhands/agent-server:vasco-goal-agent-server-python-arm64
ghcr.io/openhands/agent-server:6e9e355-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:6e9e355-golang
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-golang
ghcr.io/openhands/agent-server:vasco-goal-agent-server-golang
ghcr.io/openhands/agent-server:6e9e355-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:6e9e355-java
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-java
ghcr.io/openhands/agent-server:vasco-goal-agent-server-java
ghcr.io/openhands/agent-server:6e9e355-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:6e9e355-python
ghcr.io/openhands/agent-server:6e9e355f97e1e945837c3355c3e1cca357063eac-python
ghcr.io/openhands/agent-server:vasco-goal-agent-server-python
ghcr.io/openhands/agent-server:6e9e355-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 6e9e355-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 6e9e355-python-amd64) are also available if needed
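The tag list above follows a regular pattern. The sketch below regenerates the four tags per variant; the slug rules (`/` becomes `_s_` and `:` becomes `_tag_` in the base image, `/` becomes `-` in the branch name) are inferred from the listed tags, not taken from the CI workflow, and `image_tags` is a hypothetical helper.

```python
from __future__ import annotations

REPO = "ghcr.io/openhands/agent-server"


def image_tags(short_sha: str, full_sha: str, branch: str, variant: str,
               base_image: str, arch: str | None = None) -> list[str]:
    """Tags for one variant; arch=None gives the multi-arch manifest tags."""
    branch_slug = branch.replace("/", "-")
    base_slug = base_image.replace("/", "_s_").replace(":", "_tag_")
    suffix = f"-{arch}" if arch else ""
    return [
        f"{REPO}:{short_sha}-{variant}{suffix}",
        f"{REPO}:{full_sha}-{variant}{suffix}",
        f"{REPO}:{branch_slug}-{variant}{suffix}",
        f"{REPO}:{short_sha}-{base_slug}{suffix}",
    ]


FULL_SHA = "6e9e355f97e1e945837c3355c3e1cca357063eac"
python_arm64 = image_tags(FULL_SHA[:7], FULL_SHA, "vasco/goal-agent-server", "python",
                          "nikolaik/python-nodejs:python3.13-nodejs22-slim", "arm64")
```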

Add openhands.sdk.conversation.goal: a conversation-level "/goal" driver
that pursues an objective by running the agent, judging completion with a
second LLM, and re-prompting until the goal is done or a cap is reached.

- judge_goal + GoalVerdict: the reusable objective+transcript -> verdict
  kernel (renders the transcript, excluding the system prompt, and asks a
  judge LLM for a strict-JSON verdict).
- GoalController: transport-agnostic continue-vs-stop decision logic and
  the iteration cap.
- run_goal: a thin synchronous driver over the controller that composes
  with any existing critic (the critic governs each inner run(); this loop
  governs the overall objective).
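A minimal, self-contained sketch of the judge-verdict parsing and the continue-vs-stop decision described above. The `complete`/`score`/`missing` fields mirror the verdicts in the QA output on this PR, and carrying `verdict` on the continue step mirrors the `GoalContinue.verdict` field mentioned in the conflict-resolution comment. Everything else (`parse_verdict`, `decide`, `run_goal_sketch`, the judge's call signature, the exact cap rule) is a hypothetical stand-in, not the SDK's `judge_goal` / `GoalController` / `run_goal` API; in particular the real judge also sees the rendered transcript.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoalVerdict:
    complete: bool
    score: float
    missing: str = ""


@dataclass(frozen=True)
class GoalContinue:
    prompt: str
    verdict: GoalVerdict  # kept so a driver can publish per-round judge feedback


@dataclass(frozen=True)
class GoalDone:
    status: str  # "complete" or "capped"
    verdict: GoalVerdict


def parse_verdict(raw: str) -> GoalVerdict:
    """Parse a strict-JSON judge reply; anything malformed counts as not complete."""
    try:
        data = json.loads(raw)
        complete = data["complete"]
        if not isinstance(complete, bool):
            raise TypeError("complete must be a JSON boolean")
        return GoalVerdict(complete, float(data.get("score", 0.0)), str(data.get("missing", "")))
    except (ValueError, KeyError, TypeError):
        return GoalVerdict(False, 0.0, "unparseable judge reply")


def decide(objective: str, verdict: GoalVerdict, iteration: int, max_iterations: int) -> GoalContinue | GoalDone:
    """`iteration` counts judged rounds so far, including the current one."""
    if verdict.complete:
        return GoalDone("complete", verdict)
    if iteration >= max_iterations:
        return GoalDone("capped", verdict)
    return GoalContinue(f"Keep working on the goal: {objective}\nStill missing: {verdict.missing}", verdict)


def run_goal_sketch(objective: str, run_agent: Callable[[str], None],
                    judge: Callable[[str], str], max_iterations: int = 5) -> GoalDone:
    """Synchronous loop: run the agent, ask the judge, re-prompt until done or capped."""
    prompt, iteration = objective, 0
    while True:
        run_agent(prompt)
        iteration += 1
        step = decide(objective, parse_verdict(judge(objective)), iteration, max_iterations)
        if isinstance(step, GoalDone):
            return step
        prompt = step.prompt
```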

Self-contained, with no agent-server dependency. Includes a runnable demo
under .pr/ proving the goal work lands in the same conversation history.

Relates to #3569.

Wire the SDK /goal loop into the agent server so a UI can run, watch,
stop, and resume goals on the live conversation (no fork).

- EventService.start_goal/_run_goal: a background driver that drives the
  shared conversation and publishes ConversationStateUpdateEvent(key="goal")
  lifecycle updates (running/complete/capped/interrupted) for a UI chip.
- stop_goal/resume_goal/_last_goal_status: cancel a running goal (a normal
  user message also interrupts it) and resume from the persisted iteration,
  including across a server restart.
- POST /conversations/{id}/goal, /goal/stop, /goal/resume.

Stacked on the SDK /goal core. Relates to #3569.
@github-actions

github-actions Bot commented Jun 17, 2026

Contributor

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-agent-server/openhands/agent_server | | | | |
| conversation_router.py | 218 | 13 | 94% | 249, 331, 377, 437, 571–574, 586–589, 624 |
| event_service.py | 680 | 126 | 81% | 114–115, 145, 148–149, 153–154, 161, 165, 171, 181–185, 188–191, 250, 271–272, 343, 394, 414, 421, 445–446, 450, 458, 461, 477, 509, 520, 527, 533, 587, 589, 593–595, 599, 608–609, 611, 615, 621, 623, 670, 700, 703, 758, 779, 887, 999–1002, 1006, 1035, 1039, 1046, 1060, 1075, 1122–1124, 1195, 1220, 1226, 1228, 1238, 1240, 1247, 1257, 1259–1260, 1264, 1278–1283, 1285, 1312, 1317–1320, 1324–1327, 1335–1338, 1377–1379, 1432–1433, 1435–1442, 1444–1445, 1454–1455, 1457–1458, 1465–1466, 1468–1469, 1489, 1495, 1501, 1510–1511 |
| TOTAL | 33012 | 6935 | 78% | |

@VascoSch92 VascoSch92 changed the title feat: add /goal agent-server endpoint, background loop, and stop/resume feat(agent-server): add /goal agent-server endpoint, background loop, and stop/resume Jun 17, 2026
@VascoSch92 VascoSch92 requested a review from all-hands-bot June 17, 2026 13:12

all-hands-bot commented Jun 17, 2026

Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Collaborator


Code Review: /goal agent-server endpoints

Taste Rating

🟡 Acceptable — Solid implementation with minor test gap


Analysis

The implementation adds a background /goal driver loop that drives a shared conversation (not a fork), with proper lifecycle events for UI integration. The architecture is sound: GoalController from the SDK owns decisions, EventService owns I/O and event persistence.


[IMPROVEMENT OPPORTUNITIES]

  • [test_goal_loop.py, test_user_message_stops_running_goal] Test Coverage Gap: The test creates a dummy asyncio.sleep(30) task instead of a real goal loop, then asserts send_message cancels it. This tests the cancellation path, but not the actual integration: the goal task never runs send_message with _from_goal=True. A real goal task would pass _from_goal=True on subsequent iterations, so this test should be updated to either (a) use a real goal that waits for a gate, or (b) add a comment explaining this tests the "early stop" path only.

  • [event_service.py, start_goal] Minor: The check if self._goal_task is not None and not self._goal_task.done() has a small race window. Concurrent calls could both pass the check. For this use case the window is negligible, but if strict serialization is needed, consider an asyncio Lock.
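If strict serialization is wanted, one way to get it is an `asyncio.Lock` around the check-and-start. In single-threaded asyncio the check and `create_task` can only interleave with another caller if there is an `await` between them, so the sketch deliberately puts one there (publishing the "running" status) to show where the window opens. `GoalGuard` and its methods are hypothetical, not the `EventService` API.

```python
from __future__ import annotations

import asyncio


class GoalGuard:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._task: asyncio.Task | None = None
        self.published: list[dict] = []

    async def _publish(self, value: dict) -> None:
        await asyncio.sleep(0)  # stand-in for persisting/publishing a state-update event
        self.published.append(value)

    async def _run(self, objective: str) -> None:
        await asyncio.sleep(0.01)  # stand-in for the agent/judge rounds
        await self._publish({"status": "complete", "objective": objective})

    async def start_goal(self, objective: str) -> asyncio.Task:
        async with self._lock:
            if self._task is not None and not self._task.done():
                raise RuntimeError("goal_already_running")
            # This await yields to other callers; without the lock a second
            # start_goal could pass the check above before _task is assigned.
            await self._publish({"status": "running", "objective": objective})
            self._task = asyncio.create_task(self._run(objective))
            return self._task


async def demo_concurrent_start() -> tuple[int, list[dict]]:
    guard = GoalGuard()
    results = await asyncio.gather(guard.start_goal("a"), guard.start_goal("b"),
                                   return_exceptions=True)
    tasks = [r for r in results if isinstance(r, asyncio.Task)]
    errors = [r for r in results if isinstance(r, RuntimeError)]
    await asyncio.gather(*tasks)
    return len(errors), guard.published
```

With the lock, exactly one of two concurrent starts succeeds and the other is rejected, which is the behaviour a 409 response would map to.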


[TESTING GAPS]

  • [test_goal_loop.py] The test suite covers happy paths well (test_goal_outcomes, test_resume_from_interrupted_status) and error paths (test_start_goal_rejects_empty_objective, test_resume_without_resumable_goal_raises). The test_goal_halts_on_run_error_as_interrupted case is a good addition for the "out of credits" scenario. Overall, the testing is thorough with scripted LLMs.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
    This PR adds new endpoints behind the existing routing infrastructure, modifies send_message with a backward-compatible parameter, and introduces well-isolated async task management. No breaking changes to existing APIs. The testing uses real EventService with scripted LLMs, not mocks of the unit under test.

VERDICT

Worth merging — The core implementation is solid, with appropriate separation of concerns and comprehensive end-to-end testing. The test coverage gap in test_user_message_stops_running_goal is a minor concern but doesn't block merge.

KEY INSIGHT

The design correctly isolates decisions (SDK's GoalController) from transport (EventService), making the goal loop reusable across different runtime contexts.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation.

Comment thread tests/agent_server/test_goal_loop.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/event_service.py
@VascoSch92 VascoSch92 marked this pull request as ready for review June 17, 2026 13:18

@all-hands-bot all-hands-bot left a comment

Collaborator


✅ QA Report: PASS

The PR achieves its stated goal: I exercised the live agent-server over HTTP and verified /goal, /goal/stop, and /goal/resume drive the shared conversation, emit/persist goal state updates, and resume after interruption/restart.

Does this PR achieve its stated goal?

Yes. On the base branch, /api/conversations/{id}/goal was not present and returned 404; on the PR branch, the same real server flow returned 200 and produced persisted ConversationStateUpdateEvent(key="goal") lifecycle updates. Stop/resume also worked: an in-flight goal emitted interrupted, /goal/resume completed it, and the same resume flow worked after restarting the server.

| Phase | Result |
| --- | --- |
| Environment Setup | uv sync --dev completed; live FastAPI/uvicorn server started successfully |
| CI Status | 🟡 13 successful, 0 failing, 1 pending (QA Changes by OpenHands) at time of review |
| Functional Verification | ✅ Live HTTP scenarios passed for baseline absence, start/complete, stop/resume, and resume after restart |
Functional Verification

Test 1: Baseline branch has no /goal endpoint

Step 1 — Reproduce / establish baseline (without the fix):
Checked out vasco/goal-sdk and ran uv run python /tmp/qa_goal_live.py baseline against a live uvicorn agent-server, creating a real conversation and POSTing to /api/conversations/{id}/goal:

{
  "goal_paths_present": [],
  "goal_post_body": "{\"detail\":\"Not Found\"}",
  "goal_post_status": 404,
  "mode": "baseline"
}

This confirms the base branch did not expose the claimed goal-control API.

Step 2 — Apply the PR's changes:
Checked out vasco/goal-agent-server at e384cd5c5a1039e2c429d412b6020f0244f58d6d.

Test 2: Start /goal and drive it to completion

Step 3 — Re-run with the fix in place:
Ran uv run python /tmp/qa_goal_live.py complete against the PR branch live server. The script created a conversation through POST /api/conversations, injected a scripted TestLLM into that live conversation, then called POST /api/conversations/{id}/goal with a real objective:

{
  "start_status": 200,
  "start_body": {"success": true},
  "goal_statuses": [
    {"active": true, "iteration": 0, "status": "running"},
    {"active": false, "iteration": 1, "status": "complete", "verdict": {"complete": true, "score": 1.0}}
  ]
}

This shows the new endpoint accepts the request, starts the background loop, and persists the UI-consumable goal state updates through completion.

Test 3: Stop and resume a running goal

Ran uv run python /tmp/qa_goal_live.py stop-resume. The goal was started with a gated LLM call to simulate a user stopping while work was in flight, then /goal/stop and /goal/resume were called over HTTP:

{
  "start_status": 200,
  "llm_entered": true,
  "stop_status": 200,
  "stop_was_pending_until_llm_released": true,
  "status_after_stop": {"active": false, "iteration": 0, "status": "interrupted"},
  "resume_status": 200,
  "final_goal_status": {"active": false, "iteration": 1, "status": "complete", "verdict": {"complete": true, "score": 1.0}}
}

This confirms the stop endpoint records a resumable interrupted state and resume continues the shared conversation to completion. The observed pending stop call matches the PR note that stop is graceful and waits for the in-flight LLM call to finish.
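The pending stop observed here is what a graceful cancel looks like when the blocking LLM call runs in a worker thread. The sketch below shows one way to get that behaviour with `asyncio.shield`: cancelling the goal task does not abandon the in-flight round, and `stop()` only returns once that round has finished and "interrupted" has been recorded. It assumes a thread-based blocking step; `GracefulGoalRunner` is hypothetical and not how the PR implements `stop_goal`.

```python
from __future__ import annotations

import asyncio
import threading
from typing import Callable


class GracefulGoalRunner:
    def __init__(self, step: Callable[[int], None]) -> None:
        self._step = step  # blocking call, e.g. one agent/LLM round
        self._task: asyncio.Task | None = None
        self.statuses: list[tuple[str, int]] = []

    def start(self, rounds: int) -> None:
        self._task = asyncio.create_task(self._run(rounds))

    async def _run(self, rounds: int) -> None:
        loop = asyncio.get_running_loop()
        for i in range(rounds):
            fut = loop.run_in_executor(None, self._step, i)
            try:
                # shield() keeps the executor future alive when this task is cancelled
                await asyncio.shield(fut)
            except asyncio.CancelledError:
                await fut  # let the in-flight round finish before going quiet
                self.statuses.append(("interrupted", i))
                raise
        self.statuses.append(("complete", rounds))

    async def stop(self) -> bool:
        task = self._task
        if task is None or task.done():
            return False
        task.cancel()
        await asyncio.wait({task})  # returns once the task has actually finished
        return True


async def demo_graceful_stop() -> tuple[bool, bool, list[tuple[str, int]]]:
    entered, release = threading.Event(), threading.Event()

    def gated_step(i: int) -> None:
        entered.set()
        release.wait(timeout=5)  # simulates an LLM call that is still in flight

    runner = GracefulGoalRunner(gated_step)
    runner.start(rounds=3)
    await asyncio.to_thread(entered.wait, 5)
    stop_task = asyncio.create_task(runner.stop())
    await asyncio.sleep(0.05)
    pending_while_in_flight = not stop_task.done()
    release.set()
    stopped = await stop_task
    return pending_while_in_flight, stopped, runner.statuses
```

A hard abort (the `interrupt()` follow-up mentioned in the Notes) would instead need a way to cancel the underlying call itself, which a thread-pool future cannot do.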

Test 4: Resume after server restart

Ran a live two-server scenario: start goal → stop/interrupted → shut down uvicorn → start a new uvicorn server → call /goal/resume for the same persisted conversation:

{
  "start_status": 200,
  "stop_status": 200,
  "status_before_restart": {"active": false, "iteration": 0, "objective": "survive restart", "status": "interrupted"},
  "resume_after_restart_status": 200,
  "final_goal_status_after_restart": {"active": false, "iteration": 1, "objective": "survive restart", "status": "complete", "verdict": {"complete": true, "score": 1.0}}
}

This verifies the persisted status events are sufficient for the new server instance to resume the goal.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@VascoSch92 VascoSch92 requested a review from xingyaoww June 17, 2026 14:45
Base automatically changed from vasco/goal-sdk to main June 18, 2026 09:54
@VascoSch92

Member Author

@OpenHands resolve the conflicts here

@openhands-development


@VascoSch92 it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again.

@openhands-ai

openhands-ai Bot commented Jun 18, 2026


I'm on it! VascoSch92 can track my progress at all-hands.dev

# Conflicts:
#	openhands-sdk/openhands/sdk/conversation/goal/controller.py
#	tests/sdk/conversation/goal/test_controller.py
@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

PR Artifacts Cleaned Up

The .pr/ directory has been automatically removed.

Member Author

Resolved merge conflicts with main in 610bbcd.

Conflicts were in two files (openhands-sdk/openhands/sdk/conversation/goal/controller.py and tests/sdk/conversation/goal/test_controller.py) where the goal SDK core landed on main via #3769, while this branch carries the follow-up commit c96ebb44 that adds the verdict field to GoalContinue (so the agent-server driver can publish per-round judge feedback). Kept our side in both files — the agent-server _run_goal loop relies on step.verdict to emit the per-round ConversationStateUpdateEvent(key="goal").

Verified locally: uv run pytest tests/sdk/conversation/goal/test_controller.py tests/agent_server/test_goal_loop.py tests/agent_server/test_conversation_router.py -q → 99 passed.

This comment was posted by an AI agent (OpenHands) on behalf of @VascoSch92.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@openhands-ai

openhands-ai Bot commented Jun 18, 2026


OpenHands encountered an error: **Failed to send message to agent server: HTTP 503 error: no available server**

See the conversation for more information.

@VascoSch92 VascoSch92 requested a review from all-hands-bot June 18, 2026 10:22

all-hands-bot commented Jun 18, 2026

Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Collaborator


🟡 Acceptable — the architecture is clean and the test suite is comprehensive; one meaningful correctness gap in the exception path needs fixing before merge.

Summary: This PR wires the SDK /goal loop into the agent server with a tidy three-endpoint design (/goal, /goal/stop, /goal/resume). The persistence-via-ConversationStateUpdateEvent approach for cross-restart resumability is elegant. Tests cover the happy path, stop/resume lifecycle, user-message interruption, and the run-error path with real EventService + scripted LLMs — solid test hygiene.

[CRITICAL ISSUES]
The except Exception handler in _run_goal swallows unexpected failures (e.g. a judge LLM network error, a GoalController internal bug) without recording a terminal status event. The last persisted event will still have active=True, status="running", so the UI will display the goal as running when it has actually died. The CancelledError path two lines above already handles this correctly — the exception path should mirror it (see inline comment).

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM
    New background-task logic on a live conversation with asyncio cancel/resume semantics. The logic is sound and well-tested, but the exception-path status gap means a judge LLM failure leaves persistent incorrect UI state (active=True when the loop is dead). Low probability in practice; trivially easy to fix.

VERDICT:
Needs rework: Fix the exception-path status emission before merging.

KEY INSIGHT:
The except Exception handler must emit an interrupted status just like the CancelledError path does, or any unexpected failure leaves the goal permanently "running" in the UI.
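A minimal sketch of the pattern this asks for: every way out of the loop (normal finish, cancellation, unexpected exception) publishes a terminal status, so the last persisted `key="goal"` event is never left at "running". `run_goal_guarded`, `run_rounds`, and `publish` are hypothetical stand-ins for the `_run_goal` internals.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def run_goal_guarded(run_rounds: Callable[[], Awaitable[str]],
                           publish: Callable[[dict], None]) -> None:
    """run_rounds() returns "complete" or "capped"; it may also raise or be cancelled."""
    publish({"active": True, "status": "running"})
    try:
        final_status = await run_rounds()
    except asyncio.CancelledError:
        publish({"active": False, "status": "interrupted"})
        raise
    except Exception:
        # Same terminal status as the cancel path, so the goal stays resumable
        # and the UI stops showing it as active.
        logger.exception("goal loop failed")
        publish({"active": False, "status": "interrupted"})
        return
    publish({"active": False, "status": final_status})


async def demo_paths() -> dict[str, list[str]]:
    async def judge_down() -> str:
        raise ConnectionError("judge LLM unreachable")

    async def finishes() -> str:
        return "complete"

    async def hangs() -> str:
        await asyncio.sleep(10)
        return "complete"

    seen: dict[str, list[str]] = {}
    for name, rounds in (("error", judge_down), ("ok", finishes)):
        events: list[dict] = []
        await run_goal_guarded(rounds, events.append)
        seen[name] = [e["status"] for e in events]

    events = []
    task = asyncio.create_task(run_goal_guarded(hangs, events.append))
    await asyncio.sleep(0)  # let the task start and publish "running"
    task.cancel()
    await asyncio.wait({task})
    seen["cancel"] = [e["status"] for e in events]
    return seen
```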

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing. See the customization docs for the required frontmatter format.
  2. Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this review helpful? React with 👍 or 👎 to give feedback.

Comment thread openhands-agent-server/openhands/agent_server/event_service.py

@all-hands-bot all-hands-bot left a comment

Collaborator


✅ QA Report: PASS

Verified the new /goal agent-server API and goal-loop behavior by running the server and exercising real EventService flows; no functional issues found.

Does this PR achieve its stated goal?

Yes. The PR sets out to make /goal usable from agent-server/UI by adding HTTP endpoints, background goal execution, goal lifecycle events, and stop/resume behavior. I verified main lacks the endpoint, the PR exposes /goal, /goal/stop, and /goal/resume, real HTTP requests succeed, goal status events are persisted, and direct EventService execution completes, stops, resumes, and resumes after service recreation from persisted state.

| Phase | Result |
| --- | --- |
| Environment Setup | uv sync --dev completed and the agent server started locally. |
| CI Status | ✅ Current checks are passing; qa-changes is still in progress for this review, with artifact cleanup checks skipped/success as expected. |
| Functional Verification | ✅ HTTP endpoint behavior and goal-loop lifecycle behavior worked as described. |
Functional Verification

Test 1: Baseline vs PR endpoint exposure

Step 1 — Establish baseline on main:
Started python -m openhands.agent_server on 127.0.0.1:8010 and queried the new route/OpenAPI.

POST /api/conversations/{id}/goal -> HTTP 404
OpenAPI goal paths on main: []

This shows the /goal API is not present before the PR.

Step 2 — Apply the PR's changes:
Checked out vasco/goal-agent-server at 610bbcd1abcd4f593f1afc5383429a7370e5c066 and started the agent server.

Step 3 — Re-run with the PR:
Queried server metadata and OpenAPI.

GET / -> version 1.28.0
OpenAPI goal paths:
- /api/conversations/{conversation_id}/goal
- /api/conversations/{conversation_id}/goal/stop
- /api/conversations/{conversation_id}/goal/resume

This confirms the PR exposes the intended backend entry points.

Test 2: HTTP start/stop/resume behavior

Step 1 — Baseline:
On main, the same POST /goal request returned 404, so there was no route to start a goal.

Step 2 — Apply the PR's changes:
Created a real conversation through the PR server with a local workspace and an agent payload.

POST /api/conversations -> HTTP 201
conversation id: c4d50c95-8cad-4977-9347-4da0c987770e
execution_status: idle

Step 3 — Exercise the new API:
Started, stopped, and resumed a goal via HTTP.

POST /api/conversations/c4d50c95-8cad-4977-9347-4da0c987770e/goal -> HTTP 200
{"success":true}

POST /api/conversations/c4d50c95-8cad-4977-9347-4da0c987770e/goal/stop -> HTTP 200
{"success":true}

POST /api/conversations/c4d50c95-8cad-4977-9347-4da0c987770e/goal/resume -> HTTP 200
{"success":true}

Because the HTTP conversation used a placeholder LLM instead of external credentials, the run halted as interrupted, which is the resumable failure behavior described by the PR. The persisted event stream showed the goal lifecycle updates:

[(ConversationStateUpdateEvent, goal, {active: True, status: running, iteration: 0, max_iterations: 1, ...}),
 (ConversationStateUpdateEvent, goal, {active: False, status: interrupted, iteration: 0, max_iterations: 1, ...}),
 (ConversationStateUpdateEvent, goal, {active: True, status: running, iteration: 0, max_iterations: 1, ...}),
 (ConversationStateUpdateEvent, goal, {active: False, status: interrupted, iteration: 0, max_iterations: 1, ...})]

I also verified the user-facing error case:

POST /api/conversations/{new_conversation_id}/goal/resume -> HTTP 400
{"detail":"no_resumable_goal"}

Test 3: Goal loop completion, verdict events, stop, and resume

Step 1 — Baseline:
On main, there is no agent-server goal-loop API to drive; only the PR branch includes the EventService goal driver.

Step 2 — Apply the PR's changes:
Ran a small user script against a real EventService + LocalConversation, using deterministic TestLLM responses for the agent and judge so no external credentials were required.

Step 3 — Exercise the behavior:
The goal completed after two judge rounds and emitted per-round status/verdict data:

COMPLETE_SCENARIO
event_count 1 -> 16
goal_statuses [running, running, complete]
running_verdict_missing [tests not passing]
final {active: False, status: complete, iteration: 2, max_iterations: 5, objective: Make the QA proof goal complete, verdict: {score: 1.0, complete: True, missing: }}

A running goal could be stopped and then resumed to completion:

STOP_SCENARIO
judge_entered True
stop_returned True
after_stop {active: False, status: interrupted, iteration: 1, max_iterations: 4, objective: Resume this QA goal after interruption, verdict: None}
after_resume {active: False, status: complete, iteration: 2, max_iterations: 4, objective: Resume this QA goal after interruption, verdict: {score: 1.0, complete: True, missing: }}

This confirms the background loop, per-round UI chip data, graceful stop, and same-session resume behavior.

Test 4: Resume after EventService recreation

Step 1 — Baseline:
Without the PR goal-status persistence/resume logic, there is no resumable agent-server goal state to reconstruct.

Step 2 — Apply the PR's changes:
Started a goal, stopped it mid-audit, closed the first EventService, recreated another EventService with the same persisted conversation directory and conversation id, then called resume_goal(...).

Step 3 — Re-run with recreated service:

RESTART_SCENARIO
same_conversation_id 2eccccdb-c50a-48a6-ba14-47d9a1c3bbe6
judge_entered_before_stop True
stop_returned True
before_close {active: False, status: interrupted, iteration: 1, max_iterations: 3, objective: Persist and resume this goal, verdict: None}
after_recreated_service_resume {active: False, status: complete, iteration: 2, max_iterations: 3, objective: Persist and resume this goal, verdict: {score: 1.0, complete: True, missing: }}

This verifies the PR's claim that persisted goal status can be used to resume after a server/service restart.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@VascoSch92 VascoSch92 enabled auto-merge (squash) June 18, 2026 13:24
@VascoSch92 VascoSch92 disabled auto-merge June 18, 2026 13:26
Comment thread openhands-agent-server/openhands/agent_server/conversation_router.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/conversation_router.py Outdated
@enyst

enyst commented Jun 18, 2026

Member

@OpenHands Describe to me what it means to start a goal, to stop a goal, and to resume one. I mean the REST API endpoints: what do they do, in essence? Keep it concise, the high-level execution path.

@openhands-development


@enyst it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again.

@openhands-ai

openhands-ai Bot commented Jun 18, 2026


I'm on it! enyst can track my progress at all-hands.dev

enyst commented Jun 18, 2026

Member

@enyst High-level execution path:

  • POST /conversations/{id}/goal: finds the live EventService, rejects if the conversation or another goal is already running, builds a GoalController(objective, max_iterations), then starts a background task and returns. That task emits ConversationStateUpdateEvent(key="goal", status="running"), sends the objective as a normal user message into the same conversation history, runs the agent, judges the resulting events, and either emits complete/capped or sends a continuation prompt and loops.
  • POST /conversations/{id}/goal/stop: cancels the background goal driver if present. The cancel path persists/publishes key="goal" with status="interrupted", so the UI can stop showing it as active and the goal remains resumable. It does not delete any conversation history.
  • POST /conversations/{id}/goal/resume: reads the last persisted key="goal" status from the conversation events, requires it to be resumable (not complete/capped), rebuilds the controller with the same objective and stored iteration, then starts the same background loop using a resume prompt.

A normal user message also calls stop_goal() first, so user interjection interrupts the goal before new user input is appended.

This comment was created by an AI agent (OpenHands) on behalf of the requester.
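The resume rule in that description can be sketched as a reverse scan of the persisted event log for the newest `key="goal"` update. The event dict shape and `ResumePoint` are assumptions for illustration; the `no_resumable_goal` error string matches the 400 response seen in the QA run. Following the "not complete/capped" rule literally, a last status of "running" (for example after a crash) also counts as resumable here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence

TERMINAL_STATUSES = {"complete", "capped"}


@dataclass(frozen=True)
class ResumePoint:
    objective: str
    iteration: int
    max_iterations: int | None


def last_goal_status(events: Sequence[dict]) -> dict | None:
    """Newest key == "goal" state update, scanning from the end of the log."""
    for event in reversed(events):
        if event.get("kind") == "ConversationStateUpdateEvent" and event.get("key") == "goal":
            return event.get("value")
    return None


def resume_point(events: Sequence[dict]) -> ResumePoint:
    status = last_goal_status(events)
    if not status or status.get("status") in TERMINAL_STATUSES:
        raise ValueError("no_resumable_goal")  # surfaced as HTTP 400 in the QA run
    return ResumePoint(status["objective"], status.get("iteration", 0), status.get("max_iterations"))


log = [
    {"kind": "ConversationStateUpdateEvent", "key": "goal",
     "value": {"status": "running", "iteration": 0, "max_iterations": 3,
               "objective": "Persist and resume this goal"}},
    {"kind": "ConversationStateUpdateEvent", "key": "goal",
     "value": {"status": "interrupted", "iteration": 1, "max_iterations": 3,
               "objective": "Persist and resume this goal"}},
]
point = resume_point(log)  # ResumePoint(objective='Persist and resume this goal', iteration=1, max_iterations=3)
```

Because the scan only reads persisted events, the same check works on a freshly started server, which is what makes resume-after-restart possible.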

@openhands-ai

openhands-ai Bot commented Jun 18, 2026


OpenHands encountered an error: **Failed to send message to agent server: HTTP 503 error: no available server**

See the conversation for more information.

@openhands-development


@enyst it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again.

Co-authored-by: openhands <openhands@all-hands.dev>
@VascoSch92

Member Author

@OpenHands in this PR, rename every method *_goal_conversation -> *_goal_in_conversation. Moreover, update the docstrings and comments with the description given here: #3770 (comment). Make the docstrings concise and clear.

@openhands-development


@VascoSch92 it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again.

@openhands-ai

openhands-ai Bot commented Jun 18, 2026

Copy link
Copy Markdown

I'm on it! VascoSch92 can track my progress at all-hands.dev

@VascoSch92

Member Author

@enyst I blocked my agent. I think we are good to go if it's good for you.

@VascoSch92 VascoSch92 enabled auto-merge (squash) June 18, 2026 14:41

@enyst enyst left a comment

Member


Thank you, and sorry for the confusion! My slow brain is… yeah, slow

@VascoSch92 VascoSch92 dismissed all-hands-bot’s stale review June 18, 2026 14:41

no longer valid

@VascoSch92 VascoSch92 merged commit 1a6c10e into main Jun 18, 2026
32 of 44 checks passed
@VascoSch92 VascoSch92 deleted the vasco/goal-agent-server branch June 18, 2026 14:46
VascoSch92 added a commit that referenced this pull request Jun 18, 2026
The `_return_metrics` parameter was removed from LLM/TestLLM.completion in
2bedbd8 (removed_in 1.29.0), but tests/agent_server/test_goal_loop.py's
_GatedLLM override (added later in #3770) still declared it as a positional
param and forwarded it to super().completion(), which pyright flagged as
an incompatible override and a 5-positional call against a 4-positional
signature. Drop _return_metrics to match the current signature.
VascoSch92 added a commit that referenced this pull request Jun 18, 2026
The `_return_metrics` parameter was removed from LLM/TestLLM.completion in
2bedbd8 (removed_in 1.29.0, on this release branch), but
tests/agent_server/test_goal_loop.py's _GatedLLM override (added in #3770)
still declared it as a positional param and forwarded it to
super().completion(), which pyright flagged as an incompatible override
and a 5-positional call against the new 4-positional signature. Drop
_return_metrics to match.

Labels

None yet

Projects

None yet

5 participants