Add task progress watchdog for stuck workers#948
Conversation
📊 CI Metrics ReportSummary
By Role
Per-Test Breakdown
Trends✅ 2 test(s) improved (fewer LLM calls) Generated by HiClaw CI on 2026-06-22 16:17:07 UTC |
There was a problem hiding this comment.
Pull request overview
Adds a Manager-side “task progress watchdog” to detect stuck finite tasks by fingerprinting the latest progress-log block across heartbeat cycles, then wiring that signal into the Manager’s heartbeat guidance and task-management references.
Changes:
- Introduce
check-progress-watchdog.shto read latest task progress, compute a fingerprint, and update watchdog fields in~/state.json. - Update Manager heartbeat instructions and task-management docs to incorporate watchdog status (
normal/repeated/blocked/unknown). - Add a focused shell test covering normal, repeated, missing-progress, and blocked-progress scenarios.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
tests/test-26-progress-watchdog.sh |
New unit-style shell test for watchdog behavior and state.json updates. |
manager/agent/skills/task-management/SKILL.md |
Adds heartbeat guidance to use the watchdog; updates reference wording. |
manager/agent/skills/task-management/scripts/check-progress-watchdog.sh |
New watchdog script: extracts latest progress block, fingerprints it, updates state.json, emits JSON status. |
manager/agent/skills/task-management/references/state-management.md |
Documents watchdog script usage and the new state.json fields/statuses. |
manager/agent/HEARTBEAT.md |
Adds watchdog invocation and escalation guidance to the finite-task heartbeat flow. |
changelog/current.md |
Records the manager feature addition for the next release notes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #!/bin/bash | ||
| # test-26-progress-watchdog.sh - Unit-style test for Manager task progress watchdog | ||
|
|
||
| set -euo pipefail |
There was a problem hiding this comment.
Fixed. The test no longer uses set -e; watchdog invocations are wrapped in run_watchdog so non-zero exits print an explicit failure message.
| - **Always use `manage-state.sh` to modify state.json** — never edit manually with jq. The script handles atomicity, deduplication, and initialization | ||
| - **Use the progress watchdog during heartbeat** — `check-progress-watchdog.sh` compares task progress logs across heartbeat cycles so you can distinguish normal long-running work from repeated or stale progress |
There was a problem hiding this comment.
Fixed. The gotcha now says to use task-management scripts for state writes, with manage-state.sh owning task lifecycle fields and check-progress-watchdog.sh owning watchdog snapshots.
| previous_count=$(jq -r --arg id "${TASK_ID}" '.active_tasks[] | select(.task_id == $id) | .stale_heartbeat_count // 0' "${STATE_FILE}") | ||
| now="$(_ts)" | ||
|
|
||
| if printf '%s\n' "${latest_block}" | grep -Eiq '\b(blocked|blocker)\b'; then |
There was a problem hiding this comment.
Fixed. Replaced the non-portable grep -E word-boundary pattern with an awk tokenizer that matches blocked / blocker as standalone words.
| (.active_tasks[] | select(.task_id == $id)) |= ( | ||
| (if $action == "progress_changed" or $action == "progress_blocked" then .last_progress_at = $now else . end) | ||
| | .last_progress_fingerprint = $fingerprint |
| jq --arg id "${TASK_ID}" --arg now "${now}" ' | ||
| (.active_tasks[] | select(.task_id == $id)) |= ( | ||
| .stale_heartbeat_count = ((.stale_heartbeat_count // 0) + 1) |
| --arg expected_next_update_at "${expected_next_update_at}" \ | ||
| --arg progress_changed "${progress_changed}" \ | ||
| --argjson count "${count}" ' | ||
| (.active_tasks[] | select(.task_id == $id)) |= ( |
There was a problem hiding this comment.
Fixed. The normal/repeated/blocked/long-running update path now also uses the finite selector, and the previous fingerprint/count reads are finite-scoped as well. Added duplicate task_id regression coverage for both progress-present and missing-progress paths.
ffe829c to
8bf39c2
Compare
8bf39c2 to
add0eb2
Compare
摘要
state.json中记录重复进展、缺失进展、显式阻塞和长时间运行状态。Expected next update: <UTC ISO>长任务宽限窗口,避免大型编译、测试、训练、集成任务在正常运行期间被过早判断为疑似卡住。Closes #947.
测试计划
bash tests/test-26-progress-watchdog.shbash -n manager/agent/skills/task-management/scripts/check-progress-watchdog.shbash -n tests/test-26-progress-watchdog.shgit diff --check upstream/main..HEAD