Commit 71f66ed

deimagjas and claude authored
feat: structured agent monitoring with status persistence and graceful UX (#7)
* refactor: extract entrypoint functions with main()

  Extract linear entrypoint.sh into discrete functions (parse_args, copy_credentials, create_worktree, setup_agent_perms, run_agent, run_interactive) called from a main() entry point. Prepares for adding monitoring logic without growing an unmaintainable script. Zero behavior change — all 49 shellspec tests pass unchanged.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add agent monitoring with status.json, log persistence, and lifecycle markers

  Add structured monitoring to the entrypoint agent mode:
  - Create .agent/ dir in worktree with status.json (phase, branch, task, timestamps, exit code, commit count)
  - Persist full agent output to .agent/agent.log via tee
  - Emit [agent:status] tagged markers at lifecycle transitions (starting → working → completed/errored)
  - Capture post-run metrics (commit count, duration, last commit)
  - Add .agent/ to worktree .gitignore automatically

  The su-exec call no longer uses exec, allowing post-processing to run after claude exits. Test mocks updated accordingly with 6 new test cases.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add Makefile monitoring targets with graceful fallback

  Add new targets and enhance existing ones:
  - status-agent: reads status.json from worktree (works post-exit)
  - summary-agent: extracts [agent:status] markers from logs
  - list-agents: now shows phase from each worktree's status.json
  - logs-agent: falls back to .agent/agent.log when container is gone
  - follow-agent: same fallback behavior

  All fallback paths return exit 0 with contextual messages instead of silent failures, fixing the UX issue where finished agents appeared as errors.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add CLI status/summary commands and log fallback

  Add new commands to the q CLI:
  - q agents status --branch X: reads status.json directly from filesystem (no container needed, works post-exit)
  - q agents summary --branch X: shows structured lifecycle events

  The logs and follow commands now delegate to Makefile targets that include graceful fallback to persisted .agent/agent.log when the container no longer exists.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update skill, docs, and evals for agent monitoring

  Update spawn-agent skill with decision flow for monitoring: status.json → agent.log → container logs. Add guidance for post-exit scenarios where container is gone.

  Update docs:
  - spawn-agent-skill.md: new monitoring commands and post-exit status
  - cli.md: document q agents status/summary and log fallback behavior

  Update evals:
  - list_and_monitor.md: add part 3 for status post-exit scenario
  - evals.json: update eval 4 to prefer status.json, add evals 7 (status post-exit) and 8 (logs fallback)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: deimagjas <deimagjas@127.0.0.1>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 200ff59 commit 71f66ed

9 files changed: +454 -72 lines changed


.claude/skills/spawn-agent/SKILL.md

Lines changed: 38 additions & 6 deletions
@@ -168,24 +168,56 @@ ls -la "${AGENTS_HOME}" 2>/dev/null || echo "(no worktrees yet at $AGENTS_HOME)"
 
 Show the user a readable table with both container status and worktree list.
 
-## Reading agent output (context)
+## Reading agent output
 
-`container logs` captures everything the agent prints (Claude's reasoning,
-tool calls, results). Use this to pass context back to the user:
+Agents persist structured monitoring data in the worktree at
+`${AGENTS_HOME}/<branch>/.agent/`. Use the right source for each question:
+
+### Quick status (preferred for "what is agent X doing?")
+
+```bash
+# Read status.json — works even after container exits
+cat "${AGENTS_HOME}/${BRANCH}/.agent/status.json" 2>/dev/null
+```
+
+Returns JSON with phase, branch, task, timestamps, exit code, and commit count.
+Phases: `starting` → `working` → `completed` | `errored`.
+
+### Structured lifecycle events
 
 ```bash
 CONTAINER_NAME="${PROJECT_NAME}-${CONTAINER_BRANCH}"
 
-# Last 100 lines (good for summary)
-container logs -n 100 "${CONTAINER_NAME}"
+# From live container
+container logs "${CONTAINER_NAME}" 2>/dev/null | grep '^\[agent:'
+
+# From persisted logs (after container exits)
+grep '^\[agent:' "${AGENTS_HOME}/${BRANCH}/.agent/agent.log" 2>/dev/null
+```
+
+### Full logs
 
-# Follow live output (for running agents)
+```bash
+# Live container (while running)
+container logs -n 100 "${CONTAINER_NAME}"
 container logs -f "${CONTAINER_NAME}"
+
+# Persisted logs (after container exits)
+tail -100 "${AGENTS_HOME}/${BRANCH}/.agent/agent.log"
 ```
 
+### Decision flow
+
+- **"What is agent X doing?"** → read `status.json` (instant, always works)
+- **"What did agent X do?"** → read `agent.log` (persisted, works post-exit)
+- **"Show me live output"** → `container logs -f` (only while running)
+
 Read the logs and **summarize the agent's progress** — don't just dump raw output.
 Tell the user: what the agent is working on, what it has done, what step it's at.
 
+**Important:** When the container is gone (agent finished), do NOT attempt
+`container logs` — it will fail. Use the persisted files in `.agent/` instead.
+
 ## Integrating agent work
 
 When an agent finishes, its commits already exist in the **host repo** — the
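
The decision flow in this hunk amounts to a lookup over the two persisted files. The following is a minimal Python sketch, not code from the commit: `read_agent_state` is a hypothetical helper, and `phase` is the only `status.json` key whose exact name the diff confirms (the Makefile reads `.phase` with jq). The prefix filter mirrors the `grep '^\[agent:'` call shown in the skill.

```python
import json
from pathlib import Path


def read_agent_state(agents_home: Path, branch: str) -> dict:
    """Collect what the persisted .agent/ files say about one agent (hypothetical helper)."""
    agent_dir = agents_home / branch / ".agent"
    status_file = agent_dir / "status.json"
    log_file = agent_dir / "agent.log"

    state = {"phase": "unknown", "events": []}

    if status_file.is_file():
        try:
            data = json.loads(status_file.read_text())
        except json.JSONDecodeError:
            # Assumption: a partially written file is possible while the agent updates it.
            data = None
        if isinstance(data, dict):
            state["phase"] = data.get("phase", "unknown")

    if log_file.is_file():
        # Keep only lifecycle markers, like grep '^\[agent:' on agent.log.
        state["events"] = [
            line
            for line in log_file.read_text(errors="replace").splitlines()
            if line.startswith("[agent:")
        ]

    return state
```

Because both files live in the worktree, this lookup needs no container and behaves the same before and after the agent exits, which is the property the skill relies on.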

.claude/skills/spawn-agent/evals/evals.json

Lines changed: 28 additions & 4 deletions
@@ -44,15 +44,39 @@
     {
       "id": 4,
       "prompt": "Check what the feat/oauth2 agent is doing right now. Give me a summary of its progress.",
-      "expected_output": "Claude runs container logs for the feat/oauth2 agent (sanitized to feat-oauth2 in name), reads the output, and provides a plain-language summary of the agent's progress — not just a raw log dump.",
+      "expected_output": "Claude checks status.json from the worktree for quick status, and/or reads container logs (sanitized name feat-oauth2). Provides a plain-language summary — not just a raw log dump.",
       "files": [],
       "expectations": [
-        "Runs `container logs` (not container run or container list)",
-        "Uses correct sanitized container name: includes feat-oauth2",
-        "Provides a summary or interpretation of logs, not just raw output",
+        "Reads status.json from $AGENTS_HOME/feat/oauth2/.agent/ OR runs `container logs`",
+        "Uses correct sanitized container name if using container logs: includes feat-oauth2",
+        "Provides a summary or interpretation, not just raw output",
         "Does NOT spawn a new container"
       ]
     },
+    {
+      "id": 7,
+      "prompt": "The feat/oauth2 agent finished a while ago. What was the result? Did it succeed?",
+      "expected_output": "Claude reads status.json from the persisted worktree directory. Reports the phase (completed/errored), exit code, commit count, and duration. Does NOT attempt container logs on a stopped container.",
+      "files": [],
+      "expectations": [
+        "Reads status.json from $AGENTS_HOME/feat/oauth2/.agent/status.json",
+        "Reports phase, exit code, and commit count from the status file",
+        "Does NOT attempt `container logs` on a container that no longer exists",
+        "Does NOT show error messages about missing containers"
+      ]
+    },
+    {
+      "id": 8,
+      "prompt": "Show me the full logs from the feat/oauth2 agent. It finished already.",
+      "expected_output": "Claude reads the persisted agent.log from the worktree directory. Shows the saved logs with a note that these are persisted logs from a finished agent.",
+      "files": [],
+      "expectations": [
+        "Reads from $AGENTS_HOME/feat/oauth2/.agent/agent.log (persisted logs)",
+        "Does NOT attempt `container logs` as the primary source for a finished agent",
+        "Indicates these are saved/persisted logs",
+        "Does NOT show confusing error output about missing containers"
+      ]
+    },
     {
       "id": 5,
       "prompt": "The feat/oauth2 agent finished. Merge its work into my current branch.",

.claude/skills/spawn-agent/evals/list_and_monitor.md

Lines changed: 22 additions & 5 deletions
@@ -14,7 +14,7 @@ User says:
 
 1. Skill triggers
 2. Runs: `container list 2>/dev/null | grep "qubits-team"`
-3. Also shows worktrees on disk: `ls -la <parent>/.worktrees/`
+3. Also shows worktrees on disk (with status from `status.json` if available)
 4. Presents output in a readable format to the user
 
 ---
@@ -27,12 +27,29 @@ User says:
 ## Expected behavior (monitor)
 
 1. Skill triggers
-2. Sanitizes: `feat/jwt-auth` → `qubits-team-feat-jwt-auth`
-3. Runs: `container logs -n 100 qubits-team-feat-jwt-auth`
-4. **Reads and summarizes** the logs — does NOT just dump raw output
-5. Tells user: agent is working on X, currently at step Y, last action was Z
+2. Reads `status.json` from `$AGENTS_HOME/feat/jwt-auth/.agent/status.json` for quick status
+3. If more detail needed, reads container logs or persisted `.agent/agent.log`
+4. Sanitizes container name correctly: `feat/jwt-auth` → `qubits-team-feat-jwt-auth`
+5. **Reads and summarizes** the output — does NOT just dump raw logs
+6. Tells user: agent is working on X, currently at step Y, last action was Z
+
+---
+
+## Input (part 3 — status post-exit)
+
+User says:
+> "What happened with the feat/jwt-auth agent?"
+
+## Expected behavior (status post-exit)
+
+1. Skill triggers
+2. Reads `status.json` from `$AGENTS_HOME/feat/jwt-auth/.agent/status.json`
+3. Reports phase (completed/errored), exit code, commit count, duration
+4. Does NOT attempt `container logs` on a stopped container
+5. If user wants full logs, reads from `.agent/agent.log` (persisted)
 
 ## Must NOT do
 
 - Must not just print raw container logs without summarizing
 - Must not confuse container name sanitization (/ → -)
+- Must not show errors when container is gone (use persisted files instead)
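
The sanitization these evals check (`feat/jwt-auth` becoming `qubits-team-feat-jwt-auth`) could be sketched as below. This is an illustration only: `container_name` is a hypothetical helper, and it reproduces just the `tr '/_ ' '-'` step visible in the Makefile hunk header. The second `tr` call is truncated in this view, so it is deliberately left out here.

```python
def container_name(project_name: str, branch: str) -> str:
    """Build '<project>-<sanitized branch>' as described in the evals (hypothetical helper)."""
    # Equivalent of tr '/_ ' '-': each slash, underscore, or space becomes a hyphen.
    sanitized = branch.translate(str.maketrans("/_ ", "---"))
    return f"{project_name}-{sanitized}"


print(container_name("qubits-team", "feat/jwt-auth"))  # qubits-team-feat-jwt-auth
```

Note that only the container name is sanitized. The worktree path keeps the raw branch, so the status file for this agent is still looked up under `feat/jwt-auth/.agent/`.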

app/cli/src/container_cli/commands/agents.py

Lines changed: 35 additions & 1 deletion
@@ -1,12 +1,23 @@
+import json
+import os
+from pathlib import Path
 from typing import Annotated
 
 import typer
 
-from container_cli.utils import check_token, run_make
+from container_cli.utils import check_token, find_git_root, run_make
 
 app = typer.Typer(help="Agent lifecycle commands")
 
 
+def _agents_home() -> Path:
+    """Resolve AGENTS_HOME, falling back to sibling .worktrees/ directory."""
+    env_val = os.environ.get("AGENTS_HOME")
+    if env_val:
+        return Path(env_val)
+    return find_git_root().parent / ".worktrees"
+
+
 @app.command()
 def spawn(
     branch: Annotated[str, typer.Option("--branch", help="Git branch for the agent worktree")],
@@ -55,3 +66,26 @@ def stop(
 ) -> None:
     """Stop a branch agent container."""
     run_make("stop-agent", {"BRANCH": branch})
+
+
+@app.command()
+def status(
+    branch: Annotated[str, typer.Option("--branch", help="Agent branch name")],
+) -> None:
+    """Show agent status from persisted status.json file."""
+    status_file = _agents_home() / branch / ".agent" / "status.json"
+    if not status_file.exists():
+        typer.echo(f"[status] No status file found for branch '{branch}'.")
+        typer.echo(f"[status] Expected at: {status_file}")
+        raise typer.Exit(1)
+
+    data = json.loads(status_file.read_text())
+    typer.echo(json.dumps(data, indent=2))
+
+
+@app.command()
+def summary(
+    branch: Annotated[str, typer.Option("--branch", help="Agent branch name")],
+) -> None:
+    """Show structured lifecycle events for a branch agent."""
+    run_make("summary-agent", {"BRANCH": branch})
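
To make the path lookup in `status` easier to follow, here is a small sketch of the same resolution with its inputs passed explicitly. It is not part of the commit: `resolve_status_file` is a hypothetical function, and the `git_root` argument stands in for whatever `find_git_root()` returns.

```python
from pathlib import Path
from typing import Mapping


def resolve_status_file(branch: str, env: Mapping[str, str], git_root: Path) -> Path:
    """Mirror the status.json lookup from the diff with explicit inputs (hypothetical helper)."""
    env_val = env.get("AGENTS_HOME")
    # An empty AGENTS_HOME is falsy, so it falls through to the sibling .worktrees/ directory.
    agents_home = Path(env_val) if env_val else git_root.parent / ".worktrees"
    # The branch is used verbatim, so "feat/oauth2" becomes two nested path segments.
    return agents_home / branch / ".agent" / "status.json"


print(resolve_status_file("feat/oauth2", {}, Path("/src/repo")))
```

Passing the environment and repository root as arguments keeps the sketch testable without touching `os.environ` or a real git checkout.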

config/Makefile

Lines changed: 65 additions & 4 deletions
@@ -60,7 +60,7 @@ CONTAINER_BRANCH := $(shell echo "$(BRANCH)" | tr '/_ ' '-' | tr '[:upper:]' '[:
 # Host env var holding the OAuth token (avoids collision with host Claude instance)
 HOST_TOKEN_VAR := CLAUDE_CONTAINER_OAUTH_TOKEN
 
-.PHONY: build network run shell spawn list-agents logs-agent follow-agent stop-agent clean clean-network clean-all help
+.PHONY: build network run shell spawn list-agents logs-agent follow-agent stop-agent status-agent summary-agent clean clean-network clean-all help
 
 # ── Build ─────────────────────────────────────────────────────────────────────
 
@@ -131,13 +131,74 @@ list-agents:
 	@container list 2>/dev/null | grep "$(PROJECT_NAME)" || echo " (none)"
 	@echo ""
 	@echo "[agents] Worktrees in $(WORKTREES_DIR):"
-	@ls -la "$(WORKTREES_DIR)" 2>/dev/null || echo " (none yet)"
+	@if [ -d "$(WORKTREES_DIR)" ]; then \
+		for dir in $(WORKTREES_DIR)/*/; do \
+			[ -d "$$dir" ] || continue; \
+			branch=$$(basename "$$dir"); \
+			status_file="$$dir/.agent/status.json"; \
+			if [ -f "$$status_file" ] && command -v jq >/dev/null 2>&1; then \
+				phase=$$(jq -r '.phase // "unknown"' "$$status_file" 2>/dev/null || echo "unknown"); \
+				printf " %-30s %s\n" "$$branch" "$$phase"; \
+			else \
+				printf " %-30s %s\n" "$$branch" "(no status)"; \
+			fi; \
+		done; \
+	else \
+		echo " (none yet)"; \
+	fi
 
 logs-agent:
-	container logs $(PROJECT_NAME)-$(CONTAINER_BRANCH)
+	@container logs $(PROJECT_NAME)-$(CONTAINER_BRANCH) 2>/dev/null \
+	|| { \
+		LOG_FILE="$(WORKTREES_DIR)/$(BRANCH)/.agent/agent.log"; \
+		if [ -f "$$LOG_FILE" ]; then \
+			echo "[logs] Container $(PROJECT_NAME)-$(CONTAINER_BRANCH) no longer running (agent finished)."; \
+			echo "[logs] Showing saved logs from $$LOG_FILE"; \
+			echo "---"; \
+			cat "$$LOG_FILE"; \
+		else \
+			echo "[logs] Container $(PROJECT_NAME)-$(CONTAINER_BRANCH) not found and no saved logs at $$LOG_FILE"; \
+		fi; \
+	}
 
 follow-agent:
-	container logs -f $(PROJECT_NAME)-$(CONTAINER_BRANCH)
+	@container logs -f $(PROJECT_NAME)-$(CONTAINER_BRANCH) 2>/dev/null \
+	|| { \
+		LOG_FILE="$(WORKTREES_DIR)/$(BRANCH)/.agent/agent.log"; \
+		if [ -f "$$LOG_FILE" ]; then \
+			echo "[logs] Container $(PROJECT_NAME)-$(CONTAINER_BRANCH) no longer running (agent finished)."; \
+			echo "[logs] Showing saved logs from $$LOG_FILE"; \
+			echo "---"; \
+			cat "$$LOG_FILE"; \
+		else \
+			echo "[logs] Container $(PROJECT_NAME)-$(CONTAINER_BRANCH) not found and no saved logs at $$LOG_FILE"; \
+		fi; \
+	}
+
+status-agent:
+	@STATUS_FILE="$(WORKTREES_DIR)/$(BRANCH)/.agent/status.json"; \
+	if [ -f "$$STATUS_FILE" ]; then \
+		if command -v jq >/dev/null 2>&1; then \
+			jq '.' "$$STATUS_FILE"; \
+		else \
+			cat "$$STATUS_FILE"; \
+		fi; \
+	else \
+		echo "[status] No status file found for branch '$(BRANCH)'."; \
+		echo "[status] Expected at: $$STATUS_FILE"; \
+	fi
+
+summary-agent:
+	@container logs $(PROJECT_NAME)-$(CONTAINER_BRANCH) 2>/dev/null \
+	| grep '^\[agent:' \
+	|| { \
+		LOG_FILE="$(WORKTREES_DIR)/$(BRANCH)/.agent/agent.log"; \
+		if [ -f "$$LOG_FILE" ]; then \
+			grep '^\[agent:' "$$LOG_FILE" || echo "(no structured events found)"; \
+		else \
+			echo "(no logs available for branch '$(BRANCH)')"; \
+		fi; \
+	}
 
 stop-agent:
 	@container stop $(PROJECT_NAME)-$(CONTAINER_BRANCH) 2>/dev/null \
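
The fallback used by `logs-agent` (try the live container first, then the persisted `agent.log`) can be expressed in Python roughly as follows. This is a sketch under assumptions, not the project's code: `container_logs` and `logs_with_fallback` are hypothetical names, and the only CLI call used is `container logs <name>`, which appears in the Makefile above.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def container_logs(name: str) -> Optional[str]:
    """Return live container output, or None when the CLI is missing or the call fails."""
    try:
        proc = subprocess.run(
            ["container", "logs", name],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None
    return proc.stdout if proc.returncode == 0 else None


def logs_with_fallback(
    name: str,
    log_file: Path,
    fetch: Callable[[str], Optional[str]] = container_logs,
) -> str:
    """Prefer live logs; otherwise explain the situation and show the saved file."""
    live = fetch(name)
    if live is not None:
        return live
    if log_file.is_file():
        header = (
            f"[logs] Container {name} no longer running (agent finished).\n"
            f"[logs] Showing saved logs from {log_file}\n"
            "---\n"
        )
        return header + log_file.read_text(errors="replace")
    return f"[logs] Container {name} not found and no saved logs at {log_file}\n"
```

As in the Makefile target, every branch returns a message rather than raising, so a finished agent is reported as finished instead of surfacing as an error.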
