Log noise: agent.isProcessRunning 'indeterminate' fires every poll without escalating to terminated

### Summary

`agent.process_probe_failed` events fire on every polling cycle for already-terminated sessions, with the message `agent.isProcessRunning indeterminate for <session>`. They never escalate to a definitive `terminated` state. The same evidence line repeats indefinitely, until the lifecycle-manager itself is restarted.

### What I see

On `2026-06-04` two sessions (`btbo-2`, `btbo-4`) had been terminated cleanly, but their probe failures continued firing once per minute for hours:

```
2026-06-04T20:23:04.569Z  btbo-4  agent.process_probe_failed  "agent.isProcessRunning indeterminate for btbo-4"
2026-06-04T20:22:04.569Z  btbo-2  agent.process_probe_failed  "agent.isProcessRunning indeterminate for btbo-2"
2026-06-04T20:22:04.569Z  btbo-4  agent.process_probe_failed  "agent.isProcessRunning indeterminate for btbo-4"
2026-06-04T20:21:04.569Z  btbo-2  agent.process_probe_failed  "agent.isProcessRunning indeterminate for btbo-2"
2026-06-04T20:21:04.569Z  btbo-4  agent.process_probe_failed  "agent.isProcessRunning indeterminate for btbo-4"
…repeating every 60s, same evidence…
```

The runtime state stays `alive` / `process_running` indefinitely in the lifecycle-poll trace even though the AO-spawned worker for that session is gone:

```json
{
  "previousRuntimeState": "alive",
  "newRuntimeState":      "alive",
  "previousRuntimeReason": "process_running",
  "newRuntimeReason":      "process_running",
  "primaryReason": "probe_failure",
  "evidence": "idle_beyond_threshold activity_signal=valid via_native activity=idle at=2026-06-04T10:12:49.964Z"
}
```
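When triaging, this stuck pattern can be picked out of the lifecycle-poll trace mechanically. It shows up as a poll that records `primaryReason: "probe_failure"` but leaves the state at `alive` / `process_running`. Below is a minimal TypeScript sketch that uses only the field names visible in the entry above. The interface and function names are hypothetical and are not part of AO.

```typescript
// Shape of one lifecycle-poll trace entry, limited to the fields shown in this issue.
interface LifecyclePollTrace {
  previousRuntimeState: string;
  newRuntimeState: string;
  previousRuntimeReason: string;
  newRuntimeReason: string;
  primaryReason: string;
  evidence: string;
}

// Hypothetical helper: true when a poll recorded a probe failure but left the
// runtime state unchanged at alive/process_running (the loop described here).
function isStuckAliveAfterProbeFailure(t: LifecyclePollTrace): boolean {
  return (
    t.primaryReason === "probe_failure" &&
    t.previousRuntimeState === "alive" &&
    t.newRuntimeState === "alive" &&
    t.newRuntimeReason === "process_running"
  );
}

// The trace entry quoted above.
const sample: LifecyclePollTrace = {
  previousRuntimeState: "alive",
  newRuntimeState: "alive",
  previousRuntimeReason: "process_running",
  newRuntimeReason: "process_running",
  primaryReason: "probe_failure",
  evidence:
    "idle_beyond_threshold activity_signal=valid via_native activity=idle at=2026-06-04T10:12:49.964Z",
};

console.log(isStuckAliveAfterProbeFailure(sample)); // true
```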

### What I expected

After N consecutive `indeterminate` probes (e.g. 5 minutes' worth at the 60-second poll interval), the runtime state should escalate to `terminated` and the probe should stop firing for that session. Alternatively, the session should be silently `reconciled` and removed from the poll set, the way `runtime.lost_detected` does on startup.

### Why it matters

1. **Log noise**: every session keeps generating one of these lines every minute after it terminates. Over a day, a single dead session writes ~1,440 lines to the observability `ndjson`. Across a fleet of sessions, this is the dominant category of log noise.
2. **Confused state model**: reporting `runtimeState = alive, runtimeReason = process_running` for a session whose process is gone is misleading when triaging from logs.
3. **Wasted polling**: every poll re-runs a probe that can never succeed, because the process it is checking no longer exists.
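The ~1,440 figure follows from one probe-failure line per dead session per poll: 86,400 seconds per day divided by a 60-second interval. The small TypeScript sketch below makes the scaling explicit. It assumes exactly one `agent.process_probe_failed` line per dead session per poll, as in the excerpt above. The function name is hypothetical.

```typescript
// Rough estimate of probe-failure log lines per day, assuming one line per
// dead session per poll (matching the excerpt in this issue).
function probeFailureLinesPerDay(deadSessions: number, pollIntervalSeconds = 60): number {
  const secondsPerDay = 24 * 60 * 60; // 86,400
  const pollsPerDay = secondsPerDay / pollIntervalSeconds;
  return deadSessions * pollsPerDay;
}

console.log(probeFailureLinesPerDay(1)); // 1440
console.log(probeFailureLinesPerDay(2)); // 2880, e.g. btbo-2 and btbo-4 together
```

The count grows linearly with the number of dead sessions and never decays, because nothing removes them from the poll set.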

### Workaround I'm using

After a Mac restart kills the tmux server, I run `ao start --reap-orphans --restore`. That spawns a fresh lifecycle-manager which correctly emits `runtime.lost_detected` for the stale sessions and clears them — instead of leaving them in the indeterminate loop.

### Environment

- macOS Apple Silicon, `@aoagents/ao` v0.9.4
- runtime: `tmux`, agent: `claude-code`, workspace: `worktree`
- session lifecycle managed by `lifecycle-manager` in `~/.agent-orchestrator/c3c2ee38d54f-observability/processes/`

### Suggested fix shape

In whichever file owns `agent.isProcessRunning` / the poll loop, after N consecutive `indeterminate` results for the same `sessionId`, write a `runtime.lost_detected` event for that session and skip it on subsequent polls until it's explicitly restored. With the 1-minute polling interval, 5 consecutive `indeterminate`s (about 5 minutes) is a sensible threshold.
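Here is a minimal TypeScript sketch of that shape, written independently of AO's code since I don't know which file owns the loop. The names `ProbeResult`, `IndeterminateProbeTracker`, `INDETERMINATE_ESCALATION_THRESHOLD`, and the `emit` callback are all hypothetical. The only names taken from AO are the event name `runtime.lost_detected` and the `indeterminate` result. The tracker keeps a per-session streak of `indeterminate` results. When a streak reaches the threshold, it emits one `runtime.lost_detected` event and marks the session so the loop skips it.

```typescript
// Hypothetical probe outcome; only "indeterminate" is taken from AO's logs.
type ProbeResult = "running" | "not_running" | "indeterminate";

// Hypothetical constant: 5 polls at a 60s interval is roughly 5 minutes.
const INDETERMINATE_ESCALATION_THRESHOLD = 5;

interface LostDetectedEvent {
  type: "runtime.lost_detected";
  sessionId: string;
  consecutiveIndeterminate: number;
}

class IndeterminateProbeTracker {
  private readonly streaks = new Map<string, number>();
  private readonly lost = new Set<string>();
  private readonly threshold: number;
  private readonly emit: (e: LostDetectedEvent) => void;

  constructor(threshold: number, emit: (e: LostDetectedEvent) => void) {
    this.threshold = threshold;
    this.emit = emit;
  }

  // The poll loop calls this first and skips sessions already marked lost.
  shouldProbe(sessionId: string): boolean {
    return !this.lost.has(sessionId);
  }

  record(sessionId: string, result: ProbeResult): void {
    if (result !== "indeterminate") {
      // Definitive results stay with the existing state machine; here they only reset the streak.
      this.streaks.delete(sessionId);
      return;
    }
    const streak = (this.streaks.get(sessionId) ?? 0) + 1;
    if (streak >= this.threshold) {
      // Escalate once, then stop probing this session.
      this.streaks.delete(sessionId);
      this.lost.add(sessionId);
      this.emit({ type: "runtime.lost_detected", sessionId, consecutiveIndeterminate: streak });
    } else {
      this.streaks.set(sessionId, streak);
    }
  }

  // Explicit restore (e.g. after `ao start --restore`) puts the session back in the poll set.
  restore(sessionId: string): void {
    this.lost.delete(sessionId);
    this.streaks.delete(sessionId);
  }
}

// Usage: six polls of a dead session produce one event, then the session is skipped.
const events: LostDetectedEvent[] = [];
const tracker = new IndeterminateProbeTracker(INDETERMINATE_ESCALATION_THRESHOLD, (e) => events.push(e));
for (let i = 0; i < 6; i++) {
  if (tracker.shouldProbe("btbo-4")) tracker.record("btbo-4", "indeterminate");
}
console.log(events.length, tracker.shouldProbe("btbo-4")); // 1 false
```

Resetting the streak on any definitive result keeps a briefly flaky probe from reaping a live session. The explicit `restore` step matches the "until it's explicitly restored" condition above.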

Happy to send a patch if a maintainer can point me at the file that owns the probe loop and the convention for the threshold constant.