ACP subprocess silently exits (code 0) when using shell wrapper with exec — cat pipe workaround found

## Problem

When using a shell wrapper script for the ACP agent (e.g. `claude-acp.sh`), the ACP subprocess (`claude-agent-acp`) silently exits with code 0 after a few minutes of idle time or during active tool_call execution. This triggers a `[acp] read loop ended` → `All monitors stopped` cascade that kills the entire weclaw process.

The wrapper script is minimal:

```bash
#!/bin/bash
unset CLAUDECODE
export CLAUDE_CODE_EXECUTABLE=/Users/me/.local/bin/claude
exec claude-agent-acp "$@"
```

## Symptoms

- ACP initialize handshake always succeeds
- First dispatch sometimes works, but subsequent dispatches fail
- After 5-15 minutes idle, dispatch immediately triggers `read loop ended` + `context canceled`
- ACP process exits with **code 0** (not killed by signal)
- weclaw then logs `All monitors stopped` and shuts down entirely

Typical log pattern:
```
13:06:58 [acp] initialized (pid=7353) ...
13:06:58 [handler] default agent ready: claude
13:15:58 [handler] dispatching to agent (pid=7353) ...
13:15:58 [acp] read loop ended
13:15:58 [handler] agent error: session error: context canceled
13:15:58 All monitors stopped
```

## Investigation

I tested 15 ACP subprocess instances systematically and found:

| Wrapper | Instances | Successful replies | Deaths |
|---------|-----------|-------------------|--------|
| `exec claude-agent-acp` | 12 | 1 (then died mid-session) | 11 |
| `cat \| exec claude-agent-acp` | 3 | 3 | 0 |

### What I ruled out

1. **Not a signal kill** — I wrapped the ACP process in a Node.js signal trap that monitors SIGTERM/SIGINT/SIGHUP/SIGQUIT. No signals were received before exit.

2. **Not OOM** — No jetsam/memory pressure records in system log. ACP RSS was ~6MB.

3. **Not an `exec` syscall issue** — A standalone Go test program using `exec.Command()` + `StdinPipe()` to spawn the same wrapper script works perfectly. The ACP stays alive for 15+ seconds and responds to dispatch messages.

4. **Not a claude-agent-acp bug** — Testing ACP directly via stdin pipe (`echo '{"jsonrpc":"2.0",...}' | claude-agent-acp`) works fine. The process handles initialize + prompt correctly.

### What I found

The key observation: adding `cat |` before `exec` in the wrapper script completely fixes the issue:

```bash
# Fails — ACP exits silently after minutes:
exec claude-agent-acp "$@"

# Works — ACP stays alive indefinitely:
cat | exec claude-agent-acp "$@"
```

The difference is the pipe topology:

```
Without cat:  weclaw Go pipe ──→ node (claude-agent-acp)
                                   Go closes pipe → node gets immediate EOF → exit(0)

With cat:     weclaw Go pipe ──→ cat ──→ new pipe ──→ node (claude-agent-acp)
                                  │                        │
                     Go closes pipe → cat exits → then node gets EOF
                                        (two-stage isolation)
```
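The "Go closes pipe → EOF → exit(0)" path in the diagram can be reproduced in isolation. The following is a minimal sketch, not weclaw code: it uses `cat` as a stand-in for `claude-agent-acp` (assumption: both exit cleanly when stdin reaches EOF), and the helper name `exitCodeAfterStdinClose` is hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// exitCodeAfterStdinClose starts a child that reads stdin until EOF,
// closes the write end of its stdin pipe, and returns the child's exit code.
func exitCodeAfterStdinClose(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return 0, err
	}
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	// No signal is sent here. The child only observes EOF on its next read.
	if err := stdin.Close(); err != nil {
		return 0, err
	}
	// Wait returns nil for exit status 0 and *exec.ExitError otherwise.
	if err := cmd.Wait(); err != nil {
		if _, ok := err.(*exec.ExitError); !ok {
			return 0, err
		}
	}
	return cmd.ProcessState.ExitCode(), nil
}

func main() {
	// `cat` stands in for the ACP process (assumption, see lead-in).
	code, err := exitCodeAfterStdinClose("cat")
	fmt.Println("exit code:", code, "err:", err)
}
```

If the real ACP process behaves like the stand-in, a closed write end would look exactly like the reported symptom: no signal received and a clean exit status.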

### Hypothesis

Looking at `agent/acp_agent.go`, the subprocess is created with:

```go
a.cmd = exec.CommandContext(ctx, a.command, a.args...)
```

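When the Go context is cancelled (due to monitor reconnection, idle timeout, or any other internal lifecycle event), the stdin pipe write-end may be closed or the process may receive a kill signal. Note that `exec.CommandContext` kills the process via `os.Process.Kill` (SIGKILL) by default, which cannot be trapped and does not produce exit code 0, so a context kill alone would not explain the clean exit observed here. With direct `exec`, the ACP process replaces the wrapper shell and reads from the Go-managed pipe fd itself, so any pipe state change from Go affects it immediately. With `cat |`, bash creates a second pipe and runs `exec` inside a pipeline subshell, so the ACP process is no longer the direct child that Go tracks and no longer reads Go's pipe directly; this appears to isolate it from Go's pipe lifecycle management.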
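To tell the two candidate mechanisms apart, it may help to see what a context cancellation looks like from the parent's side. This is a small sketch under stated assumptions: `sleep 30` stands in for a long-running ACP process, the helper name `exitCodeAfterCancel` is hypothetical, and it relies on the default `Cmd.Cancel` behaviour of `exec.CommandContext`.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// exitCodeAfterCancel starts a long-running child under exec.CommandContext,
// cancels the context, and reports how the child terminated.
func exitCodeAfterCancel() (code int, state string, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// `sleep 30` is a stand-in for the ACP subprocess (assumption).
	cmd := exec.CommandContext(ctx, "sleep", "30")
	if err := cmd.Start(); err != nil {
		return 0, "", err
	}

	// With the default Cmd.Cancel, cancelling the context kills the child.
	cancel()

	// An error is expected here because the child did not exit cleanly.
	_ = cmd.Wait()
	return cmd.ProcessState.ExitCode(), cmd.ProcessState.String(), nil
}

func main() {
	code, state, err := exitCodeAfterCancel()
	fmt.Println(code, state, err)
}
```

On Unix-like systems a child killed this way reports an exit code of -1 and a "signal: killed" state rather than a clean status, which is one way to distinguish a context kill from a stdin EOF in the logs.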

The one successful reply with `exec` (pid=84993) is also telling — it handled the first message (39s), started processing the second message with multiple tool_calls, then died at the 3-minute mark during active execution. This suggests the issue is time-dependent rather than message-dependent.

## Workaround

For anyone hitting this: add `cat |` before `exec` in your ACP wrapper script. This has been stable in production for me, with zero failures since applying the workaround.

## Suggestion

This may be related to the work in #40 (ACP subprocess health check). Even with health check + respawn, the suspected root cause (stdin pipe coupling) would still cause unnecessary ACP restarts. A potential fix could be:

- Use `exec.Command()` instead of `exec.CommandContext()` and manage subprocess lifecycle manually
- Or ensure the stdin `io.WriteCloser` is held alive for the full subprocess lifetime regardless of context state
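The two options above could be combined roughly as follows. This is only a sketch of the idea, not weclaw's implementation: `agentProc`, `startAgent`, `alive`, `stop`, and `defaultGrace` are hypothetical names, and `cat` again stands in for the ACP wrapper. The subprocess is started with `exec.Command`, its stdin writer is owned by a long-lived struct, and shutdown is an explicit step (close stdin, wait, then kill as a last resort) rather than a side effect of context cancellation.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
	"time"
)

// defaultGrace is a hypothetical grace period for a clean shutdown.
const defaultGrace = 2 * time.Second

// agentProc is a hypothetical holder for a long-lived subprocess.
type agentProc struct {
	cmd    *exec.Cmd
	stdin  io.WriteCloser // held for the full subprocess lifetime
	exited chan struct{}  // closed once cmd.Wait has returned
}

// startAgent spawns the command without binding it to any context, so a
// cancelled per-request context can neither kill it nor close its stdin.
func startAgent(name string, args ...string) (*agentProc, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	p := &agentProc{cmd: cmd, stdin: stdin, exited: make(chan struct{})}
	go func() {
		_ = cmd.Wait() // reaps the child and records ProcessState
		close(p.exited)
	}()
	return p, nil
}

// alive reports whether the subprocess is still running.
func (p *agentProc) alive() bool {
	select {
	case <-p.exited:
		return false
	default:
		return true
	}
}

// stop closes stdin so the child sees EOF, waits up to grace for a clean
// exit, kills the child if it is still running, and returns its exit code.
func (p *agentProc) stop(grace time.Duration) int {
	_ = p.stdin.Close()
	select {
	case <-p.exited:
	case <-time.After(grace):
		_ = p.cmd.Process.Kill()
		<-p.exited
	}
	return p.cmd.ProcessState.ExitCode()
}

func main() {
	p, err := startAgent("cat") // stand-in for the ACP wrapper (assumption)
	if err != nil {
		panic(err)
	}
	fmt.Println("alive:", p.alive())
	fmt.Println("exit code:", p.stop(defaultGrace))
}
```

In this shape, per-dispatch contexts would only bound individual requests and responses, while the process and its stdin pipe live until `stop` is called explicitly.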

## Environment

- weclaw v0.7.1 (darwin/arm64)
- claude-agent-acp 0.22.2 (@zed-industries/claude-agent-acp)
- macOS Sequoia 26.3 / Apple Silicon
- Go 1.25 (weclaw binary)
