Problem
When using a shell wrapper script for the ACP agent (e.g. claude-acp.sh), the ACP subprocess (claude-agent-acp) silently exits with code 0 after a few minutes of idle time or during active tool_call execution. This triggers a [acp] read loop ended → All monitors stopped cascade that kills the entire weclaw process.
The wrapper script is minimal:
#!/bin/bash
unset CLAUDECODE
export CLAUDE_CODE_EXECUTABLE=/Users/me/.local/bin/claude
exec claude-agent-acp "$@"
Symptoms
- ACP initialize handshake always succeeds
- First dispatch sometimes works, but subsequent dispatches fail
- After 5-15 minutes idle, dispatch immediately triggers read loop ended + context canceled
- ACP process exits with code 0 (not killed by signal)
- weclaw then logs All monitors stopped and shuts down entirely
Typical log pattern:
13:06:58 [acp] initialized (pid=7353) ...
13:06:58 [handler] default agent ready: claude
13:15:58 [handler] dispatching to agent (pid=7353) ...
13:15:58 [acp] read loop ended
13:15:58 [handler] agent error: session error: context canceled
13:15:58 All monitors stopped
Investigation
I tested 15 ACP subprocess instances systematically and found:
| Wrapper | Instances | Successful replies | Deaths |
|---|---|---|---|
| exec claude-agent-acp | 12 | 1 (then died mid-session) | 11 |
| cat \| exec claude-agent-acp | 3 | 3 | 0 |
What I ruled out
- Not a signal kill: I wrapped the ACP process in a Node.js signal trap that monitors SIGTERM/SIGINT/SIGHUP/SIGQUIT. No signals were received before exit.
- Not OOM: No jetsam/memory pressure records in system log. ACP RSS was ~6MB.
- Not an exec syscall issue: A standalone Go test program using exec.Command() + StdinPipe() to spawn the same wrapper script works perfectly. The ACP stays alive for 15+ seconds and responds to dispatch messages.
- Not a claude-agent-acp bug: Testing ACP directly via stdin pipe (echo '{"jsonrpc":"2.0",...}' | claude-agent-acp) works fine. The process handles initialize + prompt correctly.
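For reference, a standalone check of the kind described in the third point could look roughly like this. It is a minimal sketch, not the program I actually ran: the probe helper is a made-up name, cat stands in for the wrapper script, and the JSON line is a placeholder rather than a real ACP request.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"time"
)

// probe starts the child with Go-managed stdin/stdout pipes, leaves stdin
// idle for the given duration, then sends one line and returns the first
// line the child writes back.
func probe(idle time.Duration, line, name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return "", err
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}
	defer func() {
		stdin.Close() // EOF lets a stdin-driven child exit on its own
		cmd.Wait()    // reap the child; its exit status is ignored here
	}()

	time.Sleep(idle) // the pipe stays open and silent during this period

	if _, err := fmt.Fprintln(stdin, line); err != nil {
		return "", fmt.Errorf("write to child failed: %w", err)
	}
	reply, err := bufio.NewReader(stdout).ReadString('\n')
	if err != nil {
		return "", fmt.Errorf("read from child failed: %w", err)
	}
	return reply, nil
}

func main() {
	// cat is a stand-in that echoes its input; swap in the wrapper script
	// and a real ACP request to reproduce the test described above.
	reply, err := probe(2*time.Second, `{"placeholder":true}`, "cat")
	fmt.Printf("reply=%q err=%v\n", reply, err)
}
```

Raising the idle duration to several minutes would bring such a check closer to the failure window seen in weclaw. Note that the sketch blocks in cmd.Wait if the child ignores EOF on stdin.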
What I found
The key observation: adding cat | before exec in the wrapper script completely fixes the issue:
# Fails — ACP exits silently after minutes:
exec claude-agent-acp "$@"
# Works — ACP stays alive indefinitely:
cat | exec claude-agent-acp "$@"
The difference is the pipe topology:
Without cat:  weclaw Go pipe ──→ node (claude-agent-acp)
              Go closes pipe → node gets immediate EOF → exit(0)
With cat:     weclaw Go pipe ──→ cat ──→ new pipe ──→ node (claude-agent-acp)
              Go closes pipe → cat exits → then node gets EOF
              (two-stage isolation)
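The "Go closes pipe → EOF → exit(0)" step in the first topology can be reproduced in isolation. The following is a minimal sketch under stated assumptions: exitCodeAfterStdinClose is a hypothetical helper, and cat stands in for any child that exits cleanly when its stdin reaches EOF (which is how claude-agent-acp appears to behave, though I have not confirmed that from its source).

```go
package main

import (
	"fmt"
	"os/exec"
)

// exitCodeAfterStdinClose starts the child with a Go-managed stdin pipe,
// closes the write end without sending anything, and reports how the child
// exited.
func exitCodeAfterStdinClose(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return -1, err
	}
	if err := cmd.Start(); err != nil {
		return -1, err
	}
	// Closing the write end is enough: the child's next read returns EOF.
	if err := stdin.Close(); err != nil {
		return -1, err
	}
	// Wait returns nil only when the child exits with status 0.
	err = cmd.Wait()
	return cmd.ProcessState.ExitCode(), err
}

func main() {
	// No signal is involved here, so a signal trap inside the child would
	// have nothing to report.
	code, err := exitCodeAfterStdinClose("cat")
	fmt.Println("exit code:", code, "err:", err)
}
```

With cat as the child, this exits with code 0 and no error, which matches the "exit code 0, no signal received" symptom.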
Hypothesis
Looking at agent/acp_agent.go, the subprocess is created with:
a.cmd = exec.CommandContext(ctx, a.command, a.args...)
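When the Go context is cancelled (due to monitor reconnection, idle timeout, or any internal lifecycle event), one of two things may happen: the write end of the stdin pipe is closed, or the process is killed. By default exec.CommandContext kills the process with os.Process.Kill once the context is done, which would show up as a signal death rather than exit code 0, so a closed stdin pipe fits the observed symptoms better. With direct exec, the ACP process inherits the Go-managed pipe fd, so any pipe state change from Go immediately affects it. The cat | interposes a bash-created pipe that isolates the ACP process from Go's pipe lifecycle management.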
The one successful reply with exec (pid=84993) is also telling — it handled the first message (39s), started processing the second message with multiple tool_calls, then died at the 3-minute mark during active execution. This suggests the issue is time-dependent rather than message-dependent.
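To tell the two branches of the hypothesis apart, it helps to see what a context-triggered kill looks like from the parent side. This is a minimal, Unix-only sketch: killedByCancel is a hypothetical helper and sleep 30 stands in for a long-running agent. It relies on the default exec.CommandContext behaviour of calling os.Process.Kill when the context is done.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// killedByCancel starts a long-running child under exec.CommandContext,
// cancels the context, and reports whether the child died from a signal
// and what exit code Go records for it.
func killedByCancel() (signaled bool, exitCode int, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	cmd := exec.CommandContext(ctx, "sleep", "30")
	if err := cmd.Start(); err != nil {
		return false, -1, err
	}
	cancel() // default behaviour: os/exec kills the child (SIGKILL on Unix)

	_ = cmd.Wait() // the error is expected; the details are in ProcessState
	status, ok := cmd.ProcessState.Sys().(syscall.WaitStatus)
	if !ok {
		return false, -1, errors.New("no syscall.WaitStatus on this platform")
	}
	// ExitCode is -1 when the process was terminated by a signal.
	return status.Signaled(), cmd.ProcessState.ExitCode(), nil
}

func main() {
	signaled, code, err := killedByCancel()
	fmt.Println("signaled:", signaled, "exit code:", code, "err:", err)
}
```

A child killed this way is reported as signaled with exit code -1, not 0. SIGKILL cannot be trapped, so the Node.js signal trap would not see it either, but the exit status would still differ from what the logs show. One more detail that may matter: in cat | exec claude-agent-acp, bash runs each side of the pipeline in a subshell, so exec only replaces that subshell and the PID that Go tracks remains bash rather than node.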
Workaround
For anyone hitting this: add cat | before exec in your ACP wrapper script. This has been stable in production for me with zero failures since applying the fix.
Suggestion
This may be related to the work in #40 (ACP subprocess health check). Even with health check + respawn, the root cause (stdin pipe coupling) would still cause unnecessary ACP restarts. A potential fix could be:
- Use exec.Command() instead of exec.CommandContext() and manage subprocess lifecycle manually
- Or ensure the stdin io.WriteCloser is held alive for the full subprocess lifetime regardless of context state
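A rough sketch of the first option, to make the suggestion concrete. Everything here is hypothetical and not taken from agent/acp_agent.go: the Agent type, StartAgent, Done, Stop and the defaultGrace value are illustrative names, and cat again stands in for the wrapper script.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os/exec"
	"time"
)

// defaultGrace is an arbitrary illustrative timeout, not a weclaw setting.
const defaultGrace = 2 * time.Second

// Agent is a hypothetical wrapper around a subprocess whose lifetime is
// owned by the caller instead of by a context.
type Agent struct {
	cmd     *exec.Cmd
	stdin   io.WriteCloser
	done    chan struct{} // closed once cmd.Wait has returned
	waitErr error         // result of cmd.Wait; read only after done is closed
}

// StartAgent launches the child with plain exec.Command, so cancelling a
// request context elsewhere cannot kill it or close its stdin.
func StartAgent(name string, args ...string) (*Agent, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	a := &Agent{cmd: cmd, stdin: stdin, done: make(chan struct{})}
	go func() {
		a.waitErr = cmd.Wait()
		close(a.done)
	}()
	return a, nil
}

// Done lets a supervisor notice an unexpected exit and respawn the agent.
func (a *Agent) Done() <-chan struct{} { return a.done }

// Stop closes stdin so the child sees EOF, waits up to grace for a clean
// exit, and kills the child only if it is still running after that.
func (a *Agent) Stop(grace time.Duration) error {
	a.stdin.Close()
	select {
	case <-a.done:
		return a.waitErr
	case <-time.After(grace):
		a.cmd.Process.Kill()
		<-a.done
		return errors.New("agent ignored stdin EOF and was killed")
	}
}

func main() {
	a, err := StartAgent("cat") // cat stands in for the ACP wrapper script
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("stop result:", a.Stop(defaultGrace))
}
```

Per-dispatch contexts would then only bound the request itself, while the subprocess is stopped explicitly through Stop. If keeping exec.CommandContext is preferred, Go 1.20+ also offers the Cmd.Cancel and Cmd.WaitDelay fields, which should allow replacing the default kill with a gentler shutdown; I have not tried that in weclaw.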
Environment
- weclaw v0.7.1 (darwin/arm64)
- claude-agent-acp 0.22.2 (@zed-industries/claude-agent-acp)
- macOS Sequoia 26.3 / Apple Silicon
- Go 1.25 (weclaw binary)