ACP subprocess silently exits (code 0) when using shell wrapper with exec — cat pipe workaround found #44

@Gemini-Nick

Problem

When a shell wrapper script (e.g. claude-acp.sh) is used to launch the ACP agent, the ACP subprocess (claude-agent-acp) silently exits with code 0, either after a few minutes of idle time or during active tool_call execution. This triggers a cascade ([acp] read loop ended, followed by All monitors stopped) that kills the entire weclaw process.

The wrapper script is minimal:

#!/bin/bash
unset CLAUDECODE
export CLAUDE_CODE_EXECUTABLE=/Users/me/.local/bin/claude
exec claude-agent-acp "$@"

Symptoms

  • ACP initialize handshake always succeeds
  • First dispatch sometimes works, but subsequent dispatches fail
  • After 5-15 minutes idle, dispatch immediately triggers read loop ended + context canceled
  • ACP process exits with code 0 (not killed by signal)
  • weclaw then logs All monitors stopped and shuts down entirely

Typical log pattern:

13:06:58 [acp] initialized (pid=7353) ...
13:06:58 [handler] default agent ready: claude
13:15:58 [handler] dispatching to agent (pid=7353) ...
13:15:58 [acp] read loop ended
13:15:58 [handler] agent error: session error: context canceled
13:15:58 All monitors stopped

Investigation

I systematically tested 15 ACP subprocess instances and found:

Wrapper                       Instances   Successful replies          Deaths
exec claude-agent-acp         12          1 (then died mid-session)   11
cat | exec claude-agent-acp   3           3                           0

What I ruled out

  1. Not a signal kill — I wrapped the ACP process in a Node.js signal trap that monitors SIGTERM/SIGINT/SIGHUP/SIGQUIT. No signals were received before exit.

  2. Not OOM — No jetsam/memory pressure records in system log. ACP RSS was ~6MB.

  3. Not an exec syscall issue — A standalone Go test program using exec.Command() + StdinPipe() to spawn the same wrapper script works perfectly. The ACP stays alive for 15+ seconds and responds to dispatch messages.

  4. Not a claude-agent-acp bug — Testing ACP directly via stdin pipe (echo '{"jsonrpc":"2.0",...}' | claude-agent-acp) works fine. The process handles initialize + prompt correctly.
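
For reference, a standalone harness along the lines of item 3 might look like the sketch below. It is not the original test program: the helper name roundTrip is made up for illustration, and cat stands in for the wrapper script so the example runs anywhere with a POSIX userland. It spawns the child with exec.Command() + StdinPipe(), sends one line, reads one line back, then closes stdin and reports the exit code.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"os/exec"
)

// roundTrip (hypothetical helper) spawns command with piped stdin/stdout,
// writes one line, reads one line back, then closes stdin and returns the
// reply together with the child's exit code.
func roundTrip(line, command string, args ...string) (string, int, error) {
	cmd := exec.Command(command, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return "", -1, err
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return "", -1, err
	}
	if err := cmd.Start(); err != nil {
		return "", -1, err
	}
	// abort kills and reaps the child so a failed run leaves nothing behind.
	abort := func(err error) (string, int, error) {
		_ = cmd.Process.Kill()
		_ = cmd.Wait()
		return "", -1, err
	}

	if _, err := io.WriteString(stdin, line+"\n"); err != nil {
		return abort(err)
	}
	reply, err := bufio.NewReader(stdout).ReadString('\n')
	if err != nil {
		return abort(err)
	}

	// Closing the write end is what delivers EOF to the child's stdin.
	_ = stdin.Close()
	err = cmd.Wait()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		return reply, -1, err
	}
	return reply, cmd.ProcessState.ExitCode(), nil
}

func main() {
	// "cat" is a stand-in; replace it with the wrapper script path to test ACP.
	reply, code, err := roundTrip(`{"ping":1}`, "cat")
	fmt.Printf("reply=%q exit=%d err=%v\n", reply, code, err)
}
```

To approximate the failing scenario more closely, the pipe would need to stay open for several minutes between writes, since the deaths reported here only appear after longer idle periods.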

What I found

The key observation: adding cat | before exec in the wrapper script completely fixes the issue:

# Fails — ACP exits silently after minutes:
exec claude-agent-acp "$@"

# Works — ACP stays alive indefinitely:
cat | exec claude-agent-acp "$@"

The difference is the pipe topology:

Without cat:  weclaw Go pipe ──→ node (claude-agent-acp)
                                   Go closes pipe → node gets immediate EOF → exit(0)

With cat:     weclaw Go pipe ──→ cat ──→ new pipe ──→ node (claude-agent-acp)
                                  │                        │
                     Go closes pipe → cat exits → then node gets EOF
                                        (two-stage isolation)

Hypothesis

Looking at agent/acp_agent.go, the subprocess is created with:

a.cmd = exec.CommandContext(ctx, a.command, a.args...)

When the Go context is cancelled (due to monitor reconnection, idle timeout, or any internal lifecycle event), the stdin pipe write-end may be closed or the process may receive a kill signal. With direct exec, the ACP process inherits the Go-managed pipe fd, so any pipe state change from Go immediately affects it. The cat | interposes a bash-created pipe that isolates the ACP process from Go's pipe lifecycle management.

The one successful reply with exec (pid=84993) is also telling — it handled the first message (39s), started processing the second message with multiple tool_calls, then died at the 3-minute mark during active execution. This suggests the issue is time-dependent rather than message-dependent.
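
The two mechanisms named in the hypothesis (kill on context cancellation vs. stdin being closed) should be distinguishable by exit status. Per the os/exec documentation, a cancelled CommandContext calls os.Process.Kill unless cmd.Cancel is set, and on Unix a killed child reports a signal death with ExitCode() == -1. A child that merely sees EOF on stdin can exit normally. The sketch below illustrates the difference, assuming a Unix system and using sleep and cat as stand-ins for the ACP process; the exitInfo type and function names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
)

// exitInfo (hypothetical) summarises how a child process ended.
type exitInfo struct {
	Code     int  // exit code, or -1 if a signal terminated the process
	Signaled bool // true if a signal terminated the process
}

// describe reads the exit status of a command that has already been waited on.
func describe(cmd *exec.Cmd) exitInfo {
	ps := cmd.ProcessState
	info := exitInfo{Code: ps.ExitCode()}
	if ws, ok := ps.Sys().(syscall.WaitStatus); ok {
		info.Signaled = ws.Signaled()
	}
	return info
}

// endByCancel starts a long-running child under CommandContext and then
// cancels the context, which makes os/exec kill the process.
func endByCancel() (exitInfo, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	cmd := exec.CommandContext(ctx, "sleep", "30")
	if err := cmd.Start(); err != nil {
		return exitInfo{}, err
	}
	cancel()
	_ = cmd.Wait() // an error is expected here; the status is read below
	return describe(cmd), nil
}

// endByStdinEOF starts a child that reads stdin and closes the write end,
// so the child sees EOF and exits on its own.
func endByStdinEOF() (exitInfo, error) {
	cmd := exec.Command("cat")
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return exitInfo{}, err
	}
	if err := cmd.Start(); err != nil {
		return exitInfo{}, err
	}
	_ = stdin.Close()
	_ = cmd.Wait()
	return describe(cmd), nil
}

func main() {
	killed, err := endByCancel()
	fmt.Printf("context cancel: %+v err=%v\n", killed, err)
	eof, err := endByStdinEOF()
	fmt.Printf("stdin closed:   %+v err=%v\n", eof, err)
}
```

Since the ACP process was observed exiting with code 0 and no signal, the closed-stdin path looks like the closer match of the two, although this sketch alone does not prove that is what happens inside weclaw.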

Workaround

For anyone hitting this: add cat | before exec in your ACP wrapper script. This has been stable in production for me with zero failures since applying the fix.

Suggestion

This may be related to the work in #40 (ACP subprocess health check). Even with health check + respawn, the root cause (stdin pipe coupling) would still cause unnecessary ACP restarts. A potential fix could be:

  • Use exec.Command() instead of exec.CommandContext() and manage subprocess lifecycle manually
  • Or ensure the stdin io.WriteCloser is held alive for the full subprocess lifetime regardless of context state
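
A rough sketch of the first option, combined with the second, is shown below. It is not weclaw's code: agentProc, startAgent, Stop and defaultGrace are hypothetical names, and cat again stands in for the ACP command. The idea is that the subprocess is started with exec.Command(), the stdin io.WriteCloser is stored alongside it, and the only things that can end the process are an explicit Stop call or the child exiting by itself.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
	"time"
)

// defaultGrace is an arbitrary illustrative timeout, not a weclaw setting.
const defaultGrace = time.Second

// agentProc (hypothetical) owns the subprocess and its stdin for the
// subprocess's whole lifetime, independent of any request context.
type agentProc struct {
	cmd   *exec.Cmd
	stdin io.WriteCloser
	done  chan error // receives the result of cmd.Wait exactly once
}

// startAgent launches the command without a context, so nothing in os/exec
// will kill it implicitly.
func startAgent(command string, args ...string) (*agentProc, error) {
	cmd := exec.Command(command, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	p := &agentProc{cmd: cmd, stdin: stdin, done: make(chan error, 1)}
	go func() { p.done <- cmd.Wait() }()
	return p, nil
}

// Stop closes stdin so the child sees EOF, waits up to grace for it to exit,
// and kills it only if it is still running. It must be called at most once.
func (p *agentProc) Stop(grace time.Duration) error {
	_ = p.stdin.Close()
	select {
	case err := <-p.done:
		return err
	case <-time.After(grace):
		_ = p.cmd.Process.Kill()
		return <-p.done
	}
}

func main() {
	p, err := startAgent("cat") // stand-in for the ACP wrapper
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("stop result:", p.Stop(defaultGrace))
}
```

If keeping exec.CommandContext() is preferred, Go 1.20+ also allows setting cmd.Cancel and cmd.WaitDelay to replace the default kill with a gentler shutdown; whether that fits weclaw's design is an open question.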

Environment

  • weclaw v0.7.1 (darwin/arm64)
  • claude-agent-acp 0.22.2 (@zed-industries/claude-agent-acp)
  • macOS Sequoia 26.3 / Apple Silicon
  • Go 1.25 (weclaw binary)
