fix(execd): avoid global signal to fix false command failures#1042
fix(execd): avoid global signal to fix false command failures#1042LavenderQAQ wants to merge 1 commit into
Conversation
|
Changed directories: components. 📋 Recommended labels (based on changed files):
Other available labels:
💡 Tip: Use cc @LavenderQAQ |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 970cfde872
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if err != nil && isNoChildProcessError(err) { | ||
| log.Warn("command %s: ignoring spurious ECHILD from Wait (child already reaped)", session) | ||
| err = nil | ||
| } |
There was a problem hiding this comment.
Don't convert unknown child status to success
When cmd.Wait() returns ECHILD, the child has already been reaped and its exit status is unavailable; treating that as nil makes failed commands report success. In the exact class of environments this is guarding against (for example SIGCHLD ignored/inherited or another reaper consuming the child), sh -c 'exit 42' also produces ECHILD, so this path calls markCommandFinished(session, 0, "") and OnExecuteComplete instead of surfacing the failure, corrupting command results.
Useful? React with 👍 / 👎.
|
As mentioned at the end of #1041, I need some time to observe whether this patch really resolves the issue. Once this problem no longer occurs, I will convert this PR to "ready". |
Signed-off-by: LavenderQAQ <lavenderqaq.cs@gmail.com>
970cfde to
297f2dd
Compare
Summary
Fixes #1041.
execdcould report a successfully-executed command as a failure (CommandExecError) whencmd.Wait()returned a spuriousECHILD("waitid: no child processes").The root cause was global signal handling in
runCommand/runBackgroundCommand: they calledsignal.Notify(signals)with no signal list (capturing ALL signals, including SIGCHLD and SIGURG) anddefer signal.Reset()(a process-global reset). This interfered with the Go runtime's use of SIGCHLD/SIGURG (child reaping and async preemption) and raced across concurrent/sequential commands, occasionally leavingWait()unable to reap its own child.This change:
signal.Notify(signals)+signal.Reset()withsignal.Notify(signals, forwardSignals...)+signal.Stop(signals), so only an explicit set of signals is forwarded and cleanup is scoped to this channel instead of resetting global handlers.ECHILDfromcmd.Wait()(child already reaped) so a command that ran to completion is reported as success instead of a false failure.Testing
Breaking Changes
Checklist