fix(execd): avoid global signal to fix false command failures #1042

Draft
LavenderQAQ wants to merge 1 commit into opensandbox-group:main from LavenderQAQ:fix/wait-child


@LavenderQAQ LavenderQAQ commented Jun 12, 2026

Contributor

Summary

Fixes #1041.

execd could report a successfully executed command as a failure (CommandExecError) when cmd.Wait() returned a spurious ECHILD ("waitid: no child processes").
The root cause was global signal handling in runCommand / runBackgroundCommand: both called signal.Notify(signals) with no signal list (which captures ALL signals, including SIGCHLD and SIGURG) and deferred signal.Reset() (a process-global reset). This interfered with the Go runtime's use of SIGCHLD/SIGURG (child reaping and async preemption) and raced across concurrent and sequential commands, occasionally leaving Wait() unable to reap its own child.

This change:

  1. Replaces signal.Notify(signals) + signal.Reset() with signal.Notify(signals, forwardSignals...) + signal.Stop(signals), so only an explicit set of signals is forwarded and cleanup is scoped to this channel instead of resetting global handlers.
  2. Ignores spurious ECHILD from cmd.Wait() (child already reaped) so a command that ran to completion is reported as success instead of a false failure.

Testing

  • Unit tests
  • e2e / manual verification

Breaking Changes

  • None

Checklist

  • Linked Issue or clearly described motivation
  • Added/updated docs (if needed)
  • Added/updated tests (if needed)
  • Security impact considered
  • Backward compatibility considered

@github-actions

Contributor

⚠️ This PR has no labels. Please add one based on the changes.

Changed directories: components.

📋 Recommended labels (based on changed files):

  • component/execd ⬅️

Other available labels:

  • bug - Something isn't working
  • dependencies - Pull requests that update a dependency file
  • documentation - Improvements or additions to documentation
  • feature - New feature or request
  • packages - Changes for package, image and configuration

💡 Tip: Use feature for new functionality or improvements, bug for fixes.

cc @LavenderQAQ

@LavenderQAQ LavenderQAQ marked this pull request as draft June 12, 2026 14:31

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 970cfde872

Comment thread components/execd/pkg/runtime/command.go Outdated
Comment on lines +230 to +233
if err != nil && isNoChildProcessError(err) {
log.Warn("command %s: ignoring spurious ECHILD from Wait (child already reaped)", session)
err = nil
}

P1 Badge Don't convert unknown child status to success

When cmd.Wait() returns ECHILD, the child has already been reaped and its exit status is unavailable; treating that as nil makes failed commands report success. In the exact class of environments this is guarding against (for example SIGCHLD ignored/inherited or another reaper consuming the child), sh -c 'exit 42' also produces ECHILD, so this path calls markCommandFinished(session, 0, "") and OnExecuteComplete instead of surfacing the failure, corrupting command results.


@LavenderQAQ

Contributor Author

As mentioned at the end of #1041, I need some time to observe whether this patch really resolves the issue. Once the problem stops occurring, I will mark this PR as ready for review.

Signed-off-by: LavenderQAQ <lavenderqaq.cs@gmail.com>
@LavenderQAQ LavenderQAQ changed the title from "fix(execd): avoid global signal reset and ignore spurious ECHILD" to "fix(execd): avoid global signal to fix false command failures" Jun 12, 2026

Development

Successfully merging this pull request may close these issues.

execd: global signal capture/reset causes spurious ECHILD and false command failures

1 participant