Test old Linux kernels using Qemu by cmcgee1024 · Pull Request #280 · swiftlang/swift-subprocess

cmcgee1024 · 2026-05-29T19:04:39Z

Fixes the following:

Increases test confidence with older Linux kernels (4.18 and 5.10)
Closes file handles that were leaking on Linux
Improves soft limit detection so that testConcurrentRun() can avoid exhausting the system open-file limit
Handles the epoll_ctl(DEL) case when a process is being shut down on older Linux kernels
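
As a rough illustration of the soft-limit item above, a test could derive its concurrency cap from the soft open-file limit along these lines. This is a sketch under stated assumptions, not the code from this PR: the `concurrencyCap` name, the per-process descriptor count, the reserve, and the upper bound are all hypothetical, and the step that reads the limit (getrlimit with RLIMIT_NOFILE) is omitted.

```swift
/// Hypothetical helper: derive how many subprocesses a test may run at once
/// from the soft open-file limit, so the test does not exhaust descriptors.
/// `descriptorsPerProcess`, `reserved`, and `upperBound` are illustrative
/// assumptions, not values taken from the PR.
func concurrencyCap(
    softLimit: Int,
    descriptorsPerProcess: Int = 8,
    reserved: Int = 64,
    upperBound: Int = 256
) -> Int {
    // Keep some descriptors back for the test runner itself.
    let usable = max(softLimit - reserved, 0)
    // Always allow at least one process, even under a very low limit.
    let byLimit = max(usable / descriptorsPerProcess, 1)
    // Never exceed the fixed upper bound, however generous the limit is.
    return min(byLimit, upperBound)
}
```

With these illustrative defaults, a soft limit of 1024 gives a cap of 120, and a very low limit still allows one process at a time.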

jakepetroules · 2026-06-01T22:09:56Z

        // If we can not retrieve pidfd, the system does not support waitid(P_PIDFD)
        return false
    }
+    defer { try? FileDescriptor(rawValue: selfPidfd).close() }


Can you use closeAfter instead?

I don't think that it would work because we would lose the errno comparison if the close of the file descriptor fails, which we want to be just a best effort here.

Thread.swift — WorkQueue.dequeue() now uses while queue.isEmpty && !isShuttingDown so spurious pthread_cond_wait wakeups (which POSIX permits, and FreeBSD exercises frequently) no longer permanently kill the worker thread. shutdown() sets isShuttingDown = true before signaling so the thread exits cleanly. Subprocess+BSD.swift — source.resume() now happens before peekIfExited(). The AtomicCounter prevents double-resume in the case where both the DispatchSource event handler and the backup peek fire for the same exit. The backup is still necessary because kqueue may not deliver NOTE_EXIT retroactively if the process was already a zombie at registration time.

Thread.swift — WorkQueue.dequeue() now uses while queue.isEmpty && !isShuttingDown with an explicit isShuttingDown flag, so spurious pthread_cond_wait wakeups (permitted by POSIX, common on FreeBSD) no longer permanently kill the single worker thread. Subprocess+BSD.swift — Replaced DispatchSource NOTE_EXIT entirely with blocking waitid(WEXITED | WNOWAIT) dispatched to DispatchQueue.global(). This closes the race at its root: the kernel holds waitid until the process exits regardless of whether it was already a zombie at call time, the zombie is preserved for reapProcess, and GCD's thread pool handles concurrent subprocess waits without serialisation.
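
The predicate loop described for WorkQueue.dequeue() can be sketched as follows. This is a minimal illustration rather than the code from Thread.swift: it uses Foundation's NSCondition in place of raw pthread_cond_wait, and the `SketchQueue` type and its Int payload are hypothetical.

```swift
import Foundation

/// Hypothetical stand-in for the WorkQueue described above.
final class SketchQueue {
    private let condition = NSCondition()
    private var queue: [Int] = []
    private var isShuttingDown = false

    func enqueue(_ item: Int) {
        condition.lock()
        queue.append(item)
        condition.signal()
        condition.unlock()
    }

    func shutdown() {
        condition.lock()
        // Set the flag before waking waiters so they observe it on wakeup.
        isShuttingDown = true
        condition.broadcast()
        condition.unlock()
    }

    /// Returns nil only once shutdown was requested and the queue is empty.
    func dequeue() -> Int? {
        condition.lock()
        defer { condition.unlock() }
        // Re-check the predicate after every wakeup: a spurious wakeup
        // loops back into wait() instead of ending the worker.
        while queue.isEmpty && !isShuttingDown {
            condition.wait()
        }
        return queue.isEmpty ? nil : queue.removeFirst()
    }
}
```

A worker thread would call dequeue() in a loop and exit when it returns nil; items enqueued before shutdown() are still drained first.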

jakepetroules · 2026-06-03T18:21:02Z

+                        // clone3(CLONE_PIDFD) allocates a pidfd before exec runs.
+                        // If exec fails we retry with the next candidate path, so
+                        // close the pidfd here to avoid leaking it across retries.
+                        if processDescriptor > 0 {


To be pedantic, I think this should be >= 0 or != 0 because technically 0 is in the range of valid file descriptors.

Yes, this makes sense that it could be conceptually zero, so I've updated it to be >= 0.

I had to revert this to > 0 because it was causing some of the macOS tests to fail.

Why not update the tests?

It's because the tests are verifying correct behaviour. Allowing zero has an impact on macOS.

(nit) I've added a .invalidDescriptor in b11e683 which you can use here: if processDescriptor != .invalidDescriptor. But the original way is correct too.

Thanks, I've switched it over to use this constant and it's passing on macOS now.

iCharlesHu · 2026-06-03T18:16:52Z

+                        // If exec fails we retry with the next candidate path, so
+                        // close the pidfd here to avoid leaking it across retries.
+                        if processDescriptor > 0 {
+                            try? FileDescriptor(rawValue: processDescriptor).close()


We shouldn't swallow the close error here. We can convert it to SubprocessError and throw it directly.

I've wrapped any errors thrown here in SubprocessError in the newest revision.
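
For readers following the thread, the difference between swallowing and wrapping a close failure can be sketched like this. The names below (`CleanupError`, `closeSwallowing`, `closeWrapping`) are hypothetical stand-ins, not the SubprocessError API, and the close operation is passed in as a closure so the sketch stays self-contained.

```swift
/// Hypothetical wrapper type standing in for SubprocessError.
struct CleanupError: Error {
    let underlying: Error
}

/// Best-effort variant: any failure from `close` is silently discarded.
func closeSwallowing(_ close: () throws -> Void) {
    try? close()
}

/// Propagating variant: a failure from `close` is wrapped and rethrown,
/// so the caller can see that cleanup did not succeed.
func closeWrapping(_ close: () throws -> Void) throws {
    do {
        try close()
    } catch {
        throw CleanupError(underlying: error)
    }
}
```

The second form keeps the original error reachable through `underlying`, which is what makes unexpected failures visible rather than hidden.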

iCharlesHu · 2026-06-03T18:19:30Z

+                            // process, and if Subprocess.run() then throws,
+                            // the caller's task group cascades cancellation to
+                            // other live processes.
+                            try? AsyncIO.shared.cancelAsyncIO(for: processIdentifier)


Should we make cancelAsyncIO recognize DEL failure as harmless here rather than using try?? Same as above I worry that we accidentally swallow unintended errors.

I've removed this bit of Linux-specific knowledge in the comment and the attempt to discard any errors thrown here. The epoll_ctl(DEL) case is now fully encapsulated, see comments further down for more detail.

iCharlesHu · 2026-06-03T18:21:32Z

+        // when processIdentifier.close() closes it, so a failed DEL here is
+        // never permanent.  Propagating a DEL error as a monitoring failure
+        // would trigger onCleanup → SIGKILL against an already-dead process.
+        _ = epoll_ctl(


Should we specifically ignore DEL as opposed to ignore all errors?

According to the Linux manual, the only errors other than the ENOENT observed during older-kernel testing would be various flavours of programming error, such as a bad or already-closed file descriptor. I've captured the rc and errno to form an assert that only gets activated in debug/test binaries, so that in testing we can gather details about the programming error, and it has no effect in production.

iCharlesHu · 2026-06-05T16:18:44Z

-        }
-
-        return (continuationList, nil)
+        assert(delRC == 0 || errno == ENOENT, "epoll_ctl(DEL) failed unexpectedly: \(errno)")


Subprocess doesn't really use assert. We should follow the old format and resume with an error to propagate the error. If we want to only ignore ENOENT we can do something like

    if delRC != 0 && errno != ENOENT {
        let error = SubprocessError.failedToMonitor(
            withUnderlyingError: Errno(rawValue: epollErrno)
        )
        return (continuationList, error)
    }
    return (continuationList, nil)

Thanks, I've adopted this approach.
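
The rule agreed on in this thread (ignore ENOENT from epoll_ctl(DEL), propagate everything else) can be sketched as a small pure function. This is an illustration under assumptions, not the merged code: `MonitorFailure` and `deleteOutcome` are hypothetical names, and the errno value is passed in explicitly on the assumption that it is captured immediately after the call.

```swift
import Foundation

/// Hypothetical error type standing in for SubprocessError.failedToMonitor.
struct MonitorFailure: Error, Equatable {
    let code: Int32
}

/// Classify the outcome of an epoll_ctl(EPOLL_CTL_DEL) call.
/// `capturedErrno` should be read right after the call, before any
/// other library call has a chance to overwrite errno.
func deleteOutcome(returnCode: Int32, capturedErrno: Int32) -> MonitorFailure? {
    // Success: nothing to report.
    if returnCode == 0 { return nil }
    // ENOENT: the descriptor was no longer registered; treated as harmless.
    if capturedErrno == ENOENT { return nil }
    // Anything else is surfaced to the caller instead of being swallowed.
    return MonitorFailure(code: capturedErrno)
}
```

Keeping the decision in one place means callers do not need `try?` and cannot accidentally hide an unrelated failure such as EBADF.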

iCharlesHu · 2026-06-05T16:27:27Z

+                        // clone3(CLONE_PIDFD) allocates a pidfd before exec runs.
+                        // If exec fails we retry with the next candidate path, so
+                        // close the pidfd here to avoid leaking it across retries.
+                        if processDescriptor > 0 {


(nit) I've added a .invalidDescriptor in b11e683 which you can use here: if processDescriptor != .invalidDescriptor. But the original way is correct too.

iCharlesHu · 2026-06-05T16:28:38Z

@@ -1 +1 @@
-6.2.0
+6.3.2


Just curious: what's our policy to update this file when new Swift is released? Do we always try to use the current release?

Honestly, I end up deleting it in my working tree half the time because it causes problems when using a different version.

This is really meant for "apps" which build with one canonical version only, not libraries which build with multiple. [/rant]

In terms of the policy for swift-subprocess and the swift version file, it's up to the code owners here what to do with it.

The swift version file is there to record that at least one person, and ideally the CI too, has verified that the project can be developed on a specific toolchain version with a degree of confidence that the package builds, tests will build and pass, the language server works, and things like code formatting work without extraneous diffs. The hope is to lessen "it works at my desk" kinds of problems with a measure of reproducibility, and to encourage new developers to a package because things just work right away.

By no means does this file indicate that this is the only version of the toolchain that can be used with this package, or the only one it supports. This is just the default that gets developers working with it to a known good configuration for most things and swiftly can help them with that.

Also, it has no effect on dependent packages. This is for development of this package, even if the package is a library package like this one.

It's worth noting that, as it stands, this file determines which toolchain version is used for testing with the Linux kernel versions added as GH workflows in this PR. The Qemu setup script uses swiftly to install the swift toolchain into the Linux VM, which normally won't have it. It's currently configured to use this file because it's visible, easy to update, and swiftly can use it.

Please let me know if you'd prefer the test script to always use "latest", or something that's statically coded into the test script, or the GH workflow. I think that this file is simpler, more visible, and more reproducible than those alternatives.

Add scripts for testing older Linux distros and kernels via qemu

a9d1e70

cmcgee1024 force-pushed the qemu_linux_testing branch from 02f8a3c to a9d1e70 Compare May 30, 2026 11:46

cmcgee1024 added 17 commits May 30, 2026 08:08

Code cleanup and fix various checks

9378aba

Fix remaining license header check failure

6ac8247

Fix remaining license header check failure

05d9966

Fix tests so that they work in a slow qemu environment

c33326b

Improved ergonomics of test script for interactive debugging

33bfdfd

Unlock subprocess fork lock after failures in clone3

649cbb4

Close the pidfd once epoll no longer references it

80d682a

Revert closing pidfds, adjust concurrency limit used for tests

0943dca

Better handling for epoll errors

683b38d

Bump memory for test VM

b70141e

Gracefully handle epoll_ctl(DEL) failures, prevent single task failur…

c540777

…e cascades during testing Throttle thresholds for constrainted qemu environments

Add EMFILE and ENFILE to the ENOSYS fallback condition

6db447a

Revert change to catch EMFILE/ENFILE errors on pdfork

c3c96e6

Fix two pidfd leak cases, and calculate available concurrency by read…

e086ce2

…ing ulimit directly on Linux

Fix compile error for RLIMIT_NOFILE usage with FreeBSD

dcf84fa

Fix compile error for RLIMIT_NOFILE usage with FreeBSD

c603101

Fix compile error for RLIMIT_NOFILE usage with FreeBSD

f099368

jakepetroules reviewed Jun 1, 2026

cmcgee1024 added 5 commits June 2, 2026 07:00

Code review feedback

d4cde49

Revert converting defer close to closeAfter

899b624

Add detailed description for the continuation resume

8ff3fd4

broken-circle mentioned this pull request Jun 2, 2026

FreeBSD CI times out #287

Closed

jakepetroules reviewed Jun 3, 2026

Comment thread Sources/Subprocess/Thread.swift

cmcgee1024 added 3 commits June 3, 2026 06:23

Merge branch 'main' of https://github.com/swiftlang/swift-subprocess

cb5e9f4

Revert changes to Thread

fbfd28f

Revert vestigial BSD changes

d542ed5

jakepetroules reviewed Jun 3, 2026

Comment thread Tests/SubprocessTests/UnixTests.swift Outdated

Code review feedback

3a3dcc7

cmcgee1024 marked this pull request as ready for review June 4, 2026 18:30

cmcgee1024 requested a review from iCharlesHu as a code owner June 4, 2026 18:30

cmcgee1024 added 2 commits June 4, 2026 14:30

Try without the no-parallel option for the qemu tests

e18bbc1

Revert process descriptor check

49beeff

iCharlesHu reviewed Jun 4, 2026

cmcgee1024 added 2 commits June 5, 2026 09:31

Code review feedback

b9cb6cf

Assert programming errors on epoll_ctl(DEL) Rethrow file descriptor closure errors Remove Linux assumption in Configuration leading to swallowing of cancellation errors

Fix compile error

346c6e7

iCharlesHu approved these changes Jun 5, 2026

cmcgee1024 added 2 commits June 5, 2026 15:05

Code review feedback

2a41bd4

Fix formatting

8499acd

cmcgee1024 merged commit 6b0d119 into swiftlang:main Jun 5, 2026
46 checks passed

This was referenced Jun 6, 2026

Test Linux Kernel / al2-5.10 nondeterministically hangs #301

Open

Fix crash in fillNullTerminatedWideStringBuffer() #304

Merged
