Skip to content

Test old Linux kernels using Qemu#280

Merged
cmcgee1024 merged 33 commits into
swiftlang:mainfrom
cmcgee1024:qemu_linux_testing
Jun 5, 2026
Merged

Test old Linux kernels using Qemu#280
cmcgee1024 merged 33 commits into
swiftlang:mainfrom
cmcgee1024:qemu_linux_testing

Conversation

@cmcgee1024

@cmcgee1024 cmcgee1024 commented May 29, 2026

Copy link
Copy Markdown
Member

Fixes the following:

  • Increases test confidence with older Linux kernels (4.18, and 5.10)
  • Closes file handles that were leaking on Linux
  • Better soft limit detection so that testConcurrentRun() can avoid exhausting system open file limit
  • Handle the epoll(DEL) case when a process is being shut down for Linux with older kernels

@cmcgee1024 cmcgee1024 force-pushed the qemu_linux_testing branch from 02f8a3c to a9d1e70 Compare May 30, 2026 11:46
// If we can not retrieve pidfd, the system does not support waitid(P_PIDFD)
return false
}
defer { try? FileDescriptor(rawValue: selfPidfd).close() }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use closeAfter instead?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it would work because we would lose the errno comparison if the close of the file descriptor fails, which we want to be just a best effort here.

Thread.swift — WorkQueue.dequeue() now uses while queue.isEmpty && !isShuttingDown so spurious pthread_cond_wait wakeups (which POSIX
permits, and FreeBSD exercises frequently) no longer permanently kill the worker thread. shutdown() sets isShuttingDown = true before
signaling so the thread exits cleanly.

Subprocess+BSD.swift — source.resume() now happens before peekIfExited(). The AtomicCounter prevents double-resume in the case where
both the DispatchSource event handler and the backup peek fire for the same exit. The backup is still necessary because kqueue may not
deliver NOTE_EXIT retroactively if the process was already a zombie at registration time.
Thread.swift — WorkQueue.dequeue() now uses while queue.isEmpty && !isShuttingDown with an explicit isShuttingDown flag, so spurious
pthread_cond_wait wakeups (permitted by POSIX, common on FreeBSD) no longer permanently kill the single worker thread.

Subprocess+BSD.swift — Replaced DispatchSource NOTE_EXIT entirely with blocking waitid(WEXITED | WNOWAIT) dispatched to
DispatchQueue.global(). This closes the race at its root: the kernel holds waitid until the process exits regardless of whether it was
already a zombie at call time, the zombie is preserved for reapProcess, and GCD's thread pool handles concurrent subprocess waits without
serialisation.
Comment thread Sources/Subprocess/Thread.swift
// clone3(CLONE_PIDFD) allocates a pidfd before exec runs.
// If exec fails we retry with the next candidate path, so
// close the pidfd here to avoid leaking it across retries.
if processDescriptor > 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be pedantic, I think this should be >= 0 or != 0 because technically 0 is in the range of valid file descriptors.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this makes sense that it could be conceptually zero, so I've updated it to be >= 0.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to revert this to > 0 because it was causing some of the macOS tests to fail.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not update the tests?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because the tests are verifying correct behaviour. Allowing zero has an impact to macOS.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I've added a .invalidDescriptor in b11e683 which you can use here: if processDescriptor != .invalidDescriptor. But the original way is correct too.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've switched it over to use this constant and it's passing on macOS now.

Comment thread Tests/SubprocessTests/UnixTests.swift Outdated
@cmcgee1024 cmcgee1024 marked this pull request as ready for review June 4, 2026 18:30
@cmcgee1024 cmcgee1024 requested a review from iCharlesHu as a code owner June 4, 2026 18:30
// If exec fails we retry with the next candidate path, so
// close the pidfd here to avoid leaking it across retries.
if processDescriptor > 0 {
try? FileDescriptor(rawValue: processDescriptor).close()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't swallow the close error here. We can convert it to SubprocessError and throw it directly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wrapped any errors thrown here in SubprocessError in the newest revision.

Comment thread Sources/Subprocess/Configuration.swift Outdated
// process, and if Subprocess.run() then throws,
// the caller's task group cascades cancellation to
// other live processes.
try? AsyncIO.shared.cancelAsyncIO(for: processIdentifier)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make cancelAsyncIO recognize DEL failure as harmless here rather than using try?? Same as above I worry that we accidentally swallow unintended errors.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this bit of Linux-specific knowledge in the comment and the attempt to discard any errors thrown here. The epoll_ctl(DEL) case is now fully encapsulated, see comments further down for more detail.

// when processIdentifier.close() closes it, so a failed DEL here is
// never permanent. Propagating a DEL error as a monitoring failure
// would trigger onCleanup → SIGKILL against an already-dead process.
_ = epoll_ctl(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we specifically ignore DEL as opposed to ignore all errors?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Linux manual the only other errors, other than the ENOENT observed during older kernel testing, would be various flavours of programming errors, such as bad file descriptor, or already closed. I've captured the rc and errno to form an assert that only gets activated in debug/test binaries so that in testing we can gather details about the programming error, and it has no effect in production.

Assert programming errors on epoll_ctl(DEL)

Rethrow file descriptor closure errors

Remove Linux assumption in Configuration leading to swallowing of cancellation errors
}

return (continuationList, nil)
assert(delRC == 0 || errno == ENOENT, "epoll_ctl(DEL) failed unexpectedly: \(errno)")

@iCharlesHu iCharlesHu Jun 5, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subprocess doesn't really use assert. We should follow the old format and resume with an error to propagate the error. If we want to only ignore ENOENT we can do something like

if delRC != 0 && errno != ENOENT {
    let error = SubprocessError.failedToMonitor(
        withUnderlyingError: Errno(rawValue: epollErrno)
    )
    return (continuationList, error)
}

return (continuationList, nil)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've adopted this approach.

// clone3(CLONE_PIDFD) allocates a pidfd before exec runs.
// If exec fails we retry with the next candidate path, so
// close the pidfd here to avoid leaking it across retries.
if processDescriptor > 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I've added a .invalidDescriptor in b11e683 which you can use here: if processDescriptor != .invalidDescriptor. But the original way is correct too.

Comment thread .swift-version
@@ -1 +1 @@
6.2.0 No newline at end of file
6.3.2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: what's our policy to update this file when new Swift is released? Do we always try to use the current release?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I end up deleting it in my working tree half the time because it causes problems when using a different version.

This is really meant for "apps" which build with one canonical version only, not libraries which build with multiple. [/rant]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In term of the policy of swift-subprocess and the swift version file, it's up to the code owners here what to do with it.

The swift version file is there to record that at least one person, and ideally the CI too, has verified that the project can be developed on a specific toolchain version with a degree of confidence that the package builds, tests will build and pass, the language server works, and things like code formatting work without extraneous diffs. The hope is to lessen "it works at my desk" kinds of problems with a measure of reproducibility, and to encourage new developers to a package because things just work right away.

By no means does this file indicate that this is the only version of the toolchain that can be used with this package, or the only one it supports. This is just the default that gets developers working with it to a known good configuration for most things and swiftly can help them with that.

Also, it has no effect on dependent packages. This is for development of this package, even if the package is a library package like this one.

It's worth noting that this file instructs what toolchain version to use for testing with the Linux kernel versions added as GH workflows in this PR as it stands. The Qemu setup script uses of swiftly to install the swift toolchain into the Linux VM that normally won't have it. It's currently configured to use this file as it's visible, easy to update, and swiftly can use it.

Please let me know if you'd prefer the test script to always use "latest", or something that's statically coded into the test script, or the GH workflow. I think that this file is simpler, more visible, and more reproducible than those alternatives.

@cmcgee1024 cmcgee1024 merged commit 6b0d119 into swiftlang:main Jun 5, 2026
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants