
core: drain seccomp notifications via AsyncFd, remove per-sandbox blocking thread #50

Merged
congwang-mk merged 2 commits into `main` from `notif-async-recv` on May 18, 2026

@congwang-mk
Contributor

Summary

Replaces the per-sandbox blocking-recv thread in the seccomp supervisor with a tokio `AsyncFd` registration on the notif fd. Each live sandbox now costs one fewer OS thread: there is no longer a thread per sandbox parked in `ioctl(SECCOMP_IOCTL_NOTIF_RECV)`.

Independent of #49: this branch is rooted on `origin/main`. The two PRs touch disjoint files (FFI/Python vs. `sandlock-core`) and should merge cleanly in either order.

What changes

`seccomp/notif.rs` — supervisor rewrite:

  • Wraps `notif_fd` in `tokio::io::unix::AsyncFd` (READABLE interest).
  • On each epoll edge, drains the kernel queue by alternating a `libc::poll(timeout=0)` probe with `recv_notif`. The probe is required for two reasons: `SECCOMP_IOCTL_NOTIF_RECV` ignores `O_NONBLOCK` (it sleeps in `wait_event_interruptible` in kernel/seccomp.c), so a speculative recv on an empty queue would block instead of failing fast; and tokio `AsyncFd` is edge-triggered, so every edge must be drained to empty before waiting again.
  • Treats `POLLHUP | POLLERR | POLLNVAL` without `POLLIN` as terminal: the filter has been released or the fd is invalid. Without this check the supervisor would busy-spin, because `AsyncFd` keeps reporting the fd as ready after the child exits, and a naive "only POLLIN matters" check would then loop forever.
  • Drops the `std::thread::spawn` + mpsc relay entirely.
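A minimal sketch of the non-blocking probe and the terminal rule described in the bullets above, written in std-only Rust for Linux. It assumes a hand-written `extern` binding to `poll(2)` (Rust's std already links libc on Linux) instead of the `libc` crate the PR uses; the `probe`, `classify` and `Probe` names are hypothetical and are not the code in `seccomp/notif.rs`.

```rust
use std::os::raw::{c_int, c_short, c_ulong};

// Linux poll(2) event bits, values from <poll.h>.
const POLLIN: c_short = 0x001;
const POLLERR: c_short = 0x008;
const POLLHUP: c_short = 0x010;
const POLLNVAL: c_short = 0x020;

// Mirrors `struct pollfd` from <poll.h>.
#[repr(C)]
struct PollFd {
    fd: c_int,
    events: c_short,
    revents: c_short,
}

extern "C" {
    // int poll(struct pollfd *fds, nfds_t nfds, int timeout); nfds_t is unsigned long on Linux.
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout: c_int) -> c_int;
}

/// Hypothetical result of one readiness probe.
#[derive(Debug, PartialEq)]
enum Probe {
    /// POLLIN is set: a recv will not block.
    Pending,
    /// Nothing queued right now: stop draining and wait for the next edge.
    Empty,
    /// HUP/ERR/NVAL without POLLIN: no further notifications will arrive.
    Terminal,
}

/// The rule from the PR: POLLIN wins; otherwise HUP/ERR/NVAL is terminal;
/// otherwise the queue is merely empty.
fn classify(revents: c_short) -> Probe {
    if (revents & POLLIN) != 0 {
        Probe::Pending
    } else if (revents & (POLLHUP | POLLERR | POLLNVAL)) != 0 {
        Probe::Terminal
    } else {
        Probe::Empty
    }
}

/// Ask the kernel about `fd` without blocking (timeout = 0), retrying on EINTR.
fn probe(fd: c_int) -> std::io::Result<Probe> {
    let mut pfd = PollFd { fd, events: POLLIN, revents: 0 };
    loop {
        // SAFETY: `pfd` is a valid pollfd exclusively borrowed for the call.
        let n = unsafe { poll(&mut pfd, 1, 0) };
        if n >= 0 {
            return Ok(classify(pfd.revents));
        }
        let err = std::io::Error::last_os_error();
        if err.kind() != std::io::ErrorKind::Interrupted {
            return Err(err);
        }
    }
}
```

Any pollable fd can exercise the sketch: one end of `std::os::unix::net::UnixStream::pair()` probes as `Empty` until the peer writes a byte, then as `Pending`. Note that `POLLIN | POLLHUP` still classifies as `Pending`, so queued work is drained before the hang-up is acted on.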

`sandbox.rs` — supervisor startup synchronization:

The second commit refactors the drain loop into a `NotifFdState` enum (`Pending` / `Empty` / `Terminal`) plus a `probe_notif_fd` helper. Behavior is unchanged, and the three-arm `match` reads like plain English; because the terminal case gets its own arm, the POLLHUP-spin bug becomes structurally hard to reintroduce.
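The shape of the refactored drain loop can be sketched as a three-arm `match`. This is a simplified model rather than the PR's code: the `NotifFdState` variant names come from the description above, but `drain_once`, `DrainOutcome`, and the `probe`/`recv` closures are hypothetical stand-ins for `probe_notif_fd` and `recv_notif`, so the loop can run without a real seccomp fd.

```rust
/// Variant names from the PR; payload-free in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NotifFdState {
    Pending,
    Empty,
    Terminal,
}

/// Hypothetical: what the supervisor does after draining one readiness edge.
#[derive(Debug, PartialEq)]
enum DrainOutcome {
    /// Queue is empty: clear AsyncFd readiness and await the next edge.
    WaitForNextEdge(usize),
    /// Filter released or fd invalid: shut the supervisor down.
    Shutdown(usize),
}

/// Drain everything queued on this edge. `probe` stands in for `probe_notif_fd`;
/// `recv` stands in for `recv_notif` and is only called after a `Pending` probe,
/// so it never blocks.
fn drain_once<P, R>(mut probe: P, mut recv: R) -> DrainOutcome
where
    P: FnMut() -> NotifFdState,
    R: FnMut(),
{
    let mut handled = 0;
    loop {
        match probe() {
            NotifFdState::Pending => {
                recv();
                handled += 1;
            }
            NotifFdState::Empty => return DrainOutcome::WaitForNextEdge(handled),
            NotifFdState::Terminal => return DrainOutcome::Shutdown(handled),
        }
    }
}
```

In this model, dropping the `Terminal` arm would force a hung-up fd into `Empty`, and the supervisor would clear readiness only to be woken again immediately, which is the spin the PR describes.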

Why this is the right pattern

  • The seccomp notif fd implements `seccomp_notify_poll`, which reports `EPOLLIN` whenever a notification in the INIT state is queued, so the fd is a first-class epoll source.
  • `recv_notif` does not block when the queue is non-empty (`wait_event_interruptible` checks its condition on entry), so poll-then-recv is race-free for our single-consumer-per-filter topology: no other reader can dequeue the notification between the probe and the recv.
  • A previous attempt at this pattern on the `dev` branch (commit `5638239`, later abandoned) hit the POLLHUP-spin bug; we caught it when `test_checkpoint_app_state_roundtrip` deadlocked, and this PR handles that case explicitly.
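Why each edge has to be drained to empty can be shown with a toy model. It assumes a simplification of how `AsyncFd` behaves: readiness is cached until the consumer clears it, and (roughly) a new edge arrives only when new notifications are queued. `ReadinessModel` and its methods are hypothetical and do not reproduce tokio's implementation; with a single consumer, nothing else can take items between the emptiness check and the dequeue.

```rust
/// Toy model of a cached, edge-triggered readiness flag (hypothetical).
struct ReadinessModel {
    queued: u32, // notifications waiting in the kernel queue
    ready: bool, // cached readiness: set by an edge, cleared by the consumer
}

impl ReadinessModel {
    fn new() -> Self {
        ReadinessModel { queued: 0, ready: false }
    }

    /// A burst of `n` notifications arrives and signals one edge.
    fn arrive(&mut self, n: u32) {
        self.queued += n;
        if n > 0 {
            self.ready = true;
        }
    }

    /// Buggy shape: on readiness, handle one notification, then clear readiness.
    fn wake_recv_one(&mut self) -> u32 {
        if !self.ready {
            return 0;
        }
        let got = if self.queued > 0 {
            self.queued -= 1;
            1
        } else {
            0
        };
        self.ready = false;
        got
    }

    /// Drain shape: on readiness, recv until the queue is empty, then clear readiness.
    fn wake_drain(&mut self) -> u32 {
        if !self.ready {
            return 0;
        }
        let mut got = 0;
        while self.queued > 0 {
            // the emptiness check stands in for the poll(timeout=0) probe,
            // the decrement for recv_notif
            self.queued -= 1;
            got += 1;
        }
        self.ready = false;
        got
    }
}
```

After `arrive(3)`, the one-per-wakeup loop handles a single notification and leaves two queued with no further edge to report them, while `wake_drain` handles all three on the first edge.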

Test plan

  • `cargo test --lib --tests` — 557 tests pass (skipping doctests due to an unrelated rustdoc/libLLVM issue on the dev machine).
  • `pytest python/tests/` — 247 tests pass in 15s.
  • `test_checkpoint_app_state_roundtrip` (the test that surfaced the POLLHUP-spin bug during development) passes in 0.11s.

Signed-off-by: Cong Wang <cwang@multikernel.io>
congwang-mk merged commit `414e996` into `main` on May 18, 2026
8 checks passed
congwang-mk deleted the `notif-async-recv` branch on May 18, 2026 at 02:48