feat(python): Handler wrapper on the C ABI (RFC #43)#46
Open
dzerik wants to merge 12 commits into
Open
Conversation
Per the maintainer's RFC multikernel#43 response on Q5 (let audit-only handlers opt down to Errno(EIO) or Continue), add a fourth exception policy discriminant for EIO. Python audit handlers idiomatically prefer EIO because it propagates as OSError rather than PermissionError, which is closer to what callers expect from a failed syscall. Reserves discriminant 3 (after KILL=0, DENY_EPERM=1, CONTINUE=2 to preserve ABI stability with the merged Handler C ABI).
Discriminated dataclass mirroring sandlock_action_out_t. Constructed via classmethod factories. Discriminant values match the C-side SANDLOCK_ACTION_* constants 1:1 so the trampoline (a later task) can translate directly.
Default exception policy is KILL (fail-closed) per RFC multikernel#43 Q5 = D. Subclasses can override via class attribute. ExceptionPolicy enum discriminants are stable across the C ABI (KILL=0, DENY_EPERM=1, CONTINUE=2, DENY_EIO=3).
Read-only snapshot of the seccomp notification plus an opaque mem handle for child-memory access. The read_cstr/read/write methods short-circuit to a falsy result when no mem handle is present (test contexts); the real accessors are deferred to the ctypes glue module.
ctypes bindings for the merged Handler C ABI. The trampoline dispatches via an integer-id registry lookup, relies on ctypes' implicit GIL acquisition for CFUNCTYPE callbacks, checks Py_IsInitialized() defensively, catches handler exceptions (routing to the configured on_exception policy), and translates NotifAction into setter calls. Adds Sandbox.run_with_handlers and exports Handler/NotifAction/ HandlerCtx/ExceptionPolicy from the package root. Includes the RFC multikernel#43 audit smoke test counting SYS_openat interceptions.
Documents the four cross-cutting concerns from RFC multikernel#43: GIL contention, interpreter finalization, native crashes, and Tokio runtime reentrancy. Plus ownership rules for Handler instances and injected file descriptors, and a minimal copy-pasteable example.
The trampoline handed the raw sandlock_mem_handle_t* to HandlerCtx. That C struct is a stack local in the supervisor's spawn_blocking closure — it is freed the instant the callback returns. A handler that stored its HandlerCtx and called read/read_cstr/write afterwards dereferenced a dangling pointer (use-after-free). Wrap the handle in a mutable _MemHandle cell that the trampoline invalidates in a finally block. A retained HandlerCtx now fails safe (accessors return None/False) instead of dereferencing freed memory.
T1: run_with_handlers' registration-loop rollback freed only the containers already stored in the regs array. A container created by sandlock_handler_new but not yet stored (e.g. int(syscall_nr) raised in between) leaked. Track the pending container in a local and free it in the rollback path. Also: document that handle() may run concurrently for the same instance and must not block; reject a negative srcfd in NotifAction.inject_fd_send before it reaches the C setter; bind the run name bytes to a local for lifetime clarity; pin Py_IsInitialized's restype explicitly.
The deep review found the trampoline's NotifAction kind-dispatch (errno / return_value / kill / inject_fd) had zero end-to-end test coverage — a trampoline reduced to "always Continue" passed the whole suite. Add real-child integration tests: - errno action: child observes EPERM from a denied openat - return_value action: child's getpid returns the synthetic 777 - kill action returned directly: child terminated (on_exception set to Continue to prove the kill came from the action, not a policy fallback) - mem read_cstr: handler decodes the real openat path from child memory and denies a specific file - run_with_handlers argument validation: empty list runs cleanly, a non-Handler entry raises All four behavioural tests were destructively verified — neutering the trampoline's kind-dispatch makes each one fail.
A mutation-based audit of the handler smoke suite found six tests that passed even when the production code they claimed to verify was broken: - exception KILL/CONTINUE policy tests asserted only the run's pass/fail, not that the exception path was exercised or that the child was killed AT the intercepted syscall. Now the child prints before/after markers and the raising handler records that it ran. - the two enum/C-header "match" tests asserted hardcoded numbers against hardcoded numbers. Now they parse the discriminants out of sandlock.h so an ABI drift is caught. - the RFC audit smoke test counted interpreter-startup openat noise. Now it opens a unique probe file and counts only that path. - the HandlerCtx field-exposure test exercised _for_test, not the trampoline's notification unpacking. Now a real run inspects the fields the trampoline built. Each rewrite was destructively verified — the corresponding production mutation makes the test fail.
run_with_handlers inserts each Handler into a process-global registry and relies on the C ABI's ud_drop to remove the entry on completion. sandlock_run_with_handlers is extern "C-unwind", so a panic (e.g. invoked from within an existing Tokio runtime) propagates as a Python exception before the supervisor fires ud_drop — orphaning every entry. Wrap the call in try/finally and sweep the registered ids unconditionally; _unregister_handler is idempotent, so the sweep is a no-op once ud_drop has already run. Also from a self-review pass: - mem_read now fails (returns None) on a null handle regardless of length, mirroring mem_write — a dead context yields no child-memory access. A zero-length read on a live handle stays the trivial b"". - Drop the test-only HandlerCtx._for_test classmethod; _mem_handle already defaults to None, so tests construct HandlerCtx directly. - Add two registry-hygiene tests: a completed run leaves the registry empty, and a mid-loop registration failure rolls back with no orphaned entry. The rollback test is destructively verified.
The handler smoke tests hardcoded x86_64 syscall numbers (openat=257, getpid=39). On aarch64 those are wrong: the handler registers on an unrelated syscall and never fires, and 39 lands in the aarch64 deny list, which fails the run outright. All nine kernel-dependent tests failed on the ubuntu-24.04-arm CI runner while passing on x86_64. The Rust side resolves these via libc::SYS_*; Python has no such table, so add a small arch->number map (x86_64, aarch64) and a _syscall_nr helper that skips on an unmapped arch rather than silently registering on the wrong syscall.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Python wrapper for the
HandlerC ABI merged in #44. Python users can now write seccomp-notif handlers asHandlersubclasses without touching ctypes directly.Per RFC #43 phasing, this is the second of three PRs:
C ABI surface— merged in feat(ffi): C ABI for the Handler trait (RFC #43) #44.Handlerbase class, registration viaSandbox.run_with_handlers, audit smoke test.What's in this PR
C ABI extension — one new exception-policy discriminant:
SANDLOCK_EXCEPTION_DENY_EIO = 3. Your RFC RFC: Python parity for the Handler trait (Follow-up B) #43 reply on Q5 suggested letting audit-only handlers opt down toErrno(EIO); feat(ffi): C ABI for the Handler trait (RFC #43) #44 shipped onlyDENY_EPERM. EIO is the idiomatic choice for Python audit handlers (propagates asOSErrorrather thanPermissionError). Discriminant appended afterCONTINUE=2— ABI-stable.Python module
sandlock.handler:NotifAction— frozen dataclass mirroringsandlock_action_out_t; factory classmethods (continue_,errno,return_value_,hold,kill,inject_fd_send).Handler— base class; subclass and overridehandle(ctx) -> NotifAction. Class attributeon_exceptiondefaults toKILL(fail-closed, per Q5 = D).HandlerCtx— read-only notification snapshot +read_cstr/read/writechild-memory accessors.ExceptionPolicy— IntEnum mirroringsandlock_exception_policy_t.ctypes glue (
_handler_ffi.py,_sdk.py):Handler.handle. Dispatch is via an integer-id registry lookup — no rawPyObject*crosses the FFI boundary.Py_IsInitialized()before touching Python state, catches handler exceptions (routing to the configuredon_exceptionpolicy), rejects non-NotifActionreturn values, and translates the action into setter calls onsandlock_action_out_t.HandlerCtxthat escapes itshandle()call fails safe —read/read_cstr/writereturnNone/False— rather than dereferencing the supervisor stack-local it pointed at.Sandbox.run_with_handlers(cmd, handlers, name=None)— registers(syscall_nr, Handler)pairs and runs, mirroring the existingSandbox.runmechanics.Cross-cutting concerns (RFC #43)
The RFC listed four cross-cutting items. How each is handled:
handle()dispatch holds the GIL; documented as a known limitation. Handlers should be fast and protect mutable state themselves.Py_IsInitialized()check returns an error ifPy_FinalizeExran mid-dispatch, routing the notification throughon_exception.handle()— not recoverable; documented as user responsibility.sandlock_run_with_handlersbuilds its own runtime; documented that it must not be called from within an existing Tokio runtime.All four are written up in
docs/extension-handlers.mdunder a new "Python wrapper" section, plus ownership rules forHandlerinstances and injected file descriptors.Ownership
Handlerinstances are held by the Sandbox for the run's duration; the C container'sud_dropreleases the Python reference on completion.sandlock_run_with_handlersreturns, all handler-container pointers are owned by the supervisor — the Python side does not free them. The mid-loop allocation-failure path frees only the containers created before the failure (registry-entry-removed-exactly-once invariant, test-pinned).finallyaftersandlock_run_with_handlersreturns: on the normal path the supervisor'sud_dropcalls have already emptied the slots, but the run entry point isextern "C-unwind", so a panic (e.g. invoked from within an existing Tokio runtime) propagates as a Python exception — the unconditional, idempotent sweep keeps a panic from orphaning entries.Test plan
cargo test -p sandlock-ffi— 45 tests pass (44 + 1 newDenyEiopolicy test), 0 ignored.pytest python/tests/test_handler_smoke.py— 29 tests pass:NotifActionfactories + frozen-ness,ExceptionPolicy/Handlerdefaults & override,HandlerCtxfield exposure + frozen-ness, ABI-discriminant pins, handler-registry hygiene, and the end-to-end integration tests below.HandlercountingSYS_openatinterceptions on a real child process.errno(child observesEPERMfrom a deniedopenat),return_value(child'sgetpidreturns the synthetic value),killreturned directly by a handler (child terminated;on_exceptionset to Continue so the termination provably comes from the action, not a policy fallback). Each was destructively verified — neutering the trampoline's action dispatch makes each test fail.on_exception=CONTINUElets the child complete; withon_exception=KILLit terminates the child.HandlerCtxmemory accessor — a handler decodes the realopenatpath from child memory viaread_cstrand denies a specific file.HandlerCtxretained past itshandle()call has its memory accessors fail safe; verified the liveness cell is invalidated on the trampoline'sfinallypath.run_with_handlersargument validation — empty handler list runs cleanly; a non-Handlerentry raises.syscall_nron the second entry) rolls back with no orphaned registry entry. The rollback test is destructively verified — neutering the rollbackexceptblock leaves an orphaned entry and the test fails.pytest python/tests/test_sandbox.py— no regression from the_sdk.pyadditions.DenyEiodiscriminant.Out of scope (PR 3)
ctx.read_path()).extension-handlers.md).Notes
test_sandbox.py::TestCpuThrottle::test_throttle_slows_execution, is flaky on loaded CI hosts (a CPU-time-ratio assertion). It fails identically onmainwithout this PR — unrelated to the Handler wrapper.DenyEioC-ABI discriminant into its own commit/PR if you'd prefer the Python wrapper PR to touch zero Rust.