RFC: Python parity for the Handler trait (Follow-up B)
PR #36 landed the Rust Handler trait and the
Sandbox::run_with_extra_handlers(I: IntoIterator<Item = (S, H)>) shape.
The existing Python SDK (ctypes-based, in python/src/sandlock/) has no
equivalent surface — Python users can spawn a sandbox and set policy,
but cannot register user handlers.
This RFC asks for direction on five design questions before opening a
PR. The Python SDK is currently sync ctypes over libsandlock_ffi.so,
which I treat as the binding constraint (no PyO3 introduction here).
Q1: Async model
Python async def handle() semantics across the FFI boundary.
- A. Sync handler signature (def handle(ctx) -> NotifAction).
Users who want async wrap the call themselves with
asyncio.run_coroutine_threadsafe(...).result().
Smaller C ABI; the supervisor task blocks fully on the handler.
- B. Native async handler via completion-pipe / eventfd bridge.
C ABI exposes a sandlock_completion_t* that handler signals when
ready. Idiomatic, but 3-4× more C surface and ctypes needs custom
completion glue (no PyO3-asyncio equivalent).
- C. Handler runs in isolated Python subprocess, IPC per
notification. Full isolation, no GIL contention; but ~ms-per-syscall
overhead makes high-frequency interception (VFS) impractical.
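The wrapping that Option A leaves to users can be sketched as below. This is only an illustration under assumptions: NotifAction is a stand-in enum, ctx is a plain dict, and inspect / start_background_loop are hypothetical names. Only the asyncio and threading calls are real APIs.

```python
import asyncio
import enum
import threading


class NotifAction(enum.Enum):
    """Stand-in for the real action type; only two variants are shown."""
    CONTINUE = 0
    KILL = 1


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Run an event loop on a daemon thread so sync code can submit work to it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


LOOP = start_background_loop()


async def inspect(ctx: dict) -> NotifAction:
    """Hypothetical async policy check."""
    await asyncio.sleep(0)  # placeholder for real async I/O
    return NotifAction.KILL if ctx["syscall_nr"] < 0 else NotifAction.CONTINUE


def handle(ctx: dict) -> NotifAction:
    """Sync handler in the Option A shape; blocks until the coroutine finishes."""
    future = asyncio.run_coroutine_threadsafe(inspect(ctx), LOOP)
    return future.result(timeout=1.0)
```

The calling thread is blocked inside future.result() for the whole coroutine, which is the supervisor stall that Option A accepts in exchange for the smaller C ABI.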
Q2: HandlerCtx FFI surface
How to expose notif/notif_fd/child-memory helpers to Python:
- A. Fully opaque pointer + getter functions
(sandlock_ctx_pid(ctx), sandlock_ctx_arg(ctx, idx),
sandlock_ctx_read_cstr(ctx, addr, buf, cap), ...).
ABI-safe to extend; per-call FFI overhead.
- B. repr(C) struct exposed verbatim (notif_id, pid, syscall_nr,
args[6], notif_fd inline). Direct ctypes Structure mapping.
Zero-cost field access; freezes layout — kernel seccomp_notif
changes break Python ABI.
- C. Hybrid: repr(C) notification snapshot + opaque
sandlock_mem_handle_t* for child-memory access. Notification
data direct, memory access wrapped (sandlock controls TOCTOU
lifetime).
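To make the "direct ctypes Structure mapping" in Options B and C concrete, here is a rough sketch. The field names come from the list in Option B, but the integer widths and field order are assumptions, and NotifSnapshot / describe are hypothetical names.

```python
import ctypes


class NotifSnapshot(ctypes.Structure):
    """Hypothetical repr(C) snapshot; the widths below are assumed, not specified."""
    _fields_ = [
        ("notif_id", ctypes.c_uint64),
        ("pid", ctypes.c_uint32),
        ("syscall_nr", ctypes.c_int32),
        ("args", ctypes.c_uint64 * 6),
        ("notif_fd", ctypes.c_int),
    ]


def describe(snapshot: NotifSnapshot) -> str:
    """Field access is plain attribute access, with no FFI call per field."""
    args = ", ".join(hex(a) for a in snapshot.args)
    return f"pid={snapshot.pid} nr={snapshot.syscall_nr} args=[{args}]"


snap = NotifSnapshot(notif_id=1, pid=4242, syscall_nr=257)
snap.args[1] = 0x7FFC0000
```

Any field added or reordered on the Rust side changes ctypes.sizeof(NotifSnapshot) and the offsets that Python reads, which is the layout freeze that Option B trades for zero-cost access.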
Q3: NotifAction FFI surface
Eight Rust variants, some with owned resources (OwnedFd) and a
callback (InjectFdSendTracked.on_success).
- A. Tagged union (enum kind + union u). Direct memory layout;
freezes union ABI. Ownership of contained fds and callback
user-data unclear.
- B. Opaque builder functions (sandlock_action_continue(),
sandlock_action_inject_fd_send_tracked(fd, flags, cb, ud, ud_drop)).
Sandlock owns lifecycle including ud_drop cleanup callback. Heap
per action.
- C. Output-parameter setters into a sandlock-pre-allocated
sandlock_action_out_t* passed to handler.
No heap allocation; default is "no setter called → Continue";
layout still partially fixed.
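The "no setter called → Continue" default of Option C can be simulated in pure ctypes, as sketched below. Everything here is hypothetical: the tag values, the one-field ActionOut layout, and action_set_kill, which in the proposal would be a sandlock_ffi symbol rather than Python code. dispatch plays the supervisor's role.

```python
import ctypes

# Hypothetical tag values; 0 is Continue so that a zeroed struct is the default.
ACTION_CONTINUE = 0
ACTION_KILL = 1


class ActionOut(ctypes.Structure):
    """Stand-in for sandlock_action_out_t; the real layout is undecided."""
    _fields_ = [("kind", ctypes.c_uint32)]


# C-callable handler signature: void handle(sandlock_action_out_t *out)
HANDLE_FN = ctypes.CFUNCTYPE(None, ctypes.POINTER(ActionOut))


def action_set_kill(out) -> None:
    """Hypothetical setter writing into the pre-allocated struct."""
    out.contents.kind = ACTION_KILL


@HANDLE_FN
def audit_handler(out):
    pass  # no setter called, so the action stays Continue


@HANDLE_FN
def deny_handler(out):
    action_set_kill(out)


def dispatch(handler) -> int:
    """Pre-allocate the out struct, call the handler, read the result back."""
    out = ActionOut()  # ctypes zero-initialises, so kind == ACTION_CONTINUE
    handler(ctypes.byref(out))
    return out.kind
```

Because the default falls out of zero-initialisation, Continue has to keep tag 0 for as long as this ABI exists, which is one of the ways the layout stays partially fixed.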
Q4: Handler ownership / lifetime through FFI
When a Python handler is registered, sandlock holds an
Arc<dyn Handler> for the duration of the sandbox. Through FFI, that
means a PyObject* lives across thread/runtime boundaries.
- A. Raw PyObject* + caller-provided Py_IncRef/Py_DecRef
callbacks. Compact API; couples sandlock_ffi to Python ABI;
GIL-acquired-before-callback contract easy to violate.
- B. Opaque sandlock_handler_t* allocated by
sandlock_handler_new(handle_fn, ud, ud_drop). Sandlock owns
lifecycle; ud_drop is arbitrary cleanup (Py_DecRef one option).
Per-handler heap.
- C. Static-dict approach: handler registered by integer ID;
Python keeps dict[int, Handler] and dispatches via trampoline.
Minimum FFI surface; global mutable state, doesn't scale to multiple
sandboxes per process.
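For Option B, the Python side of a (handle_fn, ud, ud_drop) triple might look like the sketch below, using the "Py_DecRef as ud_drop" variant mentioned above. It is CPython-specific and makes several assumptions: the callback signatures are guesses, and CountingHandler / make_user_data are hypothetical names. sandlock_handler_new itself is not called; the trampolines are invoked directly to stand in for it.

```python
import ctypes

# Assumed callback shapes: int handle_fn(void *ud) and void ud_drop(void *ud).
HANDLE_FN = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p)
DROP_FN = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


class CountingHandler:
    """Hypothetical handler object that only counts invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def handle(self) -> int:
        self.calls += 1
        return 0  # 0 stands for "continue" in this sketch


def make_user_data(handler: CountingHandler) -> int:
    """Take one strong reference on behalf of sandlock and return the address."""
    _incref(handler)
    return id(handler)  # in CPython, id() is the object's address


@HANDLE_FN
def handle_trampoline(ud):
    handler = ctypes.cast(ud, ctypes.py_object).value
    return handler.handle()


@DROP_FN
def drop_trampoline(ud):
    # Release the reference taken in make_user_data.
    _decref(ctypes.cast(ud, ctypes.py_object).value)
```

With this shape the handler stays alive even if the registering code drops its own reference, and it is freed exactly when drop_trampoline runs. The CFUNCTYPE objects themselves must also outlive the sandbox, which is why they are module-level here.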
Q5: Error propagation — Python exception → NotifAction
If a handler raises (or the Python interpreter halts mid-dispatch):
- A. Fail-open (return Continue). Simple; handler bug becomes
silent security hole for VFS-style enforcement.
- B. Fail-closed (return Kill). Defensive; aborts the entire
sandbox session on the first buggy notification.
- C. Configurable per-handler — registration takes
on_exception: NotifAction. Audit and VFS handlers pick different
policies. Larger registration surface.
- D. Sandbox-level default + per-handler override. Set once,
overridable; biggest API but most flexible.
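The per-handler policy in Option C amounts to a wrapper around the user's handler, roughly as sketched below. NotifAction is again a two-variant stand-in, and guarded, audit_handler and vfs_handler are hypothetical names.

```python
import enum


class NotifAction(enum.Enum):
    """Stand-in for the real action type; only two variants are shown."""
    CONTINUE = 0
    KILL = 1


def guarded(handler, on_exception: NotifAction):
    """Wrap a handler so that a raised exception maps to a fixed action."""

    def trampoline(ctx):
        try:
            return handler(ctx)
        except Exception:
            # The exception must not escape across the FFI boundary.
            return on_exception

    return trampoline


def buggy(ctx):
    raise ValueError("handler bug")


# An audit handler can afford to fail open; a VFS enforcer should fail closed.
audit_handler = guarded(buggy, on_exception=NotifAction.CONTINUE)
vfs_handler = guarded(buggy, on_exception=NotifAction.KILL)
```

Option D would add a sandbox-wide default that guarded falls back to when on_exception is not given; the wrapper itself would not change.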
Cross-cutting decisions (need a position regardless of A/B/C choice)
- OwnedFd ownership rules across FFI. After a Python handler
returns an InjectFdSend{fd} action, who closes the fd on the failure
path? Proposed contract: "sandlock takes ownership; user must not
close after returning".
- GIL contention. Handler runs sync inside the supervisor task,
holding the GIL for the duration. Many concurrent notifications →
supervisor stalls. Mitigations (dedicated thread, subinterpreters)
are out of scope for v1; document as known limitation?
- Python interpreter halt during dispatch. Py_FinalizeEx running
while sandbox alive → trampoline cannot safely call Python. Proposed:
trampoline checks Py_IsInitialized() and falls back to the
configured exception action (Q5).
- Segfaults inside Python handler. Native crash leaves supervisor
task hung, child trapped indefinitely. Proposed: not recoverable;
document as user responsibility.
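On the fd ownership item above: one way a Python handler could honour "sandlock takes ownership; user must not close after returning" is to hand over a duplicate, so that the lifetime of the Python file object no longer matters. This is a sketch of that idea only. hand_off_fd is a hypothetical helper, and the explicit close at the end stands in for what sandlock would do under the proposed contract.

```python
import os


def hand_off_fd(fileobj) -> int:
    """Return a duplicate fd for the action; the caller must never close it."""
    return os.dup(fileobj.fileno())


read_end, write_end = os.pipe()
with os.fdopen(read_end, "rb") as source:
    owned_by_sandlock = hand_off_fd(source)

# `source` is closed at this point, yet the duplicate is still valid.
os.write(write_end, b"ok")
data = os.read(owned_by_sandlock, 2)

# Nobody else closes the duplicate in this sketch, so it is closed here;
# under the proposed contract sandlock would close it instead.
os.close(owned_by_sandlock)
os.close(write_end)
```

The cost is one extra dup per injected fd; the benefit is that a with block or garbage collection on the Python side cannot invalidate an fd that sandlock still holds.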
Out of scope for this RFC
- CPython 3.12+ subinterpreters per sandbox.
- PyO3 / cffi alternatives (existing SDK is ctypes).
- Cross-process handler sharing.
- FFI / Python parity for Sandbox::run / dry_run / checkpoint
(separate scope; this RFC is handler-focused).
Phasing proposal
If a preferred direction emerges, suggested split:
- C ABI surface only (Q1-Q3 chosen): new sandlock_ffi symbols, no
Python wrapper yet. CI builds, no runtime test.
- Python wrapper layer: minimal Handler base class + registration
into existing Sandbox.run_* Python entry points. Smoke test:
audit-only handler counting SYS_openat calls.
- Ergonomic layer — error mapping (Q5), context helpers
(ctx.read_path()), test fixtures, docs page.
Happy to split into 3 PRs if that's the preferred review unit.