RFC: Python parity for the Handler trait (Follow-up B) #43

@dzerik

PR #36 landed the Rust Handler trait and the
Sandbox::run_with_extra_handlers(I: IntoIterator<Item = (S, H)>) shape.
The existing Python SDK (ctypes-based, in python/src/sandlock/) has no
equivalent surface — Python users can spawn a sandbox and set policy,
but cannot register user handlers.

This RFC asks for direction on five design questions before opening a
PR. The Python SDK is currently sync ctypes over libsandlock_ffi.so,
which I treat as the binding constraint (no PyO3 introduction here).

Q1: Async model

How should Python async def handle() semantics map across the FFI boundary?

  • A. Sync handler signature (def handle(ctx) -> NotifAction).
    Users that want async wrap themselves with
    asyncio.run_coroutine_threadsafe(...).result().
    Smaller C ABI, supervisor task blocks fully on handler.
  • B. Native async handler via completion-pipe / eventfd bridge.
    C ABI exposes a sandlock_completion_t* that handler signals when
    ready. Idiomatic, but 3-4× more C surface and ctypes needs custom
    completion glue (no PyO3-asyncio equivalent).
  • C. Handler runs in isolated Python subprocess, IPC per
    notification. Full isolation, no GIL contention; but ~ms-per-syscall
    overhead makes high-frequency interception (VFS) impractical.
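
To make option A concrete, here is a minimal sketch of what user code might look like if the handler signature stays synchronous and users bridge to asyncio themselves. The names `decide_async`, `make_sync_handler`, and the string action `CONTINUE` are hypothetical stand-ins and not part of sandlock; only the asyncio and threading calls are real.

```python
import asyncio
import threading

# Hypothetical stand-in for a NotifAction value; not a sandlock symbol.
CONTINUE = "continue"


def start_background_loop():
    """Run an event loop on a separate thread so sync code can submit work to it."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def decide_async(ctx):
    """Hypothetical async policy decision (e.g. a lookup in an async cache)."""
    await asyncio.sleep(0)
    return CONTINUE


def make_sync_handler(loop, timeout=1.0):
    """Wrap the coroutine in the sync signature `def handle(ctx) -> action`."""
    def handle(ctx):
        future = asyncio.run_coroutine_threadsafe(decide_async(ctx), loop)
        # Blocks the calling thread until the coroutine finishes or times out.
        return future.result(timeout=timeout)
    return handle


loop, thread = start_background_loop()
handle = make_sync_handler(loop)
result = handle({"pid": 1234, "syscall_nr": 257})

# Shut the loop down cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

The caller blocks inside `future.result()`, which is the "supervisor task blocks fully on handler" cost named in option A; the timeout is one way a user could bound that stall.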

Q2: HandlerCtx FFI surface

How to expose notif/notif_fd/child-memory helpers to Python:

  • A. Fully opaque pointer + getter functions
    (sandlock_ctx_pid(ctx), sandlock_ctx_arg(ctx, idx),
    sandlock_ctx_read_cstr(ctx, addr, buf, cap), ...).
    ABI-safe to extend; per-call FFI overhead.
  • B. repr(C) struct exposed verbatim (notif_id, pid, syscall_nr,
    args[6], notif_fd inline). Direct ctypes Structure mapping.
    Zero-cost field access; freezes layout — kernel seccomp_notif
    changes break Python ABI.
  • C. Hybrid: repr(C) notification snapshot + opaque
    sandlock_mem_handle_t* for child-memory access. Notification
    data direct, memory access wrapped (sandlock controls TOCTOU
    lifetime).
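
For option B (and the snapshot half of option C), the direct ctypes mapping could look roughly like the sketch below. The struct name `NotifSnapshot`, the field widths, and the callback signature are assumptions made for illustration; only the field names come from the list above.

```python
import ctypes


class NotifSnapshot(ctypes.Structure):
    # Hypothetical repr(C) layout; field widths are assumed, not taken from sandlock.
    _fields_ = [
        ("notif_id", ctypes.c_uint64),
        ("pid", ctypes.c_uint32),
        ("syscall_nr", ctypes.c_int32),
        ("args", ctypes.c_uint64 * 6),
        ("notif_fd", ctypes.c_int32),
    ]


# Assumed C signature: int handle(const NotifSnapshot *snap)
HANDLE_FN = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(NotifSnapshot))

seen = []


def py_handle(snap_ptr):
    """Python handler: fields are read directly, with no getter call per field."""
    snap = snap_ptr.contents
    seen.append((snap.pid, snap.syscall_nr, snap.args[1]))
    return 0


callback = HANDLE_FN(py_handle)

# Simulate the native side filling a snapshot and invoking the callback.
snap = NotifSnapshot(notif_id=1, pid=4242, syscall_nr=257, notif_fd=7)
snap.args[1] = 0x7FFF0000
rc = callback(ctypes.pointer(snap))
```

Any change to field order or width on the Rust side would silently misalign this `_fields_` list, which is the layout-freeze cost that option A avoids with getters.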

Q3: NotifAction FFI surface

Eight Rust variants, some with owned resources (OwnedFd) and a
callback (InjectFdSendTracked.on_success).

  • A. Tagged union (enum kind + union u). Direct memory layout;
    freezes union ABI. Ownership of contained fds and callback
    user-data unclear.
  • B. Opaque builder functions (sandlock_action_continue(),
    sandlock_action_inject_fd_send_tracked(fd, flags, cb, ud, ud_drop)).
    Sandlock owns lifecycle including ud_drop cleanup callback. Heap
    per action.
  • C. Output-parameter setters into a sandlock-pre-allocated
    sandlock_action_out_t* passed to handler.
    No heap allocation; default is "no setter called → Continue";
    layout still partially fixed.

Q4: Handler ownership / lifetime through FFI

When a Python handler is registered, sandlock holds an
Arc<dyn Handler> for the duration of the sandbox. Through FFI, that
means a PyObject* lives across thread/runtime boundaries.

  • A. Raw PyObject* + caller-provided Py_IncRef/Py_DecRef
    callbacks. Compact API; couples sandlock_ffi to Python ABI;
    GIL-acquired-before-callback contract easy to violate.
  • B. Opaque sandlock_handler_t* allocated by
    sandlock_handler_new(handle_fn, ud, ud_drop). Sandlock owns
    lifecycle; ud_drop is arbitrary cleanup (Py_DecRef one option).
    Per-handler heap.
  • C. Static-dict approach: handler registered by integer ID;
    Python keeps dict[int, Handler] and dispatches via trampoline.
    Minimum FFI surface; global mutable state, doesn't scale to multiple
    sandboxes per process.
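
One way the Python side of option B could keep handlers alive without passing a raw PyObject* is to use an integer token as `ud` and let `ud_drop` release the Python reference. This is a sketch under that assumption: `HandlerRegistry` and both callback signatures are hypothetical, and the final calls simulate what sandlock would do with the triple passed to `sandlock_handler_new`.

```python
import ctypes

# Assumed C signatures for the two callbacks taken by sandlock_handler_new.
HANDLE_FN = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p)  # int handle(void *ud)
UD_DROP = ctypes.CFUNCTYPE(None, ctypes.c_void_p)            # void ud_drop(void *ud)


class HandlerRegistry:
    """Per-sandbox registry: keeps Python handlers alive until ud_drop fires."""

    def __init__(self):
        self._live = {}
        self._next_token = 1  # never 0, so ud is never seen as NULL/None
        # Keep references to the ctypes thunks so they are not garbage collected
        # while native code may still call them.
        self._handle_fn = HANDLE_FN(self._dispatch)
        self._ud_drop = UD_DROP(self._drop)

    def register(self, handler):
        token = self._next_token
        self._next_token += 1
        self._live[token] = handler
        return self._handle_fn, ctypes.c_void_p(token), self._ud_drop

    def live_count(self):
        return len(self._live)

    def _dispatch(self, ud):
        # ctypes delivers a c_void_p argument as a plain int.
        return self._live[ud]()

    def _drop(self, ud):
        self._live.pop(ud, None)


registry = HandlerRegistry()
handle_fn, ud, ud_drop = registry.register(lambda: 0)

# Simulate sandlock invoking the handler, then releasing it at sandbox teardown.
rc = handle_fn(ud)
live_before_drop = registry.live_count()
ud_drop(ud)
live_after_drop = registry.live_count()
```

Because the registry is an instance rather than module-level state, each sandbox can own one, which sidesteps the global-state objection raised against option C.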

Q5: Error propagation — Python exception → NotifAction

If a handler raises (or the Python interpreter halts mid-dispatch):

  • A. Fail-open (return Continue). Simple; handler bug becomes
    silent security hole for VFS-style enforcement.
  • B. Fail-closed (return Kill). Defensive; aborts the entire
    sandbox session on the first buggy notification.
  • C. Configurable per-handler — registration takes
    on_exception: NotifAction. Audit and VFS handlers pick different
    policies. Larger registration surface.
  • D. Sandbox-level default + per-handler override. Set once,
    overridable; biggest API but most flexible.
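
The resolution order in option D can be sketched in pure Python. `make_trampoline` and the string actions are hypothetical placeholders for whatever the wrapper layer ends up exposing; the point is only that the trampoline never lets an exception cross into native code.

```python
# Hypothetical stand-ins for NotifAction values.
CONTINUE = "continue"
KILL = "kill"


def make_trampoline(handler, sandbox_default=KILL, on_exception=None):
    """Wrap a handler so any exception maps to a configured action.

    A per-handler `on_exception` wins; otherwise the sandbox-level default applies.
    """
    fallback = on_exception if on_exception is not None else sandbox_default

    def trampoline(ctx):
        try:
            return handler(ctx)
        except BaseException:
            # Nothing may propagate through the C frame, so even
            # KeyboardInterrupt is converted to the fallback action here.
            return fallback

    return trampoline


def buggy_handler(ctx):
    raise ValueError("handler bug")


def healthy_handler(ctx):
    return CONTINUE


audit = make_trampoline(buggy_handler, on_exception=CONTINUE)  # fail-open override
vfs = make_trampoline(buggy_handler)                           # inherits fail-closed
ok = make_trampoline(healthy_handler)

audit_result = audit({})
vfs_result = vfs({})
ok_result = ok({})
```

Options A and B fall out as the special cases where the fallback is hard-coded, and option C is the same code without `sandbox_default`.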

Cross-cutting decisions (need a position regardless of A/B/C choice)

  • OwnedFd ownership rules across FFI. After a Python handler
    returns an InjectFdSend{fd} action, who closes the fd on the failure
    path? Proposed contract: "sandlock takes ownership; user must not
    close after returning".
  • GIL contention. Handler runs sync inside the supervisor task,
    holding the GIL for the duration. Many concurrent notifications →
    supervisor stalls. Mitigations (dedicated thread, subinterpreters)
    are out of scope for v1; document as known limitation?
  • Python interpreter halt during dispatch. Py_FinalizeEx running
    while sandbox alive → trampoline cannot safely call Python. Proposed:
    trampoline checks Py_IsInitialized() and falls back to the
    configured exception action (Q5).
  • Segfaults inside Python handler. Native crash leaves supervisor
    task hung, child trapped indefinitely. Proposed: not recoverable;
    document as user responsibility.
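
Under the proposed "sandlock takes ownership" contract, a Python handler that holds a file object would need to hand over a descriptor that Python will not close on its own. A minimal sketch using `os.dup`, where `fd_to_hand_over` is a hypothetical helper and the final `os.close` stands in for sandlock closing the fd:

```python
import os
import tempfile


def fd_to_hand_over(file_obj):
    """Duplicate the descriptor so the file object's own close() cannot double-close it."""
    return os.dup(file_obj.fileno())


with tempfile.TemporaryFile() as f:
    f.write(b"payload")
    f.flush()
    handed_over = fd_to_hand_over(f)

# The file object is closed now, but the duplicate is still valid and
# refers to the same open file description.
os.lseek(handed_over, 0, os.SEEK_SET)
data = os.read(handed_over, 7)

# Stand-in for sandlock: the new owner closes the descriptor exactly once.
os.close(handed_over)
```

Returning `f.fileno()` directly would leave two owners for one descriptor, which is the failure-path ambiguity the contract is meant to rule out.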

Out of scope for this RFC

  • CPython 3.12+ subinterpreters per sandbox.
  • PyO3 / cffi alternatives (existing SDK is ctypes).
  • Cross-process handler sharing.
  • FFI / Python parity for Sandbox::run / dry_run / checkpoint is a
    separate scope; this RFC is handler-focused.

Phasing proposal

If a preferred direction emerges, suggested split:

  1. C ABI surface only (Q1-Q3 chosen) — new sandlock_ffi symbols, no
    Python wrapper yet. CI builds, no runtime test.
  2. Python wrapper layer — minimal Handler base class + registration
    into existing Sandbox.run_* Python entry points. Smoke test:
    audit-only handler counting SYS_openat calls.
  3. Ergonomic layer — error mapping (Q5), context helpers
    (ctx.read_path()), test fixtures, docs page.
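
The phase 2 smoke test might be shaped roughly like this. The `Handler` base class, the dict-based context, and the string action are all hypothetical, and the handler is driven by fake notifications rather than a real sandbox; 257 is the openat syscall number on x86_64 only.

```python
# Hypothetical stand-in for a NotifAction value.
CONTINUE = "continue"

SYS_OPENAT_X86_64 = 257  # openat on x86_64; other architectures differ


class Handler:
    """Hypothetical minimal base class for the Python wrapper layer."""

    def handle(self, ctx):
        raise NotImplementedError


class OpenatCounter(Handler):
    """Audit-only handler: counts openat notifications and never blocks them."""

    def __init__(self):
        self.count = 0

    def handle(self, ctx):
        if ctx["syscall_nr"] == SYS_OPENAT_X86_64:
            self.count += 1
        return CONTINUE


counter = OpenatCounter()
fake_notifications = [{"syscall_nr": nr} for nr in (257, 0, 257)]
actions = [counter.handle(ctx) for ctx in fake_notifications]
```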

Happy to split into 3 PRs if that's the preferred review unit.
