
sandlock_spawn fails with ENOSYS (clone3) when called from a multi-threaded Python process (uvicorn/asyncio + Kubernetes RuntimeDefault seccomp) #47

@mrsimpson

Description

Summary

Calling Sandbox(policy).run(...) from a uvicorn server process returns exit_code=-1, error="sandlock_spawn failed" every time. The identical call succeeds from a fresh single-threaded Python process in the same container.

Context

I was setting up sandlock as the execution backend for an MCP tool server — following the recommendation in lobehub/lobehub#12472 to use sandlock as a self-hosted alternative to LobeHub's cloud sandbox. Because LobeHub requires Streamable HTTP MCP transport (not SSE), I wrote a thin FastMCP wrapper around Sandbox.run().

The server runs as a sidecar container in a Kubernetes k3s pod.

Environment

  • Python 3.12, sandlock 0.7.0 (pip)
  • uvicorn + FastMCP (Streamable HTTP transport)
  • Kubernetes k3s, kernel 6.18.18, Landlock ABI v7
  • Pod seccomp: RuntimeDefault (Kubernetes PSS restricted)
  • Container: UID 1000, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, capabilities: drop ALL

Reproduction

Any FastMCP/uvicorn server that calls Sandbox(policy).run() from its request handler:

import asyncio
import pathlib
from sandlock import Policy, Sandbox

@mcp.tool()
async def execute_python(code: str) -> str:
    ws = pathlib.Path("/tmp/sessions/default")
    policy = Policy(fs_readable=["/usr", "/lib", "/etc"], fs_writable=[str(ws)], ...)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lambda: Sandbox(policy).run(["python3", "-c", code]))

Result: Result(success=False, exit_code=-1, error='sandlock_spawn failed')

Diagnosis

I am not a kernel developer or Python internals expert — I figured this out in collaboration with Claude Sonnet 4.6, so please correct any mistakes in the analysis.

A diagnostic endpoint injected into the running server process revealed:

{
  "pid": 1,
  "active_threads": 2,
  "fork": "ok",
  "clone3": "ret=-1 errno=38 (Function not implemented)",
  "new_thread": "ok",
  "minimal_policy": {"ok": false, "error": "sandlock_spawn failed"}
}

Key observations:

  • fork() works fine from the server process
  • clone3 returns ENOSYS — it is blocked by Kubernetes' RuntimeDefault seccomp profile (a minimal probe is sketched after this list)
  • Python's threading.Thread still works because glibc falls back from clone3 to clone
  • sandlock_spawn fails even with the most minimal policy
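
For reference, the clone3 check can be reproduced with a few lines of ctypes. This is an illustrative sketch of what the diagnostic endpoint did, not its actual code; passing a NULL clone_args struct never actually clones anything, it only distinguishes a seccomp block from a normal failure:

import ctypes, errno

libc = ctypes.CDLL(None, use_errno=True)
SYS_clone3 = 435  # clone3 uses the same syscall number on all architectures

# A NULL/zero-sized clone_args struct is invalid, so an unfiltered kernel
# fails with EINVAL or EFAULT; a seccomp profile that blocks clone3 fails
# with ENOSYS (errno 38), which is what we see inside the pod.
ret = libc.syscall(SYS_clone3, None, 0)
err = ctypes.get_errno()
print(f"ret={ret} errno={err} ({errno.errorcode.get(err, '?')})")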

Reading crates/sandlock-ffi/src/lib.rs:

let rt = match tokio::runtime::Runtime::new() {   // = new_multi_thread()
    Ok(rt) => rt,
    Err(_) => return ptr::null_mut(),             // → "sandlock_spawn failed"
};

Runtime::new() calls new_multi_thread(), which spawns OS worker threads. Our hypothesis: when called from a multi-threaded parent process (uvicorn has 2 threads — event loop + thread pool), Tokio's worker thread spawning fails. Either clone3 is blocked and the fallback doesn't work reliably in a multi-threaded context, or glibc's pthread_atfork handlers deadlock in the forked child. Python itself warns:

DeprecationWarning: This process (pid=1) is multi-threaded,
use of fork() may lead to deadlocks in the child.
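
The warning is not specific to sandlock: on Python 3.12+ any fork() from a process with more than one live thread triggers it. A minimal illustration, unrelated to sandlock itself:

import os, threading

# Keep a second thread alive so the interpreter is multi-threaded at fork time.
ev = threading.Event()
t = threading.Thread(target=ev.wait)
t.start()

pid = os.fork()          # Python 3.12+ emits the DeprecationWarning here
if pid == 0:
    os._exit(0)          # child exits immediately
os.waitpid(pid, 0)
ev.set()
t.join()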

The same issue exists in the current source at lib.rs lines ~694, ~744, ~890, ~1042, ~1224, ~1330, ~1628, ~1679, ~1710 and handler/run.rs.

Workaround

Spawn a fresh single-threaded Python subprocess per sandlock call. The subprocess has no active event loop or thread pool, so Tokio's runtime creation succeeds:

import json
import subprocess
import sys

def _run_sandboxed_sync(cmd, ws, timeout):
    helper = r"""
import sys, json, pathlib
from sandlock import Sandbox, Policy
req = json.loads(sys.stdin.read())
# build policy, call Sandbox(policy).run(), return JSON
"""
    proc = subprocess.run(
        [sys.executable, "-c", helper],
        input=json.dumps({"cmd": cmd, "ws": str(ws), "timeout": timeout}),
        capture_output=True, text=True, timeout=timeout + 5,
    )
    return json.loads(proc.stdout)["output"]

This works, but adds ~50ms overhead (Python startup time) and an extra unconfined intermediary process.
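
For completeness, the tool handler then just offloads this helper to the executor, mirroring the repro above; the 30-second timeout is an arbitrary placeholder:

import asyncio
import pathlib

@mcp.tool()
async def execute_python(code: str) -> str:
    ws = pathlib.Path("/tmp/sessions/default")
    loop = asyncio.get_running_loop()
    # Run the blocking subprocess helper off the event-loop thread.
    return await loop.run_in_executor(
        None, lambda: _run_sandboxed_sync(["python3", "-c", code], ws, 30)
    )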

Suggested fix

Replace Runtime::new() with a current-thread runtime at every call site in the FFI layer:

// Before
let rt = match tokio::runtime::Runtime::new() {

// After
let rt = match tokio::runtime::Builder::new_current_thread()
    .enable_all()
    .build() {

A current-thread runtime runs entirely on the calling thread — no worker thread spawning, no clone3, no fork-safety issues. The async operations sandlock performs (waiting for child process I/O) are I/O-bound, not CPU-parallel, so there is no functional regression from dropping the multi-thread scheduler.
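
If it helps with testing, this is the standalone check I would run against a patched build; it recreates the failing condition (a second live thread in the calling process) without uvicorn. The Policy arguments are just the fields already shown in this issue and may need adjusting for a real minimal policy:

import threading
from sandlock import Policy, Sandbox

# Keep a second OS thread alive so the process is multi-threaded,
# mirroring the uvicorn situation that currently fails.
ev = threading.Event()
threading.Thread(target=ev.wait, daemon=True).start()

# Placeholder policy built only from the fields mentioned above.
policy = Policy(fs_readable=["/usr", "/lib", "/etc"], fs_writable=["/tmp"])
result = Sandbox(policy).run(["python3", "-c", "print('ok')"])
print(result)

ev.set()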


Happy to provide any additional diagnostic information or test a patched build.
