docs(sdk): codify FFI panic-safety contract at the napi/PyO3 boundary (#32) #50

Merged: ZhiXiao-Lin merged 1 commit into main from chore/ffi-panic-safety-contract on May 29, 2026

@ZhiXiao-Lin
Contributor

Follow-up to #32 (FFI panic-unwind safety audit, napi 2.16 / pyo3 0.23).

Audit outcome: the boundary is panic-safe — verified

A 4-lens audit (napi semantics, PyO3 semantics, per-site classification of both ~6–7K-line lib.rs files) plus independent re-verification found zero production panic sites in an uncaught context:

  • Node sdk/node/src/lib.rs: 90 panic-prone tokens, but 89 are in #[cfg(test)] (line 5352+). The single production site is the lazy Tokio-runtime .expect() in fallback_runtime() (line 86), reached from within #[napi] bodies. tokio::spawn bodies just await an inner Result (?-propagated); ThreadsafeFunction callbacks fail closed with unwrap_or_else; no impl Drop.
  • Python sdk/python/src/lib.rs: 42 tokens, 40 in #[cfg(test)] (line 6567+). The single production site is the runtime .expect() in get_runtime() (line 109), caught by PyO3's pyfunction trampoline. The worker-thread with_gil bridges (PythonCallbackHandler, PyBudgetGuard, PySlashCommand, the only truly uncaught context) use .ok()/unwrap_or_else/fail-closed; no tokio::spawn/thread::spawn/Drop.
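
The fail-closed pattern mentioned in both bullets can be sketched in plain Rust as below. This is a minimal illustration, not code from either SDK: `allow_fail_closed` and `parse_budget` are hypothetical names, and the closure stands in for a host-language callback invoked from a worker thread.

```rust
use std::fmt::Debug;

/// Hypothetical helper: runs a fallible callback and denies on any error.
/// The closure stands in for a JS or Python hook called from a context
/// where a panic would not be caught by the binding framework.
pub fn allow_fail_closed<E: Debug>(callback: impl FnOnce() -> Result<bool, E>) -> bool {
    // `unwrap_or_else` replaces `.unwrap()`: an error becomes a denial
    // instead of a panic.
    callback().unwrap_or_else(|err| {
        eprintln!("callback failed, denying: {err:?}");
        false
    })
}

/// Hypothetical helper showing the `.ok()` variant: a value that fails to
/// parse degrades to `None` rather than panicking.
pub fn parse_budget(raw: &str) -> Option<u64> {
    raw.trim().parse::<u64>().ok()
}
```

The design choice is that an error in an uncaught context is mapped to the most conservative outcome (deny, or no value), so the caller never has to rely on unwinding being intercepted.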

Why docs, not a refactor

The key finding (verified against the cargo cache, not memory): napi 2.x does not wrap sync #[napi] bodies in catch_unwind by default. A panic there aborts the Node process on Rust ≥ 1.81; it is not a catchable JS error. Only #[napi] async fn (→ rejected Promise) and #[napi(catch_unwind)] are safe. So the boundary's safety rests on a convention (never panic in uncaught contexts), not on structure, and a future stray .unwrap() in a TSFN callback or a new Drop impl would silently reintroduce a process abort.
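
To make the distinction concrete, here is a small standard-library sketch of what catching a panic at a C-ABI boundary looks like. It does not use napi or reproduce what `#[napi(catch_unwind)]` expands to; `guarded_divide`, `risky_divide`, and the status constants are hypothetical names chosen for illustration.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical status codes returned across the boundary.
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANIC: i32 = -1;

/// Inner logic that can panic, standing in for a stray `.expect()`.
fn risky_divide(a: i32, b: i32) -> i32 {
    a.checked_div(b).expect("division by zero or overflow")
}

/// Boundary function: traps a panic and reports it as a status code, so
/// the unwind never reaches the `extern "C"` frame itself.
pub extern "C" fn guarded_divide(a: i32, b: i32, out: &mut i32) -> i32 {
    match panic::catch_unwind(AssertUnwindSafe(|| risky_divide(a, b))) {
        Ok(value) => {
            *out = value;
            STATUS_OK
        }
        // `out` is left untouched when the inner call panicked.
        Err(_) => STATUS_PANIC,
    }
}
```

Without the `catch_unwind` wrapper, the panic would try to unwind out of an `extern "C"` function, which aborts the process on recent Rust versions. The sketch also assumes the default `panic = "unwind"` profile; under `panic = "abort"` there is nothing to catch.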

This PR codifies that contract in each SDK's module doc, where anyone editing the FFI layer will see it. No code change: the only two production panic sites are in framework-caught contexts and fire only on OS thread exhaustion. Converting get_runtime() to a Result would thread through ~75 call sites to defend an already-caught failure, which is out of proportion (Rule 0).
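
The trade-off can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: `Runtime` is a stand-in struct rather than `tokio::runtime::Runtime`, `build_runtime` fakes a construction failure, and `try_get_runtime` and `worker_count` are hypothetical. Only the name `get_runtime` comes from the PR, and its body here is not the SDK's.

```rust
use std::sync::OnceLock;

/// Stand-in for a real async runtime, to keep the sketch self-contained.
pub struct Runtime {
    pub workers: usize,
}

/// Hypothetical constructor that fails when no worker threads are available.
fn build_runtime(workers: usize) -> Result<Runtime, String> {
    if workers == 0 {
        Err("no worker threads available".to_string())
    } else {
        Ok(Runtime { workers })
    }
}

static RUNTIME: OnceLock<Runtime> = OnceLock::new();

/// Infallible shape: callers get a reference, construction failure panics.
pub fn get_runtime() -> &'static Runtime {
    RUNTIME.get_or_init(|| build_runtime(4).expect("failed to build runtime"))
}

/// Fallible shape: the cell stores the construction result, and every
/// caller has to handle the error.
pub fn try_get_runtime(
    cell: &OnceLock<Result<Runtime, String>>,
    workers: usize,
) -> Result<&Runtime, String> {
    cell.get_or_init(|| build_runtime(workers))
        .as_ref()
        .map_err(|err| err.clone())
}

/// A caller of the fallible shape: the `?` here is what would have to be
/// repeated at each call site.
pub fn worker_count(
    cell: &OnceLock<Result<Runtime, String>>,
    workers: usize,
) -> Result<usize, String> {
    Ok(try_get_runtime(cell, workers)?.workers)
}
```

With the infallible shape, call sites stay as plain `get_runtime()` calls. With the fallible shape, each one gains a `?` and a `Result` return type, which is the kind of churn the PR judges not worth it when the panic is already caught.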

Doc-only; cargo fmt --check is clean for both SDK crates.

…#32)

Audit outcome for #32 (FFI panic-unwind safety, napi 2.16 / pyo3 0.23):
the boundary is panic-safe today, but the invariant holds by *convention*,
not structure, so it can silently regress. Documents the contract in each
SDK's module doc where future FFI edits will see it.

Key finding (verified against the cargo cache, not memory): napi 2.x does
NOT wrap sync #[napi] bodies in catch_unwind by default. A panic there
aborts the Node process on Rust >= 1.81; it is not a catchable JS error.
Only #[napi] async fns (-> rejected Promise) and #[napi(catch_unwind)] are
safe; ThreadsafeFunction callbacks, spawned tasks, Drop, and module init
are not. PyO3 0.23 does catch #[pyfunction]/#[pymethods]/module-init bodies
(-> PanicException) but not worker-thread with_gil bridges or spawned tasks.

No code change: independent verification confirmed the only production
panic site in each SDK is the lazy Tokio-runtime .expect(), reached from
caught contexts. The genuinely uncaught paths (spawned tasks and the Python
with_gil callback bridges; no Drop impls exist) are panic-free by
construction via ? / unwrap_or_else / fail-closed defaults. Converting
get_runtime() to a Result would thread through ~75 call sites to defend an
OS-exhaustion panic that is already caught, which is not worth it.
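
As a rough illustration of why worker threads and spawned tasks sit outside the framework's panic handling, the sketch below uses only `std::thread`. `run_on_worker` is a hypothetical helper, not part of either SDK; it shows that a panic on a spawned thread stays on that thread and is visible to the caller only if the join handle is inspected.

```rust
use std::thread;

/// Hypothetical helper: runs `job` on a worker thread and turns a panic on
/// that thread into an `Err` carrying the panic message.
pub fn run_on_worker<T: Send + 'static>(
    job: impl FnOnce() -> T + Send + 'static,
) -> Result<T, String> {
    thread::spawn(job).join().map_err(|payload| {
        // Panic payloads from `panic!` are usually `&str` or `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

If nothing ever joins the handle, the panic is simply lost from the caller's point of view, which is why the contract requires these paths to be panic-free rather than relying on something upstream to catch them.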
@ZhiXiao-Lin ZhiXiao-Lin merged commit 63c9d27 into main May 29, 2026
1 check passed
@ZhiXiao-Lin ZhiXiao-Lin deleted the chore/ffi-panic-safety-contract branch May 29, 2026 06:48