Sync main with master #277

Merged: wool-labs[bot] merged 5 commits into main from master on Jul 2, 2026

wool-labs[bot] commented Jul 2, 2026

Auto-generated by the sync branches workflow.

conradbzura and others added 5 commits July 2, 2026 15:41
Worker subprocesses are non-daemon spawn processes whose termination
depended entirely on the graceful stop RPC, so any teardown that never
completed it left an orphan that survived the parent and accumulated
across runs, eventually exhausting process-table and port resources.

Reap workers on every stop: LocalWorker stop now always joins the
subprocess after the graceful attempt, escalating to SIGTERM and then
SIGKILL when it lingers. The stop RPC also carries a deadline so an
unresponsive worker can no longer hang stop() and dodge the fallback.
Inside the worker, a parent-death watchdog thread ties the process to
its parent: when the parent dies, including by SIGKILL, the worker
initiates the same graceful shutdown as SIGTERM and hard-exits if the
grace window elapses.
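The escalation ladder described above can be sketched roughly as follows. This is a minimal illustration, not the project's `LocalWorker` code: the `reap` helper, its `timeout` default, and its return values are hypothetical, and the stop RPC and its deadline are left out. It assumes only the standard `multiprocessing.Process` methods `join`, `is_alive`, `terminate`, and `kill`.

```python
import multiprocessing as mp
import time


def reap(process, timeout=5.0):
    """Join `process`, escalating to SIGTERM and then SIGKILL if it lingers.

    Returns which rung reaped it: "joined", "terminated", or "killed".
    (Hypothetical helper; the name and return values are illustrative.)
    """
    process.join(timeout)      # wait for the graceful stop to take effect
    if not process.is_alive():
        return "joined"
    process.terminate()        # SIGTERM on POSIX
    process.join(timeout)
    if not process.is_alive():
        return "terminated"
    process.kill()             # SIGKILL on POSIX; the process cannot ignore it
    process.join()             # always collect the exit status
    return "killed"


if __name__ == "__main__":
    # time.sleep stands in for a worker whose graceful stop never completes.
    ctx = mp.get_context("spawn")
    worker = ctx.Process(target=time.sleep, args=(60,), daemon=False)
    worker.start()
    print(reap(worker, timeout=0.5), worker.exitcode)
```

Because every path ends in a `join`, the subprocess is reaped before the function returns, whichever rung finally stops it.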

Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg

Releasing a resource with a positive TTL parked a task on a TTL sleep.
A loop that closed before the TTL elapsed destroyed that task while it
was still pending, and a task that never started emitted a
coroutine-never-awaited RuntimeWarning in the warnings summary of
sub-second runs.

Arm a plain call_later timer instead and only spawn the cleanup task
once the TTL actually fires: an unfired TimerHandle is discarded
silently at loop close. Cleanup cancellation on re-acquire and clear
now cancels the timer or the in-flight task, and cleanup no longer
relies on Task internals when it runs inside its own finalize task.
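A minimal sketch of the timer-first pattern, assuming a single-slot holder: `TTLSlot`, its attributes, and `demo` are hypothetical names for illustration and do not reflect the project's pool API. Only `loop.call_later`, `TimerHandle.cancel`, and `loop.create_task` are real asyncio calls.

```python
import asyncio


class TTLSlot:
    """Hypothetical holder for one released resource awaiting TTL cleanup."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.cleaned = False
        self._loop = None
        self._timer = None  # asyncio.TimerHandle while the TTL is pending
        self._task = None   # asyncio.Task once the TTL has fired

    def release(self):
        # Arm a plain timer; no coroutine or Task exists until it fires.
        self._loop = asyncio.get_running_loop()
        self._timer = self._loop.call_later(self.ttl, self._on_ttl)

    def _on_ttl(self):
        # The TTL elapsed on a live loop, so the cleanup task is spawned now.
        self._timer = None
        self._task = self._loop.create_task(self._cleanup())

    async def _cleanup(self):
        self.cleaned = True  # stand-in for finalizing the real resource

    def acquire(self):
        # Re-acquire cancels whichever stage the cleanup has reached.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None


async def demo(ttl, wait, reacquire=False):
    slot = TTLSlot(ttl)
    slot.release()
    if reacquire:
        slot.acquire()
    await asyncio.sleep(wait)
    return slot.cleaned


if __name__ == "__main__":
    print(asyncio.run(demo(ttl=0.05, wait=0.2)))                  # TTL fires
    print(asyncio.run(demo(ttl=0.05, wait=0.2, reacquire=True)))  # timer cancelled
    print(asyncio.run(demo(ttl=60, wait=0)))                      # loop closes first
```

In the third call the loop closes while the timer is still armed. Since no coroutine object was ever created, there is nothing left to warn about.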

Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg

Unit tests pin the reap escalation ladder and its timeout defaulting,
the watchdog's daemon flag, stop dispatch, and hard-exit guarantees,
and the stop paths that must always reap: success, RPC failure, dead
process, and cancellation mid-RPC. Two stale stop tests subsumed by
the new reap assertions are removed. Integration tests prove the
contracts on real subprocesses: a stopped worker is fully reaped
before stop returns, pool exit leaves no live workers and tolerates a
crashed one, an unresponsive worker is killed within the RPC deadline,
and a worker whose parent is SIGKILLed exits on its own.
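The watchdog properties pinned above (daemon flag, stop dispatch, hard exit) are easier to unit-test when the thread's dependencies are injectable. The sketch below is one possible shape, not the project's implementation: `start_parent_watchdog` and all of its parameters are hypothetical, and detecting parent death by polling `os.getppid()` is an assumption that holds on POSIX, where an orphaned child is re-parented.

```python
import os
import threading
import time


def start_parent_watchdog(on_parent_death, grace=5.0, interval=0.5,
                          get_ppid=os.getppid, hard_exit=os._exit):
    """Start a daemon thread that ties this process to its parent.

    Hypothetical helper: the names, defaults, and polling strategy are
    illustrative assumptions.
    """
    original_ppid = get_ppid()

    def watch():
        # Assumption: on POSIX an orphan is re-parented, so getppid() changes.
        while get_ppid() == original_ppid:
            time.sleep(interval)
        on_parent_death()  # request the same graceful shutdown SIGTERM would
        time.sleep(grace)  # a clean exit ends the process, and this thread, first
        hard_exit(1)       # still running after the grace window: exit at once

    thread = threading.Thread(target=watch, name="parent-watchdog", daemon=True)
    thread.start()
    return thread
```

A unit test can pass a fake `get_ppid` that changes its answer after a few calls, together with recording stand-ins for `on_parent_death` and `hard_exit`, and then assert on `thread.daemon` and the recorded call order without killing a real parent process.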

Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg

Pin the no-pending-work-at-loop-close regression, the in-flight
cleanup cancellation races on the pool lock for both re-acquire and
clear, and the bookkeeping invariants across arbitrary acquire and
release sequences via Hypothesis. Rewrite the TTL and cross-loop tests
for the timer design, dropping the stale asyncio.sleep scaffolding.

Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg

wool-labs[bot] merged commit ce7f406 into main on Jul 2, 2026