Sync main with master #277
Merged
Worker subprocesses are non-daemon spawn processes whose termination depended entirely on the graceful stop RPC, so any teardown that never completed it left an orphan that survived the parent and accumulated across runs, eventually exhausting process-table and port resources. Reap workers on every stop: LocalWorker stop now always joins the subprocess after the graceful attempt, escalating to SIGTERM and then SIGKILL when it lingers. The stop RPC also carries a deadline so an unresponsive worker can no longer hang stop() and dodge the fallback. Inside the worker, a parent-death watchdog thread ties the process to its parent: when the parent dies, including by SIGKILL, the worker initiates the same graceful shutdown as SIGTERM and hard-exits if the grace window elapses. Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg
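The escalation ladder described above could look roughly like the sketch below. It is an illustration only, not the code from this change: the `reap` function and its `grace` parameter are hypothetical names, and `subprocess.Popen` stands in for the worker's process handle.

```python
import subprocess
import sys


def reap(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Wait for proc to exit, escalating to SIGTERM and then SIGKILL.

    Hypothetical helper: each rung gets `grace` seconds before the next
    one is tried. Returns the exit code of the reaped process.
    """
    # Rung 1: the graceful stop was already requested; give it time to land.
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass

    # Rung 2: polite termination (SIGTERM on POSIX).
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass

    # Rung 3: forced termination (SIGKILL on POSIX), which cannot be ignored.
    proc.kill()
    return proc.wait()


if __name__ == "__main__":
    # A child that never exits on its own stands in for a lingering worker.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", reap(child, grace=0.5))
```

Calling the helper on every stop path, including the ones where the graceful request failed or was cancelled, is what ensures no child outlives `stop()`.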
Releasing a resource with a positive TTL parked a task on a TTL sleep. A loop that closed before the TTL elapsed destroyed that task while it was still pending, and a task that never started emitted a coroutine-never-awaited RuntimeWarning in the warnings summary of sub-second runs. Arm a plain call_later timer instead and spawn the cleanup task only once the TTL actually fires: an unfired TimerHandle is discarded silently at loop close. Cleanup cancellation on re-acquire and clear now cancels the timer or the in-flight task, and cleanup no longer relies on Task internals when it runs inside its own finalize task. Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg
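A minimal sketch of the timer design, assuming a hypothetical `TtlCleanup` helper (the class, its methods, and its attributes are illustrative names, not the identifiers used in this change). It uses only `loop.call_later`, `TimerHandle.cancel`, and `loop.create_task` from asyncio.

```python
import asyncio
from typing import Any, Callable, Coroutine, Optional


class TtlCleanup:
    """Hypothetical helper: run an async cleanup once a TTL elapses."""

    def __init__(
        self, ttl: float, cleanup: Callable[[], Coroutine[Any, Any, None]]
    ) -> None:
        self._ttl = ttl
        self._cleanup = cleanup
        self._timer: Optional[asyncio.TimerHandle] = None
        self._task: Optional[asyncio.Task] = None

    def arm(self) -> None:
        """Called on release: schedule cleanup without creating a task yet."""
        self.cancel()
        loop = asyncio.get_running_loop()
        self._timer = loop.call_later(self._ttl, self._fire)

    def _fire(self) -> None:
        # The TTL really elapsed, so only now is a task (and coroutine) created.
        self._timer = None
        self._task = asyncio.get_running_loop().create_task(self._cleanup())

    def cancel(self) -> None:
        """Called on re-acquire or clear: drop the timer or the in-flight task."""
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None
```

Because no coroutine object exists until `_fire` runs, a loop that closes while the timer is still armed has no pending task to destroy and no un-awaited coroutine to warn about.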
Unit tests pin the reap escalation ladder and its timeout defaulting; the watchdog's daemon flag, stop dispatch, and hard-exit guarantees; and the stop paths that must always reap: success, RPC failure, dead process, and cancellation mid-RPC. Two stale stop tests subsumed by the new reap assertions are removed. Integration tests prove the contracts on real subprocesses: a stopped worker is fully reaped before stop returns, pool exit leaves no live workers and tolerates a crashed one, an unresponsive worker is killed within the RPC deadline, and a worker whose parent is SIGKILLed exits on its own. Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg
Pin the no-pending-work-at-loop-close regression, the in-flight cleanup cancellation races on the pool lock for both re-acquire and clear, and the bookkeeping invariants across arbitrary acquire and release sequences via Hypothesis. Rewrite the TTL and cross-loop tests for the timer design, dropping the stale asyncio.sleep scaffolding. Claude-Session: https://claude.ai/code/session_011Xw7kU5GN556rbn6sZBdzg
Auto-generated by the sync branches workflow.