Fix unbounded worker task leak from the chain-contention registry — Closes #261#270
Merged
conradbzura merged 2 commits intoJul 2, 2026
Conversation
04ef139 to
60899c5
Compare
The task factory records every armed task in the module-global _task_contexts map so a context re-driven by a second task raises wool.ChainContention. The map held its tasks strongly and evicted them from a per-task _release done-callback. When the per-dispatch worker loop is torn down before that callback runs the eviction is stranded, and the entry, with the task it pins, leaks. Under sustained dispatch the process task set grows without bound, and the per-teardown asyncio.all_tasks scan it feeds grows with it. Make _task_contexts a weakref.WeakValueDictionary so an entry drops as soon as its task is collected, whether or not _release runs. The _PENDING reservation sentinel is a strongly-referenced module singleton, so reserved slots never evaporate, and the id-reuse safety invariant holds because a live weak entry implies its task, and thus its context, is still alive. _release stays an eager best-effort eviction and the factory-displacement backstop.
Exercise the weak _task_contexts map through public observables. A registered task whose _release callback is stranded by loop teardown is evicted once it is unreferenced and collected, so the leak is bounded; the assertion fails against the old strong map. The _PENDING reservation survives a gc.collect within its window, verified through a re-entrant wool.ChainContention raise. The contention guard still fires for a live armed context after a forced collection, folded into the existing shared-context test by parametrization.
60899c5 to
dd440e9
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Change the task factory's module-global chain-contention registry,
_task_contexts, from a strongdict[int, asyncio.Future]to aweakref.WeakValueDictionary, so a registered task's entry drops as soon as the task is garbage-collected — even when theadd_done_callback(_release)cleanup never runs because the per-dispatch worker loop (ttl=0) was torn down before the callback fired. This bounds worker-side task and registry bookkeeping to in-flight dispatches instead of lifetime dispatch count, removing the dominant driver of the 0.9.3 → 0.10.0 dispatch-latency regression (an ever-growing global task set made the per-dispatchasyncio.all_tasks()teardown scan O(N-growing)). The_PENDINGreservation sentinel and the chain-contention guard semantics are unchanged.Closes #261
Proposed changes
Hold registered tasks weakly (
runtime/context/factory.py)Replace the strong
_task_contextsdict with aweakref.WeakValueDictionary. Entries now drop when their task is collected regardless of whether_releasefires, so a stranded done-callback (worker loop destroyed first) no longer pins the task forever._PENDINGremains a strongly-referenced module singleton, so reserved slots never evaporate; the id-reuse safety invariant still holds (a live weak entry implies its task, and thus its context, is still alive);_releaseremains an eager best-effort cleanup and the factory-displacement backstop.Test cases
test_factory_releasecallback is stranded by loop teardowngc.collect()runsweakrefclears and its registry key is gone — the leak is bounded (fails on the old strong dict)test_factory_PENDINGreservation held during a re-entrantcreate_taskgc.collect()runs inside the reservation windowChainContentionand the original task completes_PENDINGsurvives GCtest_factorygc.collect()ChainContentionstill fires — a live task pins its own weak registry value