
Fix unbounded worker task leak from the chain-contention registry - Closes #261 #270

Merged: conradbzura merged 2 commits into wool-labs:master from conradbzura:261-fix-worker-task-leak-weak-registry on Jul 2, 2026


conradbzura commented Jul 2, 2026


Summary

Change the task factory's module-global chain-contention registry, _task_contexts, from a strong dict[int, asyncio.Future] to a weakref.WeakValueDictionary, so a registered task's entry drops as soon as the task is garbage-collected. This holds even when the add_done_callback(_release) cleanup never runs because the per-dispatch worker loop (ttl=0) was torn down before the callback fired. Worker-side task and registry bookkeeping is now bounded by the number of in-flight dispatches rather than the lifetime dispatch count. That removes the dominant driver of the 0.9.3 → 0.10.0 dispatch-latency regression: the ever-growing global task set made the per-dispatch asyncio.all_tasks() teardown scan linear in a task count that only ever grew. The _PENDING reservation sentinel and the chain-contention guard semantics are unchanged.
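The behavioral difference between the two registry shapes can be seen in isolation with the standard library alone. In this sketch Entry is a hypothetical stand-in for a registered task, not a wool type. It also shows a constraint any weak registry has to respect: every stored value, including a reservation sentinel, must support weak references, so a bare object() cannot be stored.

```python
import gc
import weakref


class Entry:
    """Hypothetical stand-in for a registered task; plain class instances support weakrefs."""


strong: dict[int, Entry] = {}
weak: "weakref.WeakValueDictionary[int, Entry]" = weakref.WeakValueDictionary()

a, b = Entry(), Entry()
strong[1] = a
weak[1] = b

# Drop every other reference, as if the task finished and nothing else held it.
del a, b
gc.collect()

print(1 in strong)  # True: the dict itself keeps the value alive
print(1 in weak)    # False: the entry vanished together with its value

# A bare object() cannot be weakly referenced, so a sentinel stored in a
# WeakValueDictionary has to be an instance of a class that allows weakrefs.
try:
    weak[2] = object()
except TypeError:
    print("object() is not weak-referenceable")
```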

Closes #261

Proposed changes

Hold registered tasks weakly (runtime/context/factory.py)

Replace the strong _task_contexts dict with a weakref.WeakValueDictionary. Entries now drop when their task is collected, whether or not _release fires, so a stranded done-callback (because the worker loop was destroyed first) no longer pins the task forever. _PENDING remains a strongly referenced module singleton, so reserved slots never evaporate. The id-reuse safety invariant still holds, because a live weak entry implies that its task, and therefore its context, is still alive. _release remains an eager, best-effort cleanup and the factory-displacement backstop.

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
|---|---|---|---|---|---|
| 1 | test_factory | An armed task registered on a worker loop whose _release callback is stranded by loop teardown | The last strong reference is dropped and gc.collect() runs | Its weakref clears and its registry key is gone, so the leak is bounded (fails against the old strong dict) | Weak-registry eviction |
| 2 | test_factory | A _PENDING reservation held during a re-entrant create_task | gc.collect() runs inside the reservation window | The re-entrant call still raises ChainContention and the original task completes | _PENDING survives GC |
| 3 | test_factory | A live task running under an explicit armed context | The context is re-passed after a forced gc.collect() | ChainContention still fires, because a live task pins its own weak registry value | Guard intact after GC |

conradbzura self-assigned this Jul 2, 2026
conradbzura marked this pull request as ready for review July 2, 2026 15:37
conradbzura linked an issue Jul 2, 2026 that may be closed by this pull request
conradbzura force-pushed the 261-fix-worker-task-leak-weak-registry branch 2 times, most recently from 04ef139 to 60899c5, July 2, 2026 22:57
The task factory records every armed task in the module-global
_task_contexts map so a context re-driven by a second task raises
wool.ChainContention. The map held its tasks strongly and evicted them
from a per-task _release done-callback. When the per-dispatch worker
loop is torn down before that callback runs, the eviction is stranded
and the entry leaks, along with the task it pins. Under sustained dispatch
the process task set grows without bound, and the per-teardown
asyncio.all_tasks scan it feeds grows with it.
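The stranding and the resulting growth can be reproduced with plain asyncio. In this sketch each call to the hypothetical dispatch_once stands in for a ttl=0 dispatch: it builds a fresh loop, runs one task, and tears the loop down. The task stops its own loop, so the done callback queued when it returns is discarded by loop.close() without ever running. The lambda is a stand-in for _release, and keying by id(task) is a simplification of the real context-id key.

```python
import asyncio
import gc
import weakref


def dispatch_once(registry) -> None:
    """One hypothetical ttl=0 dispatch: fresh loop, one task, loop torn down."""
    loop = asyncio.new_event_loop()

    async def work() -> None:
        # Stop after the current iteration. Done callbacks scheduled when this
        # coroutine returns are queued for the next iteration, which never comes.
        asyncio.get_running_loop().stop()

    task = loop.create_task(work())
    key = id(task)  # simplified key; the real registry keys by context id
    registry[key] = task
    task.add_done_callback(lambda t: registry.pop(key, None))  # stand-in for _release
    loop.run_forever()
    loop.close()  # clears the ready queue, dropping the never-run callback


def live_entries(registry, dispatches: int = 100) -> int:
    """Run several stranded dispatches and count the registry entries left behind."""
    for _ in range(dispatches):
        dispatch_once(registry)
    gc.collect()
    return len(registry)


print(live_entries({}))                             # 100: grows with dispatch count
print(live_entries(weakref.WeakValueDictionary()))  # 0: nothing outlives its dispatch
```

With a plain dict every stranded task stays alive, so the entry count tracks the lifetime number of dispatches; with a WeakValueDictionary each entry disappears as soon as dispatch_once drops its last reference to the task.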

Make _task_contexts a weakref.WeakValueDictionary so an entry drops as
soon as its task is collected, whether or not _release runs. The
_PENDING reservation sentinel is a strongly-referenced module
singleton, so reserved slots never evaporate, and the id-reuse safety
invariant holds because a live weak entry implies its task, and thus
its context, is still alive. _release stays an eager best-effort
eviction and the factory-displacement backstop.

Exercise the weak _task_contexts map through public observables. A
registered task whose _release callback is stranded by loop teardown is
evicted once it is unreferenced and collected, so the leak is bounded;
the assertion fails against the old strong map. The _PENDING
reservation survives a gc.collect within its window, verified through a
re-entrant wool.ChainContention raise. The contention guard still fires
for a live armed context after a forced collection, folded into the
existing shared-context test by parametrization.
conradbzura force-pushed the 261-fix-worker-task-leak-weak-registry branch from 60899c5 to dd440e9, July 2, 2026 23:00
conradbzura merged commit e3f53c9 into wool-labs:master Jul 2, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

Fix unbounded worker task leak from the chain-contention registry
