Expose worker idle time so callers can shut down inactive workers (Closes #253) #269
Draft
conradbzura wants to merge 7 commits into
Add an idle unary RPC returning a new IdleTime message (a double seconds field) to the Worker gRPC service, and re-export IdleTime from wool.protocol alongside the other wire messages. Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
WorkerService records a monotonic timestamp of when its in-flight task set last became empty, with worker startup counting as the initial empty state, and answers the idle RPC with the seconds elapsed since, or zero while any task is in flight. The timestamp flips on the docket's empty and non-empty edges, so polling never enters the docket and cannot disturb the measurement it reads. Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
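A minimal Python sketch of the edge-triggered bookkeeping described above. It assumes a set of task identifiers stands in for the docket and that access is single-threaded (as under asyncio), so no lock is taken. IdleTracker, task_started, task_finished, and idle_seconds are hypothetical names, not WorkerService's actual API. The injectable clock exists only so the sketch can be tested deterministically.

```python
import time
from typing import Callable, Hashable, Optional, Set


class IdleTracker:
    """Hypothetical sketch of edge-triggered idle tracking (not wool's code)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._in_flight: Set[Hashable] = set()
        # Startup counts as the initial empty state, so idle time accrues from here.
        self._idle_since: Optional[float] = clock()

    def task_started(self, task_id: Hashable) -> None:
        if not self._in_flight:
            # Empty -> non-empty edge: the current idle interval ends.
            self._idle_since = None
        self._in_flight.add(task_id)

    def task_finished(self, task_id: Hashable) -> None:
        if task_id not in self._in_flight:
            return  # unknown task: no edge, so the timestamp is left alone
        self._in_flight.remove(task_id)
        if not self._in_flight:
            # Non-empty -> empty edge: a fresh idle interval starts now.
            self._idle_since = self._clock()

    def idle_seconds(self) -> float:
        # Read-only: polling never touches the in-flight set.
        if self._idle_since is None:
            return 0.0
        return self._clock() - self._idle_since
```

Only the two edges write the timestamp, so any number of idle_seconds() reads leave the measurement unchanged.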
Give the direct single-worker connection an idle_time poll that returns the worker's continuous idle seconds (validating a positive deadline) and a stop control that sends the stop RPC with a shutdown grace period. A worker predating the idle RPC answers UNIMPLEMENTED, which surfaces as the new IdleUnavailable; it descends from WoolError rather than RpcError so an absent capability is not mistaken for an RPC-health fault and does not evict a healthy worker. Export IdleUnavailable from the wool package. Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
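The status-code mapping and deadline check can be sketched as below. This is an illustration under stated assumptions, not WorkerConnection's code. StatusCode stands in for grpc.StatusCode and CallFailed for a failed gRPC call, so the sketch runs without grpc installed. The `call` parameter stands in for the stub invocation. The exception classes are stand-ins too: only IdleUnavailable descending from WoolError rather than RpcError comes from the PR, and RpcError subclassing WoolError, along with which codes count as transient, is assumed.

```python
import asyncio
import enum
from typing import Awaitable, Callable, Optional


class StatusCode(enum.Enum):
    # Stand-in for grpc.StatusCode so the sketch runs without grpc installed.
    UNIMPLEMENTED = enum.auto()
    UNAVAILABLE = enum.auto()
    DEADLINE_EXCEEDED = enum.auto()
    INTERNAL = enum.auto()


class CallFailed(Exception):
    """Stand-in for a failed gRPC call that carries its status code."""

    def __init__(self, code: StatusCode) -> None:
        super().__init__(code.name)
        self.code = code


# Stand-ins for wool's exceptions; only IdleUnavailable's lineage comes from the PR.
class WoolError(Exception):
    pass


class RpcError(WoolError):
    pass


class TransientRpcError(RpcError):
    pass


class IdleUnavailable(WoolError):
    pass


# Assumption: which codes count as transient is illustrative only.
TRANSIENT_CODES = {StatusCode.UNAVAILABLE, StatusCode.DEADLINE_EXCEEDED}


async def idle_time(
    call: Callable[[Optional[float]], Awaitable[float]],
    timeout: Optional[float] = None,
) -> float:
    """Poll idle seconds through `call`, mapping failures onto wool-style errors."""
    if timeout is not None and timeout <= 0:
        # Validate the deadline before touching the stub.
        raise ValueError(f"timeout must be positive, got {timeout!r}")
    try:
        return await call(timeout)
    except CallFailed as exc:
        if exc.code is StatusCode.UNIMPLEMENTED:
            # Missing capability, not an RPC-health fault.
            raise IdleUnavailable("worker predates the idle RPC") from exc
        if exc.code in TRANSIENT_CODES:
            raise TransientRpcError(exc.code.name) from exc
        raise RpcError(exc.code.name) from exc
```

Because IdleUnavailable is not an RpcError here, an `except RpcError` handler that evicts unhealthy workers would not catch it.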
Rename the stop() shutdown grace period from timeout to grace on Worker and WorkerLike so it no longer reads as a client-side deadline and no longer collides with the dispatch and idle timeout on the same surface. The old timeout keyword is kept as a deprecated alias — passing it still works and emits a DeprecationWarning — so existing callers are not broken. The wire field StopRequest.timeout and the startup deadline _start keep the name timeout. Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
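One common way to keep a renamed keyword working is a sentinel default plus warnings.warn. The sketch below assumes stop() is a coroutine. The Worker class is a hypothetical stand-in that only records the grace it would send. Raising TypeError when both keywords are passed is an assumption of the sketch, not something the PR states.

```python
import asyncio
import warnings
from typing import Any, Optional

_MISSING: Any = object()  # sentinel: distinguishes "not passed" from None


class Worker:
    """Hypothetical stand-in showing only the timeout -> grace rename."""

    def __init__(self) -> None:
        self.sent_grace: Optional[float] = None

    async def stop(
        self, grace: Optional[float] = None, *, timeout: Any = _MISSING
    ) -> None:
        if timeout is not _MISSING:
            warnings.warn(
                "stop(timeout=...) is deprecated; use stop(grace=...) instead",
                DeprecationWarning,
                stacklevel=2,
            )
            if grace is not None:
                # Assumption: passing both keywords is treated as a caller error.
                raise TypeError("pass either grace or timeout, not both")
            grace = timeout
        await self._stop(grace)

    async def _stop(self, grace: Optional[float]) -> None:
        # A real worker would send the stop RPC here; the sketch just records it.
        self.sent_grace = grace
```

The sentinel matters because None is a legitimate grace value, so it cannot double as "the deprecated keyword was not passed".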
LocalWorker._stop now issues the stop RPC through a WorkerConnection, reusing its credential handling, secure-channel construction, and channel pooling instead of building a one-off stub, and releases the pooled channel afterward. Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
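The routing change has the shape "open a connection, send stop, always release". A sketch of that shape, assuming a connection object with async stop(grace) and close() methods. FakeConnection, close(), and stop_local_worker are hypothetical names, not WorkerConnection's API.

```python
import asyncio
from typing import Callable, Optional


class FakeConnection:
    """Hypothetical stand-in for a pooled single-worker connection."""

    def __init__(self, address: str, fail: bool = False) -> None:
        self.address = address
        self.fail = fail
        self.sent_grace: Optional[float] = None
        self.released = False

    async def stop(self, grace: Optional[float]) -> None:
        if self.fail:
            raise ConnectionError(f"could not reach {self.address}")
        self.sent_grace = grace

    async def close(self) -> None:
        # Stands in for handing the pooled channel back.
        self.released = True


async def stop_local_worker(
    address: str,
    grace: Optional[float],
    connect: Callable[[str], FakeConnection] = FakeConnection,
) -> None:
    conn = connect(address)
    try:
        await conn.stop(grace)
    finally:
        # Release the channel whether or not the stop RPC succeeded.
        await conn.close()
```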
Add unit and property-based coverage for the IdleTime wire message, WorkerService idle tracking across concurrent drains and pre-tracking rejections, WorkerConnection.idle_time and stop (status-code mapping, timeout validation, and value fidelity), IdleUnavailable's WoolError lineage and public export, and the LocalWorker stop routing. Track the grace rename across the worker suite and cover the deprecated timeout alias on Worker.stop. Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
Exercise idle accrual from startup, zero-while-busy, reset-on-drain, IdleUnavailable against a servicer without the RPC, and the poll-then-retire and LocalWorker.stop flows against real subprocess workers across the insecure, mTLS, and one-way-TLS transports. Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
Summary
Add an idle RPC so callers can observe how long a worker has been continuously idle and retire inactive workers on their own schedule. The worker measures idle time against a monotonic clock: seconds since its in-flight task set last became empty, with startup counting as empty, zero while any task runs, and resetting whenever work resumes. It reports this through a new IdleTime wire message. WorkerConnection surfaces it as idle_time(), which returns a plain float, alongside a stop() control, so a caller can poll-then-retire over the same single-worker surface.
A worker predating the RPC answers gRPC UNIMPLEMENTED, which surfaces as the new IdleUnavailable. It is deliberately a WoolError rather than an RpcError, so an absent capability is never mistaken for an RPC-health fault that would evict a healthy worker. The change ships mechanism only: the worker reports and the caller decides. There is no built-in auto-shutdown policy, and the routine dispatch path is untouched.
LocalWorker shutdown is refactored to route through WorkerConnection.stop() instead of a hand-built stub, and the shutdown grace period is renamed from timeout to grace so it no longer collides with the client-side deadline timeout on dispatch and idle. The old timeout keyword is kept as a deprecated alias that still works and emits a DeprecationWarning, so existing callers are not broken.
Closes #253
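Because the change ships mechanism only, the retirement policy lives with the caller. A minimal caller-side sketch, assuming a connection that exposes async idle_time(timeout=...) and stop(grace=...). The exact keyword names on WorkerConnection, the retire_if_idle helper, its threshold, and the stand-in IdleUnavailable class are all assumptions for illustration.

```python
import asyncio
from typing import Optional


class IdleUnavailable(Exception):
    """Stand-in for wool.IdleUnavailable so the sketch is self-contained."""


async def retire_if_idle(
    conn,
    *,
    threshold: float,
    grace: Optional[float] = 5.0,
    timeout: Optional[float] = 1.0,
) -> bool:
    """Stop the worker behind `conn` if it has been idle for `threshold` seconds.

    Returns True if a stop was sent. The threshold and grace are caller
    policy; the worker only reports its idle time.
    """
    try:
        idle = await conn.idle_time(timeout=timeout)
    except IdleUnavailable:
        # Older worker without the idle RPC: not a health fault, leave it running.
        return False
    if idle < threshold:
        return False
    await conn.stop(grace=grace)
    return True
```

The sketch does not handle a task that arrives between the poll and the stop call.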
Proposed changes
- Add an idle unary RPC returning a new IdleTime message (a double seconds field) to the Worker service, re-exported from wool.protocol.
- WorkerService records a monotonic timestamp of when its in-flight docket last emptied and answers the idle RPC with the elapsed seconds, or zero while busy. The timestamp flips only on the docket's empty/non-empty edges, so polling never enters the docket and cannot disturb the measurement.
- WorkerConnection gains idle_time() (returns the reported idle seconds, validating a positive deadline) and stop() (sends the stop RPC with a shutdown grace period). A missing RPC surfaces as IdleUnavailable, a new public WoolError subclass.
- Rename the stop() shutdown grace period from timeout to grace on Worker and WorkerLike, keeping timeout as a deprecated alias. The wire field StopRequest.timeout and the startup deadline keep the name timeout.
- Route LocalWorker._stop through a WorkerConnection, reusing its credential handling, secure-channel construction, and channel pooling, and release the pooled channel afterward.
Test cases
TestMessageConstruction | IdleTime message | seconds field holds the value and defaults to 0.0 | IdleTime wire message
TestMessageConstruction | IdleTime is serialized and re-parsed | seconds field survives the round-trip exactly | IdleTime wire fidelity
TestWorkerService
TestWorkerService
TestWorkerConnection | idle_time is awaited | UNIMPLEMENTED maps to IdleUnavailable, transient codes to TransientRpcError, others to RpcError
TestWorkerConnection | idle_time is awaited with it | ValueError without calling the stub
TestWorkerConnection | None or any positive timeout | idle_time is awaited with it
TestWorkerConnection | idle_time is awaited
TestWorkerConnection | stop is awaited with a grace period | StopRequest carrying the grace and returns None
TestWorker | stop is called with the deprecated timeout keyword | DeprecationWarning and forwards the value as grace
TestLocalWorker | stop is awaited
TestWorkerIdleReporting | idle_time is polled twice with a wait
TestWorkerIdleReporting | idle_time is polled
TestWorkerIdleReporting | idle_time is polled over a real channel | IdleUnavailable, not RpcError
TestWorkerControlSurface | TransientRpcError