Expose worker idle time so callers can shut down inactive workers (Closes #253) #269

Draft
conradbzura wants to merge 7 commits into wool-labs:main from conradbzura:253-expose-worker-idle-time

Summary

Add an idle RPC so callers can observe how long a worker has been continuously idle and retire inactive workers on their own schedule. The worker measures idle time against a monotonic clock — seconds since its in-flight task set last became empty, with startup counting as empty, zero while any task runs, resetting whenever work resumes — and reports it through a new IdleTime wire message. WorkerConnection surfaces this as idle_time(), returning a plain float, alongside a stop() control, so a caller can poll-then-retire over the same single-worker surface.

A worker predating the RPC answers gRPC UNIMPLEMENTED, which surfaces as the new IdleUnavailable — deliberately a WoolError rather than an RpcError, so an absent capability is never mistaken for an RPC-health fault that would evict a healthy worker. The change ships mechanism only: the worker reports, the caller decides — there is no built-in auto-shutdown policy, and the routine dispatch path is untouched.
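The poll-then-retire flow described above can be sketched as follows. This is a minimal illustration, not the PR's code: `FakeConnection` is a hypothetical stand-in for `WorkerConnection`, the local `IdleUnavailable` class only mimics the real exception, and the `timeout=` and `grace=` keyword names, the helper `retire_if_idle`, and its threshold policy are assumptions made for the example.

```python
import asyncio


class IdleUnavailable(Exception):
    """Stand-in for the PR's IdleUnavailable (a WoolError in the real package)."""


class FakeConnection:
    """Hypothetical stand-in for WorkerConnection; not the real class."""

    def __init__(self, idle_seconds, supports_idle=True):
        self._idle_seconds = idle_seconds
        self._supports_idle = supports_idle
        self.stopped_with = None  # records the grace passed to stop()

    async def idle_time(self, timeout=None):
        if not self._supports_idle:
            # Mimics a legacy worker answering UNIMPLEMENTED.
            raise IdleUnavailable("worker predates the idle RPC")
        return self._idle_seconds

    async def stop(self, grace=None):
        self.stopped_with = grace


async def retire_if_idle(conn, threshold, grace):
    """Stop the worker if it reports at least `threshold` idle seconds."""
    try:
        idle = await conn.idle_time(timeout=1.0)
    except IdleUnavailable:
        # The capability is absent, which is not a health fault: leave the worker alone.
        return False
    if idle < threshold:
        return False
    await conn.stop(grace=grace)
    return True
```

Because the policy lives entirely in the caller, the threshold and grace values here are arbitrary; a real caller would pick its own schedule.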

LocalWorker shutdown is refactored to route through WorkerConnection.stop() instead of a hand-built stub, and the shutdown grace period is renamed from timeout to grace so it no longer collides with the client-side deadline timeout on dispatch and idle. The old timeout keyword is kept as a deprecated alias that still works and emits a DeprecationWarning, so existing callers are not broken.
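A deprecated keyword alias of this kind is commonly implemented along the lines below. The helper name `resolve_grace` is hypothetical, and rejecting calls that pass both keywords is an assumption; the PR only states that `timeout` still works and emits a `DeprecationWarning`.

```python
import warnings


def resolve_grace(grace=None, timeout=None):
    """Return the effective grace period, honoring the deprecated alias."""
    if timeout is not None:
        if grace is not None:
            # Assumption: passing both is treated as a caller error.
            raise TypeError("pass either 'grace' or 'timeout', not both")
        warnings.warn(
            "'timeout' is deprecated; use 'grace' instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return timeout
    return grace
```

A `stop()` method could call such a helper first and then proceed with the resolved value, so the rest of the shutdown path only ever sees `grace`.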

Closes #253

Proposed changes

  • Wire protocol: Add an idle unary RPC returning a new IdleTime message (a double seconds field) to the Worker service, re-exported from wool.protocol.
  • Worker idle tracking: WorkerService records a monotonic timestamp of when its in-flight docket last emptied and answers the idle RPC with the elapsed seconds, or zero while busy. The timestamp flips only on the docket's empty/non-empty edges, so polling never enters the docket and cannot disturb the measurement.
  • Client control surface: WorkerConnection gains idle_time() (returns the reported idle seconds, validating a positive deadline) and stop() (sends the stop RPC with a shutdown grace period). A missing RPC surfaces as IdleUnavailable, a new public WoolError subclass.
  • Backwards-compatible grace rename: Rename the stop() shutdown grace period from timeout to grace on Worker and WorkerLike, keeping timeout as a deprecated alias. The wire field StopRequest.timeout and the startup deadline keep the name timeout.
  • LocalWorker shutdown: Route LocalWorker._stop through a WorkerConnection, reusing its credential handling, secure-channel construction, and channel pooling, and release the pooled channel afterward.
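The edge-triggered idle tracking described in the list above might look roughly like this. It is a sketch under stated assumptions: `IdleTracker` and its method names are hypothetical, the in-flight docket is reduced to a counter, and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time


class IdleTracker:
    """Hypothetical model of the worker's idle bookkeeping."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._in_flight = 0
        # Startup counts as empty, so idle time accrues from construction.
        self._idle_since = clock()

    def task_started(self):
        self._in_flight += 1
        if self._in_flight == 1:
            # Empty -> non-empty edge: the worker is no longer idle.
            self._idle_since = None

    def task_finished(self):
        self._in_flight -= 1
        if self._in_flight == 0:
            # Non-empty -> empty edge: restart the idle clock.
            self._idle_since = self._clock()

    def idle_seconds(self):
        # Read-only: polling never modifies the in-flight count or timestamp.
        if self._idle_since is None:
            return 0.0
        return self._clock() - self._idle_since
```

Using a monotonic clock means wall-clock adjustments cannot make the reported idle time jump or go negative.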

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
|---|---|---|---|---|---|
| 1 | TestMessageConstruction | An IdleTime message | Constructed with and without a seconds value | The seconds field holds the value and defaults to 0.0 | IdleTime wire message |
| 2 | TestMessageConstruction | Any finite double | An IdleTime is serialized and re-parsed | The seconds field survives the round-trip exactly | IdleTime wire fidelity |
| 3 | TestWorkerService | N in-flight tasks, each returning or raising | Tasks complete one at a time and idle is polled after each | Idle stays zero until the last task drains, then accrues | Docket-emptiness idle invariant |
| 4 | TestWorkerService | A worker with accrued idle | A dispatch is rejected before tracking (backpressure, malformed id, sync callable) | Idle keeps counting because the rejected task never enters the docket | Idle immune to pre-tracking rejection |
| 5 | TestWorkerConnection | A worker answering idle with any gRPC status code | idle_time is awaited | UNIMPLEMENTED maps to IdleUnavailable, transient codes to TransientRpcError, others to RpcError | Idle error mapping |
| 6 | TestWorkerConnection | Any non-positive timeout | idle_time is awaited with it | It raises ValueError without calling the stub | Idle timeout validation |
| 7 | TestWorkerConnection | None or any positive timeout | idle_time is awaited with it | It forwards the timeout to the stub as the gRPC deadline | Idle deadline forwarding |
| 8 | TestWorkerConnection | A worker reporting any double | idle_time is awaited | It returns exactly that value | Idle value fidelity |
| 9 | TestWorkerConnection | A worker acknowledging stop | stop is awaited with a grace period | It sends a StopRequest carrying the grace and returns None | Stop request forwarding |
| 10 | TestWorker | A started worker | stop is called with the deprecated timeout keyword | It emits a DeprecationWarning and forwards the value as grace | Backwards-compatible stop alias |
| 11 | TestLocalWorker | A started worker whose stop RPC raises | stop is awaited | The error propagates and the pooled channel is still released | Stop channel cleanup |
| 12 | TestWorkerIdleReporting | A freshly started real worker over each transport | idle_time is polled twice with a wait | Both readings are positive and non-decreasing | Idle accrual from startup (integration) |
| 13 | TestWorkerIdleReporting | A real worker with one in-flight routine | idle_time is polled | It reports zero while the docket is non-empty | Zero-while-busy (integration) |
| 14 | TestWorkerIdleReporting | A servicer without the idle RPC | idle_time is polled over a real channel | It raises IdleUnavailable, not RpcError | Legacy-worker detection (integration) |
| 15 | TestWorkerControlSurface | An idle real worker | The caller polls idle, stops it, then polls again | Stop terminates the worker and a later poll raises TransientRpcError | Poll-then-retire flow (integration) |
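The error mapping and timeout validation exercised by the TestWorkerConnection cases above could be modeled as in the sketch below. Everything here is illustrative: `StatusCode` is a small stand-in for a few gRPC status codes, the exception classes only mimic the names used in the PR, the inheritance of `TransientRpcError` from `RpcError` is assumed, and the choice of which codes count as transient is a guess that the PR does not spell out.

```python
from enum import Enum


class StatusCode(Enum):
    """Stand-in for a handful of gRPC status codes."""
    UNIMPLEMENTED = "unimplemented"
    UNAVAILABLE = "unavailable"
    DEADLINE_EXCEEDED = "deadline_exceeded"
    INTERNAL = "internal"


class WoolError(Exception):
    """Stand-in for the package's base error."""


class IdleUnavailable(WoolError):
    """The worker does not implement the idle RPC."""


class RpcError(Exception):
    """Stand-in for an RPC-health fault."""


class TransientRpcError(RpcError):
    """Stand-in for a retryable RPC fault (inheritance assumed)."""


# Assumption: the real set of transient codes may differ.
TRANSIENT_CODES = frozenset({StatusCode.UNAVAILABLE, StatusCode.DEADLINE_EXCEEDED})


def map_idle_error(code):
    """Translate a status code from the idle RPC into an exception instance."""
    if code is StatusCode.UNIMPLEMENTED:
        # Absent capability: deliberately not an RpcError.
        return IdleUnavailable("worker does not implement the idle RPC")
    if code in TRANSIENT_CODES:
        return TransientRpcError(code.name)
    return RpcError(code.name)


def validate_timeout(timeout):
    """Accept None or a positive number; reject everything else up front."""
    if timeout is not None and timeout <= 0:
        raise ValueError("timeout must be positive")
    return timeout
```

Keeping `IdleUnavailable` outside the `RpcError` hierarchy is what lets a caller's `except RpcError` eviction logic ignore legacy workers.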

Add an idle unary RPC returning a new IdleTime message (a double
seconds field) to the Worker gRPC service, and re-export IdleTime from
wool.protocol alongside the other wire messages.

Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
WorkerService records a monotonic timestamp of when its in-flight task
set last became empty, with worker startup counting as the initial
empty state, and answers the idle RPC with the seconds elapsed since,
or zero while any task is in flight. The timestamp flips on the
docket's empty and non-empty edges, so polling never enters the docket
and cannot disturb the measurement it reads.

Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
Give the direct single-worker connection an idle_time poll that returns
the worker's continuous idle seconds (validating a positive deadline)
and a stop control that sends the stop RPC with a shutdown grace
period. A worker predating the idle RPC answers UNIMPLEMENTED, which
surfaces as the new IdleUnavailable; it descends from WoolError rather
than RpcError so an absent capability is not mistaken for an RPC-health
fault and does not evict a healthy worker. Export IdleUnavailable from
the wool package.

Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
Rename the stop() shutdown grace period from timeout to grace on Worker
and WorkerLike so it no longer reads as a client-side deadline and no
longer collides with the dispatch and idle timeout on the same surface.
The old timeout keyword is kept as a deprecated alias — passing it still
works and emits a DeprecationWarning — so existing callers are not
broken. The wire field StopRequest.timeout and the startup deadline on
_start keep the name timeout.

Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
LocalWorker._stop now issues the stop RPC through a WorkerConnection,
reusing its credential handling, secure-channel construction, and
channel pooling instead of building a one-off stub, and releases the
pooled channel afterward.

Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
Add unit and property-based coverage for the IdleTime wire message,
WorkerService idle tracking across concurrent drains and pre-tracking
rejections, WorkerConnection.idle_time and stop (status-code mapping,
timeout validation, and value fidelity), IdleUnavailable's WoolError
lineage and public export, and the LocalWorker stop routing. Track the
grace rename across the worker suite and cover the deprecated timeout
alias on Worker.stop.

Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
Exercise idle accrual from startup, zero-while-busy, reset-on-drain,
IdleUnavailable against a servicer without the RPC, and the
poll-then-retire and LocalWorker.stop flows against real subprocess
workers across the insecure, mTLS, and one-way-TLS transports.

Claude-Session: https://claude.ai/code/session_01LEWxpdabdvMq4evG6Zfq8x
@conradbzura conradbzura self-assigned this Jul 2, 2026