feat(agent): channel-wrapping retry + token-usage metric (Wave 1) #41
Closed
urmzd wants to merge 1 commit into
Wave 1 correctness floor. (a) retry.Provider now retries when the stream emits an ErrorDelta BEFORE any content delta, not just when ChatStream returns a synchronous error. Streaming adapters surface transient failures (529 overload, mid-handshake timeouts) as a channel-delivered ErrorDelta; the decorator buffers leading metadata deltas, classifies the error via the existing transient/ShouldRetry path, and re-invokes with backoff. Once content has streamed, the error is surfaced (never retry a partially consumed turn). (b) The agent loop now calls Metrics.RecordTokenUsage once per completed LLM call with the merged prompt/completion tokens (skipped on cache hit or when no usage was reported). agent/otel collapses the three duplicate gen_ai.client.operation.duration histograms into one instrument keyed by gen_ai.operation.name.
Wave 1 — Correctness floor
Stacked on #38 (foundation) — merge after #38.
What landed
(a) Channel-wrapping retry (`agent/provider/retry`)

- `retry.Provider` previously only retried when `ChatStream` returned a synchronous error. Streaming adapters, however, deliver transient failures as an `ErrorDelta` on the channel (e.g. a 529 overload after `message_start`). Those were never retried.
- The decorator now buffers leading metadata deltas (such as the `UsageDelta` preamble) and, if an `ErrorDelta` arrives before any content delta, classifies it via the existing `ShouldRetry`/`IsTransient` path and re-invokes with the existing exponential backoff.
- Once content has streamed, the error is surfaced to the caller (as the `ErrorDelta`): a partially consumed turn is never retried. Permanent errors are surfaced immediately. Full exhaustion returns a synchronous `RetryError`, matching the existing synchronous-error contract.
- Tests: `TestRetryProvider_ChannelError` (table-driven: transient-before-content retries then succeeds; error-after-content surfaced; permanent-before-content surfaced) and `TestRetryProvider_ChannelErrorExhausted`. A new `scriptedStreamProvider` errors mid-stream-before-content on attempt 1 then succeeds on attempt 2. All existing retry tests still pass.

(b) Token-usage metric (`agent`, `agent/otel`)

- `agent.getAssistantMessage` now calls `Metrics.RecordTokenUsage(ctx, "chat", provider, prompt, completion)` once per completed LLM call, using the merged `UsageDelta` token counts. The call is skipped on a cache hit (no new tokens) and when the provider reported no usage. It fires inside the step closure, so durable replays do not double-count.
- `agent/otel` collapsed the three duplicate `gen_ai.client.operation.duration` histograms (operation/tool/agent) into one instrument keyed by the `gen_ai.operation.name` attribute, removing the duplicate-instrument registration. The token-usage instrument is unchanged.
- `agent/metrics_test.go`: a recording `Metrics` impl asserts `RecordTokenUsage` fires once with the correct merged input/output counts, fires zero times when no usage is reported, and is skipped on a cache hit.
- `agent/otel/metrics_test.go`: a spy meter (built on the OTel noop instruments, no new deps) asserts the duration instrument is created exactly once and that `chat`/`execute_tool`/`invoke_agent` all route through it keyed by `operation.name`, plus that `RecordTokenUsage` records input+output under `chat`.

Verification

`go build ./...`, `go vet ./agent/... ./agent/otel/ ./agent/provider/retry/`, and `go test ./agent/ ./agent/otel/ ./agent/provider/retry/ ./agent/types/` all pass. Backward compatible: the full existing `agent` suite is green.

Deferred (TODO)

- `RecordTokenUsage` for sub-agent/handoff turns is already covered transitively (each agent runs its own loop), but a dedicated end-to-end handoff token-accounting test was not added.
- An OTel SDK (`sdk/metric` + `metricdata`) integration test would assert real aggregated histogram bucket values; skipped here to avoid adding a network-fetched test dependency (the spy-meter test covers the collapse + routing contract without it).