Skip to content

feat(agent): channel-wrapping retry + token-usage metric (Wave 1)#41

Closed
urmzd wants to merge 1 commit into
mainfrom
wave-1-correctness-floor
Closed

feat(agent): channel-wrapping retry + token-usage metric (Wave 1)#41
urmzd wants to merge 1 commit into
mainfrom
wave-1-correctness-floor

Conversation

@urmzd

@urmzd urmzd commented May 31, 2026

Copy link
Copy Markdown
Owner

Wave 1 — Correctness floor

Stacked on #38 (foundation) — merge after #38.

What landed

(a) Channel-wrapping retry (agent/provider/retry)

  • retry.Provider previously only retried when ChatStream returned a synchronous error. Streaming adapters, however, deliver transient failures as an ErrorDelta on the channel (e.g. a 529 overload after message_start). Those were never retried.
  • The decorator now peeks the leading deltas: it buffers metadata-only deltas (e.g. a UsageDelta preamble) and, if an ErrorDelta arrives before any content delta, classifies it via the existing ShouldRetry/IsTransient path and re-invokes with the existing exponential backoff.
  • Once a content delta has streamed, an error is surfaced as-is (re-emitted as a terminal ErrorDelta) — a partially consumed turn is never retried. Permanent errors are surfaced immediately. Full exhaustion returns a synchronous RetryError, matching the existing synchronous-error contract.
  • Tests: TestRetryProvider_ChannelError (table-driven: transient-before-content retries then succeeds; error-after-content surfaced; permanent-before-content surfaced) and TestRetryProvider_ChannelErrorExhausted. A new scriptedStreamProvider errors mid-stream-before-content on attempt 1 then succeeds on attempt 2. All existing retry tests still pass.

(b) Token-usage metric (agent, agent/otel)

  • agent.getAssistantMessage now calls Metrics.RecordTokenUsage(ctx, "chat", provider, prompt, completion) once per completed LLM call, using the merged UsageDelta token counts. Skipped on cache hit (no new tokens) and when the provider reported no usage. It fires inside the step closure, so durable replays do not double-count.
  • agent/otel collapsed the three duplicate gen_ai.client.operation.duration histograms (operation/tool/agent) into one instrument keyed by the gen_ai.operation.name attribute, removing the duplicate-instrument registration. The token-usage instrument is unchanged.
  • Tests: agent/metrics_test.go — a recording Metrics impl asserts RecordTokenUsage fires once with the correct merged input/output counts, fires zero times when no usage is reported, and is skipped on a cache hit. agent/otel/metrics_test.go — a spy meter (built on the OTel noop instruments, no new deps) asserts the duration instrument is created exactly once and that chat/execute_tool/invoke_agent all route through it keyed by operation.name, plus that RecordTokenUsage records input+output under chat.

Verification

  • go build ./..., go vet ./agent/... ./agent/otel/ ./agent/provider/retry/, and go test ./agent/ ./agent/otel/ ./agent/provider/retry/ ./agent/types/ all pass. Backward compatible: the full existing agent suite is green.

Deferred (TODO)

  • No deferred items from the Wave 1 bounded slice — both (a) and (b) are fully implemented and tested.
  • Future: surface RecordTokenUsage for sub-agent/handoff turns is already covered transitively (each agent runs its own loop), but a dedicated end-to-end handoff token-accounting test was not added.
  • Future: an OTel SDK (sdk/metric + metricdata) integration test would assert real aggregated histogram bucket values; skipped here to avoid adding a network-fetched test dependency (the spy-meter test covers the collapse + routing contract without it).

@urmzd urmzd force-pushed the wave-1-correctness-floor branch from 525fb5a to 02d7b55 Compare May 31, 2026 20:50
Wave 1 correctness floor.

(a) retry.Provider now retries when the stream emits an ErrorDelta BEFORE
any content delta, not just when ChatStream returns a synchronous error.
Streaming adapters surface transient failures (529 overload, mid-handshake
timeouts) as a channel-delivered ErrorDelta; the decorator buffers leading
metadata deltas, classifies the error via the existing transient/ShouldRetry
path, and re-invokes with backoff. Once content has streamed, the error is
surfaced (never retry a partially consumed turn).

(b) The agent loop now calls Metrics.RecordTokenUsage once per completed LLM
call with the merged prompt/completion tokens (skipped on cache hit or when
no usage was reported). agent/otel collapses the three duplicate
gen_ai.client.operation.duration histograms into one instrument keyed by
gen_ai.operation.name.
@urmzd

urmzd commented May 31, 2026

Copy link
Copy Markdown
Owner Author

Superseded by #44, which was squash-merged into main (53a6aff) and contains every change from this branch — a dry merge of this branch into main is a no-op. main is green (build/vet/golangci-lint, 39 packages, and 8/8 live validation on gpt-4o-mini). Closing as redundant.

@urmzd urmzd closed this May 31, 2026
@urmzd urmzd deleted the wave-1-correctness-floor branch May 31, 2026 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant