feat(agent): channel-wrapping retry + token-usage metric (Wave 1) by urmzd · Pull Request #41 · urmzd/saige

urmzd · 2026-05-31T20:18:13Z

Wave 1 — Correctness floor

Stacked on #38 (foundation) — merge after #38.

What landed

(a) Channel-wrapping retry (agent/provider/retry)

retry.Provider previously only retried when ChatStream returned a synchronous error. Streaming adapters, however, deliver transient failures as an ErrorDelta on the channel (e.g. a 529 overload after message_start). Those were never retried.
The decorator now peeks the leading deltas: it buffers metadata-only deltas (e.g. a UsageDelta preamble) and, if an ErrorDelta arrives before any content delta, classifies it via the existing ShouldRetry/IsTransient path and re-invokes with the existing exponential backoff.
Once a content delta has streamed, an error is surfaced as-is (re-emitted as a terminal ErrorDelta) — a partially consumed turn is never retried. Permanent errors are surfaced immediately. Full exhaustion returns a synchronous RetryError, matching the existing synchronous-error contract.
Tests: TestRetryProvider_ChannelError (table-driven: transient-before-content retries then succeeds; error-after-content surfaced; permanent-before-content surfaced) and TestRetryProvider_ChannelErrorExhausted. A new scriptedStreamProvider errors mid-stream-before-content on attempt 1 then succeeds on attempt 2. All existing retry tests still pass.

(b) Token-usage metric (agent, agent/otel)

agent.getAssistantMessage now calls Metrics.RecordTokenUsage(ctx, "chat", provider, prompt, completion) once per completed LLM call, using the merged UsageDelta token counts. Skipped on cache hit (no new tokens) and when the provider reported no usage. It fires inside the step closure, so durable replays do not double-count.
agent/otel collapsed the three duplicate gen_ai.client.operation.duration histograms (operation/tool/agent) into one instrument keyed by the gen_ai.operation.name attribute, removing the duplicate-instrument registration. The token-usage instrument is unchanged.
Tests: agent/metrics_test.go — a recording Metrics impl asserts RecordTokenUsage fires once with the correct merged input/output counts, fires zero times when no usage is reported, and is skipped on a cache hit. agent/otel/metrics_test.go — a spy meter (built on the OTel noop instruments, no new deps) asserts the duration instrument is created exactly once and that chat/execute_tool/invoke_agent all route through it keyed by operation.name, plus that RecordTokenUsage records input+output under chat.

Verification

go build ./..., go vet ./agent/... ./agent/otel/ ./agent/provider/retry/, and go test ./agent/ ./agent/otel/ ./agent/provider/retry/ ./agent/types/ all pass. Backward compatible: the full existing agent suite is green.

Deferred (TODO)

No deferred items from the Wave 1 bounded slice — both (a) and (b) are fully implemented and tested.
Future: surface RecordTokenUsage for sub-agent/handoff turns is already covered transitively (each agent runs its own loop), but a dedicated end-to-end handoff token-accounting test was not added.
Future: an OTel SDK (sdk/metric + metricdata) integration test would assert real aggregated histogram bucket values; skipped here to avoid adding a network-fetched test dependency (the spy-meter test covers the collapse + routing contract without it).

Wave 1 correctness floor. (a) retry.Provider now retries when the stream emits an ErrorDelta BEFORE any content delta, not just when ChatStream returns a synchronous error. Streaming adapters surface transient failures (529 overload, mid-handshake timeouts) as a channel-delivered ErrorDelta; the decorator buffers leading metadata deltas, classifies the error via the existing transient/ShouldRetry path, and re-invokes with backoff. Once content has streamed, the error is surfaced (never retry a partially consumed turn). (b) The agent loop now calls Metrics.RecordTokenUsage once per completed LLM call with the merged prompt/completion tokens (skipped on cache hit or when no usage was reported). agent/otel collapses the three duplicate gen_ai.client.operation.duration histograms into one instrument keyed by gen_ai.operation.name.

urmzd · 2026-05-31T21:58:13Z

Superseded by #44, which was squash-merged into main (53a6aff) and contains every change from this branch — a dry merge of this branch into main is a no-op. main is green (build/vet/golangci-lint, 39 packages, and 8/8 live validation on gpt-4o-mini). Closing as redundant.

urmzd force-pushed the wave-1-correctness-floor branch from 525fb5a to 02d7b55 Compare May 31, 2026 20:50

urmzd force-pushed the wave-1-correctness-floor branch from 02d7b55 to a77e957 Compare May 31, 2026 21:09

urmzd mentioned this pull request May 31, 2026

Integrate waves 1-5 + live validation & benchmarks #44

Merged

urmzd closed this May 31, 2026

urmzd deleted the wave-1-correctness-floor branch May 31, 2026 21:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agent): channel-wrapping retry + token-usage metric (Wave 1)#41

feat(agent): channel-wrapping retry + token-usage metric (Wave 1)#41
urmzd wants to merge 1 commit into
mainfrom
wave-1-correctness-floor

urmzd commented May 31, 2026

Uh oh!

urmzd commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

urmzd commented May 31, 2026

Wave 1 — Correctness floor

What landed

Verification

Deferred (TODO)

Uh oh!

urmzd commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant