
feat(agent): per-call timeouts, parallel-tool concurrency cap, max-iteration signal (Wave 5)#42

Closed
urmzd wants to merge 1 commit into main from wave-5-limits-timeouts

@urmzd urmzd commented May 31, 2026

Owner

Wave 5 — GA hardening (bounded slice)

All changes are in the agent package and stay green: go build ./..., go vet ./agent/, and go test ./agent/ pass (race-clean on the affected tests).

What landed

(a) Per-call timeouts

  • AgentConfig.LLMTimeout / AgentConfig.ToolTimeout (time.Duration, 0 = none) + WithLLMTimeout / WithToolTimeout options.
  • A child context.WithTimeout wraps the provider call inside getAssistantMessage and each tool step inside executeOneTool.
  • A slow provider that honours the deadline surfaces a transient *types.ProviderError wrapping context.DeadlineExceeded (errors.Is(err, ErrProviderFailed) and IsTransient(err) both hold).
  • A slow tool surfaces a deadline-exceeded tool error — including the case where the tool ignores ctx and completes after the deadline (checked via stepCtx.Err() after execution).

(b) Parallel-tool concurrency cap

  • AgentConfig.MaxParallelTools (int, 0 = unlimited) + WithMaxParallelTools.
  • executeToolsConcurrently bounds the fan-out goroutines with a buffered-channel counting semaphore.
  • The durable-runner path (non-Noop StepRunner) still runs tools sequentially in the caller's goroutine — unchanged.

(c) Max-iteration truncation signal

  • types.ErrMaxIterations was defined but never emitted. runLoop now tracks pendingWork: when it breaks on the iteration cap while the last assistant turn still had pending tool calls, it emits types.ErrorDelta{Error: ErrMaxIterations} so consumers can distinguish "truncated" from "finished".
  • A clean natural finish (text-only turn or empty response) clears pendingWork and does NOT emit the error.

Tests (agent/limits_test.go, table-driven)

  • LLM timeout fires, is transient, wraps DeadlineExceeded; disabled-by-default path stays clean.
  • Tool timeout fires for a ctx-respecting tool and for a ctx-ignoring slow tool; fast tools unaffected.
  • Concurrency cap: max-observed concurrency never exceeds N (probe tool with max-observed assertion); unlimited (0) runs all in parallel; cap=1 serializes and still returns every result.
  • ErrMaxIterations emitted on a truncated run; NOT emitted on a clean finish nor when the assistant lands its final text turn exactly on the cap.

Deferred (TODO, beyond this slice)

  • EventStream backpressure policy.
  • Subagent depth limit + Logger/Metrics/StepRunner propagation into subagents.
  • NewAgent -> (*Agent, error) (currently panics on invalid handoff config).
  • Tracing coverage.
  • Semver policy.

Stacked on #38 (foundation) — merge after #38.

@urmzd urmzd force-pushed the wave-5-limits-timeouts branch from d7d8efe to 1c1018d Compare May 31, 2026 20:56
…eration signal

Add GA-hardening limits to the agent loop:

- LLMTimeout/ToolTimeout (+ WithLLMTimeout/WithToolTimeout): derive a child
  context.WithTimeout around the provider call in getAssistantMessage and
  around each tool step in executeOneTool. A slow provider surfaces a transient
  ProviderError; a slow tool surfaces a deadline-exceeded tool error (even if
  the tool ignores ctx and completes late). 0 = no timeout (default).
- MaxParallelTools (+ WithMaxParallelTools): bound the parallel-tool goroutines
  with a buffered-channel semaphore. 0 = unlimited. Durable-runner sequential
  path is unchanged.
- ErrMaxIterations signal: emit types.ErrorDelta{Error: ErrMaxIterations} when
  runLoop breaks on the iteration cap while the last assistant turn still had
  pending tool calls, so consumers can tell truncated from a clean finish. Not
  emitted on a natural text-only/empty finish.

Table-driven tests in agent/limits_test.go cover all three plus the disabled/
unlimited defaults. Existing tests unchanged.
urmzd commented May 31, 2026

Owner Author

Superseded by #44, which was squash-merged into main (53a6aff) and contains every change from this branch — a dry merge of this branch into main is a no-op. main is green (build/vet/golangci-lint, 39 packages, and 8/8 live validation on gpt-4o-mini). Closing as redundant.

@urmzd urmzd closed this May 31, 2026
@urmzd urmzd deleted the wave-5-limits-timeouts branch May 31, 2026 21:59