Integrate waves 1-5 + live validation & benchmarks #44
Merged
… KG/RAG tools
Wave 3 isolation & security hardening.
(a) Path-traversal confinement in tools/research:
- Add resolveWithinRoot helper (safepath.go): resolves the requested path to an absolute path under a configured root and rejects ../ traversal, absolute paths outside root, and symlinks whose real target escapes root.
- read_file and file_search now confine all reads to the root.
- Table-driven tests cover ../../etc/passwd, absolute escapes, symlink escape.
(b) HITL-gate mutating KG/RAG agent tools:
- rag/tool, knowledge/tool, tools/research NewTools now wrap mutating tools (rag_update, rag_delete, kg_ingest, store_knowledge) in a human_approval Marker by default so the agent loop pauses for approval.
- Add ReadOnly() functional option to omit mutating tools entirely.
- Tests assert mutating tools carry the marker and are absent in read-only mode.
Backward compatible: NewTools signatures gain only variadic options; tool name ordering preserved.
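The confinement rule described in (a) can be sketched roughly as follows. This is an illustration under assumptions, not the code in safepath.go: the signature, the `ErrEscapesRoot` sentinel, and the `within` and `setupDemoRoot` helpers are hypothetical names. It also assumes the requested file already exists, since `filepath.EvalSymlinks` fails on missing paths.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ErrEscapesRoot is a hypothetical sentinel; the PR does not name its error.
var ErrEscapesRoot = errors.New("path escapes root")

// within reports whether p is root itself or lies beneath it (both absolute).
func within(root, p string) bool {
	rel, err := filepath.Rel(root, p)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// resolveWithinRoot is an illustrative sketch of the helper described above.
func resolveWithinRoot(root, requested string) (string, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	// Resolve the root itself so comparisons happen between real paths.
	realRoot, err := filepath.EvalSymlinks(absRoot)
	if err != nil {
		return "", err
	}
	candidate := requested
	if !filepath.IsAbs(candidate) {
		candidate = filepath.Join(realRoot, candidate)
	}
	candidate = filepath.Clean(candidate)
	// Lexical check: catches ../ traversal and absolute paths outside root.
	if !within(realRoot, candidate) {
		return "", ErrEscapesRoot
	}
	// Symlink check: the real target must also stay under root.
	resolved, err := filepath.EvalSymlinks(candidate)
	if err != nil {
		return "", err
	}
	if !within(realRoot, resolved) {
		return "", ErrEscapesRoot
	}
	return resolved, nil
}

// setupDemoRoot builds a throwaway tree: root/ok.txt, outside/secret.txt,
// and root/link pointing at outside/secret.txt.
func setupDemoRoot() (string, func(), error) {
	base, err := os.MkdirTemp("", "safepath-demo")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(base) }
	root := filepath.Join(base, "root")
	secret := filepath.Join(base, "outside", "secret.txt")
	steps := []func() error{
		func() error { return os.MkdirAll(root, 0o755) },
		func() error { return os.MkdirAll(filepath.Dir(secret), 0o755) },
		func() error { return os.WriteFile(filepath.Join(root, "ok.txt"), []byte("ok"), 0o644) },
		func() error { return os.WriteFile(secret, []byte("secret"), 0o644) },
		func() error { return os.Symlink(secret, filepath.Join(root, "link")) },
	}
	for _, step := range steps {
		if err := step(); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return root, cleanup, nil
}

func main() {
	root, cleanup, err := setupDemoRoot()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()
	for _, req := range []string{"ok.txt", "../../etc/passwd", "/etc/passwd", "link"} {
		_, err := resolveWithinRoot(root, req)
		fmt.Printf("%-18s allowed=%v\n", req, err == nil)
	}
}
```

Checking twice (lexically, then after symlink resolution) means a traversal attempt is rejected with the same error whether or not the target happens to exist on the host.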
…ation
RAG: add WithRecency(halfLife, weight) SearchOption that blends an exponential time-decay factor exp(-ln2*age/halfLife) into each hit's fused score after RRF. Opt-in (non-positive half-life is a no-op). SearchHit gains a Timestamp populated from the document UpdatedAt (fallback CreatedAt) in memstore and pgstore.
KG: wire the previously-unused Store.InvalidateRelation into the engine ingestion path. When a new relation arrives for an existing (source,target,type), a rule-based step supersedes the active prior relation(s) of the same type by setting their InvalidAt to the new relation's ValidAt.
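To make the decay factor concrete, here is a small sketch. `recencyFactor` follows the stated formula exp(-ln2*age/halfLife). The `blend` function is an assumption: the commit message does not say how weight combines the factor with the fused score, so a linear interpolation is used here purely for illustration. Clamping negative ages to zero is likewise a choice made for this sketch, and `day` is a hypothetical constant.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// day is a convenience constant for the examples below.
const day = 24 * time.Hour

// recencyFactor returns exp(-ln2 * age / halfLife): 1 at age zero and
// 0.5 after exactly one half-life. A non-positive half-life is a no-op.
func recencyFactor(age, halfLife time.Duration) float64 {
	if halfLife <= 0 {
		return 1
	}
	if age < 0 {
		age = 0 // assumption: timestamps in the future count as "now"
	}
	return math.Exp(-math.Ln2 * float64(age) / float64(halfLife))
}

// blend is hypothetical: it scales the fused score by a mix of 1 and the
// decay factor, so weight 0 leaves the score unchanged and weight 1
// applies the full decay.
func blend(score float64, age, halfLife time.Duration, weight float64) float64 {
	if halfLife <= 0 {
		return score
	}
	return score * ((1 - weight) + weight*recencyFactor(age, halfLife))
}

func main() {
	for _, age := range []time.Duration{0, day, 7 * day} {
		fmt.Printf("age=%v factor=%.4f blended=%.4f\n",
			age, recencyFactor(age, 7*day), blend(1.0, age, 7*day, 0.5))
	}
}
```

With a seven-day half-life, a hit last updated a week ago gets a factor of 0.5; under the assumed linear blend with weight 0.5, its score is multiplied by 0.75.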
Wave 1 correctness floor.
(a) retry.Provider now retries when the stream emits an ErrorDelta BEFORE any content delta, not just when ChatStream returns a synchronous error. Streaming adapters surface transient failures (529 overload, mid-handshake timeouts) as a channel-delivered ErrorDelta; the decorator buffers leading metadata deltas, classifies the error via the existing transient/ShouldRetry path, and re-invokes with backoff. Once content has streamed, the error is surfaced (never retry a partially consumed turn).
(b) The agent loop now calls Metrics.RecordTokenUsage once per completed LLM call with the merged prompt/completion tokens (skipped on cache hit or when no usage was reported). agent/otel collapses the three duplicate gen_ai.client.operation.duration histograms into one instrument keyed by gen_ai.operation.name.
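The retry rule in (a) can be sketched as below. Every name here (`Delta`, `streamFn`, `retryStream`, `forwardOnce`, `errOverloaded`, `fakeStream`, `collect`) is hypothetical and much simpler than the SDK's real types. Backoff is reduced to a comment, and the transient check stands in for the existing ShouldRetry path.

```go
package main

import (
	"errors"
	"fmt"
)

// Delta is a simplified stand-in for the SDK's stream delta types.
type Delta struct {
	Meta string // non-empty for metadata deltas
	Text string // non-empty for content deltas
	Err  error  // non-nil for error deltas
}

type streamFn func() <-chan Delta

var errOverloaded = errors.New("overloaded (transient)")

func isTransient(err error) bool { return errors.Is(err, errOverloaded) }

// retryStream re-opens the stream when a transient error arrives before
// any content delta, up to maxAttempts attempts in total.
func retryStream(open streamFn, maxAttempts int) <-chan Delta {
	out := make(chan Delta)
	go func() {
		defer close(out)
		for attempt := 1; attempt <= maxAttempts; attempt++ {
			if !forwardOnce(open(), out, attempt < maxAttempts) {
				return
			}
			// A real decorator would sleep with backoff here.
		}
	}()
	return out
}

// forwardOnce forwards one attempt and reports whether to retry.
func forwardOnce(in <-chan Delta, out chan<- Delta, canRetry bool) bool {
	var pending []Delta // leading metadata, held back until we commit
	committed := false  // true once anything has been forwarded
	for d := range in {
		if !committed && d.Err != nil && canRetry && isTransient(d.Err) {
			go func() {
				for range in { // drain the abandoned stream
				}
			}()
			return true // buffered metadata is dropped with the attempt
		}
		if !committed && d.Err == nil && d.Text == "" {
			pending = append(pending, d)
			continue
		}
		for _, p := range pending {
			out <- p
		}
		pending, committed = nil, true
		out <- d // content, or an error that must be surfaced
	}
	for _, p := range pending {
		out <- p
	}
	return false
}

// fakeStream returns a closed, pre-filled channel for demos and tests.
func fakeStream(ds ...Delta) <-chan Delta {
	ch := make(chan Delta, len(ds))
	for _, d := range ds {
		ch <- d
	}
	close(ch)
	return ch
}

func collect(ch <-chan Delta) []Delta {
	var got []Delta
	for d := range ch {
		got = append(got, d)
	}
	return got
}

func main() {
	calls := 0
	open := func() <-chan Delta {
		calls++
		if calls == 1 {
			return fakeStream(Delta{Meta: "start"}, Delta{Err: errOverloaded})
		}
		return fakeStream(Delta{Meta: "start"}, Delta{Text: "hello"})
	}
	got := collect(retryStream(open, 3))
	fmt.Println("attempts:", calls, "deltas forwarded:", len(got))
}
```

Holding back metadata until the first content or error delta is what keeps a retried attempt invisible to the consumer.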
Wire AGENT TREE PERSISTENCE behind the existing types.Store seam, testable without Postgres.
- Add AgentConfig.Store + WithStore option (default nil = today's in-memory-only behavior, fully backward compatible).
- runLoop persists each new node (and branch tip) to the Store as it is added, via Store.Tx so the tip never points at an unsaved node. Best-effort: errors are logged, never fatal.
- NewAgent persists the root node + main branch up front when a Store is configured, giving LoadTreeFromStore an anchor before the first Invoke.
- Add LoadTreeFromStore helper (Store.LoadTree + tree.FromStore) for the read/resume path.
- New package agent/store/memstore: in-memory types.Store implementing the full interface (SaveNode/LoadNode/LoadChildren/LoadPath/SaveBranch/LoadBranch/ListBranches/SaveCheckpoint/LoadCheckpoint/LoadTree/Tx) with atomic buffered transactions.
Tests: memstore unit tests (round-trip, children order, path, branches, checkpoints, reachable-subtree LoadTree, Tx commit/rollback); agent multi-turn Invoke -> reconstruct tree from memstore -> assert full message history round-trips; backward-compat (nil Store) and root-on-construction.
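One way to read "atomic buffered transactions" is sketched below: writes inside Tx go to a private buffer and are applied under the lock only if the callback succeeds, so a node and the branch tip pointing at it become visible together. The `memStore` and `txBuf` types and their string-based signatures are hypothetical; the real types.Store interface is not shown in this PR text.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// memStore is a toy store: node id -> payload, branch name -> tip id.
type memStore struct {
	mu    sync.Mutex
	nodes map[string]string
	tips  map[string]string
}

func newMemStore() *memStore {
	return &memStore{nodes: map[string]string{}, tips: map[string]string{}}
}

// txBuf collects writes made inside a transaction.
type txBuf struct {
	nodes map[string]string
	tips  map[string]string
}

func (t *txBuf) SaveNode(id, payload string) { t.nodes[id] = payload }
func (t *txBuf) SaveBranch(name, tip string) { t.tips[name] = tip }

// Tx runs fn against a buffer and commits the buffer atomically when fn
// returns nil. On error the buffer is discarded (rollback).
func (s *memStore) Tx(fn func(t *txBuf) error) error {
	buf := &txBuf{nodes: map[string]string{}, tips: map[string]string{}}
	if err := fn(buf); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, payload := range buf.nodes {
		s.nodes[id] = payload
	}
	for name, tip := range buf.tips {
		s.tips[name] = tip
	}
	return nil
}

func (s *memStore) HasNode(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.nodes[id]
	return ok
}

func (s *memStore) Tip(branch string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	tip, ok := s.tips[branch]
	return tip, ok
}

// errBoom simulates a failure partway through a transaction.
var errBoom = errors.New("boom")

func main() {
	s := newMemStore()
	_ = s.Tx(func(t *txBuf) error {
		t.SaveNode("n1", "hello")
		t.SaveBranch("main", "n1")
		return nil
	})
	err := s.Tx(func(t *txBuf) error {
		t.SaveNode("n2", "lost")
		t.SaveBranch("main", "n2")
		return errBoom
	})
	tip, _ := s.Tip("main")
	fmt.Println("rollback error:", err, "tip:", tip, "has n2:", s.HasNode("n2"))
}
```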
…eration signal
Add GA-hardening limits to the agent loop:
- LLMTimeout/ToolTimeout (+ WithLLMTimeout/WithToolTimeout): derive a child
context.WithTimeout around the provider call in getAssistantMessage and
around each tool step in executeOneTool. A slow provider surfaces a transient
ProviderError; a slow tool surfaces a deadline-exceeded tool error (even if
the tool ignores ctx and completes late). 0 = no timeout (default).
- MaxParallelTools (+ WithMaxParallelTools): bound the parallel-tool goroutines
with a buffered-channel semaphore. 0 = unlimited. Durable-runner sequential
path is unchanged.
- ErrMaxIterations signal: emit types.ErrorDelta{Error: ErrMaxIterations} when
runLoop breaks on the iteration cap while the last assistant turn still had
pending tool calls, so consumers can tell truncated from a clean finish. Not
emitted on a natural text-only/empty finish.
Table-driven tests in agent/limits_test.go cover all three plus the disabled/
unlimited defaults. Existing tests unchanged.
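The timeout and parallelism limits above could look roughly like this. It is a sketch under assumptions rather than the code in executeOneTool: `runWithTimeout`, `runBounded`, and the demo helpers are hypothetical names, and acquiring the semaphore inside each goroutine is one of several reasonable placements.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// runWithTimeout runs tool under a derived deadline. It returns the
// deadline error even when the tool ignores ctx and finishes late.
// d <= 0 means no timeout.
func runWithTimeout(ctx context.Context, d time.Duration,
	tool func(context.Context) (string, error)) (string, error) {
	if d <= 0 {
		return tool(ctx)
	}
	ctx, cancel := context.WithTimeout(ctx, d)
	defer cancel()
	type result struct {
		out string
		err error
	}
	ch := make(chan result, 1) // buffered so a late tool never blocks on send
	go func() {
		out, err := tool(ctx)
		ch <- result{out, err}
	}()
	select {
	case r := <-ch:
		return r.out, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// runBounded runs tasks concurrently, at most limit at a time, using a
// buffered channel as a semaphore. limit <= 0 means unlimited.
func runBounded(limit int, tasks []func()) {
	var sem chan struct{}
	if limit > 0 {
		sem = make(chan struct{}, limit)
	}
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			if sem != nil {
				sem <- struct{}{}        // acquire a slot
				defer func() { <-sem }() // release it
			}
			task()
		}(task)
	}
	wg.Wait()
}

// demoTimeout reports whether a tool that ignores ctx hits the deadline.
func demoTimeout() bool {
	slow := func(context.Context) (string, error) {
		time.Sleep(50 * time.Millisecond) // ignores ctx on purpose
		return "late", nil
	}
	_, err := runWithTimeout(context.Background(), 5*time.Millisecond, slow)
	return errors.Is(err, context.DeadlineExceeded)
}

// demoPeak returns the highest concurrency observed across n tasks.
func demoPeak(limit, n int) int {
	var mu sync.Mutex
	cur, peak := 0, 0
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() {
			mu.Lock()
			cur++
			if cur > peak {
				peak = cur
			}
			mu.Unlock()
			time.Sleep(2 * time.Millisecond)
			mu.Lock()
			cur--
			mu.Unlock()
		}
	}
	runBounded(limit, tasks)
	return peak
}

func main() {
	fmt.Println("slow tool timed out:", demoTimeout())
	fmt.Println("peak concurrency with limit 2:", demoPeak(2, 8))
}
```

Note that in this sketch a tool that ignores its context keeps running in the background after the deadline; only its result is discarded.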
Add examples/validation: a runnable harness that exercises the agent SDK's features against a real model (gpt-4o-mini) — basic generation, tool calling, response caching (CacheHit), token metrics, agent handoff, durable memoization, LLM timeout, and multimodal tool output. Skips cleanly without OPENAI_API_KEY. Committed sample outputs under examples/validation/results/ (report + bench numbers) so users can see real runs. Adds mock-based benchmarks in agent/bench_test.go and agent/provider/cache/bench_test.go, plus `just validate` and `just bench-report` targets.
Combines the five roadmap waves on top of `main` (foundation #38 already merged), with the inter-wave `agent.go` conflicts resolved, plus a live validation harness and benchmarks.

## What's here

Five waves cherry-picked and reconciled (all compose cleanly: `go build`, `go vet`, `golangci-lint`, and `go test ./...` (39 pkgs) all green):

- `Store` seam + in-memory `memstore`
- `read_file` + HITL-gated mutating KG/RAG tools
- `ErrMaxIterations` signal

The only real conflicts were in `getAssistantMessage` and `AgentConfig` (wave 1's token metric vs. wave 5's LLM-timeout); resolved so both run (timeout check first, then token recording).

## Live validation (gpt-4o-mini)

`examples/validation` exercises the features against a real model and writes a report. Last run: 8/8 passing.

Run: `OPENAI_API_KEY=... just validate`. Sample outputs are committed under `examples/validation/results/`.

## Benchmarks (mock-based, deterministic)

`agent/bench_test.go` + `agent/provider/cache/bench_test.go`: agent loop ~4.9µs/op, durable-noop overhead ~negligible, cache hit ~0.5µs. Regenerate with `just bench-report`.

## Relationship to the wave PRs

Each wave is also a standalone, now-mergeable PR (#41 #39 #40 #43 #42), all rebased onto `main` and green. Merge those individually or merge this integration branch; either way the conflicts are pre-resolved here.