perf: agent-runtime hot-path hardening (behavior-preserving) #62
Merged
Eliminate confirmed super-linear and serialization hot paths in the agent
runtime, measured with Benchee first (bench/marshalling_bench.exs,
bench/memory_search_bench.exs). All changes preserve observable behavior;
full suite green (1896 passed, stable across seeds).
Core loop
- Tool-schema conversion memoized once per run via a runtime-only
Context.tool_schema_cache keyed on {provider, tool-name set} (Anthropic
conversion was ~12.6µs + ~90KB allocated every iteration on a static set).
The cache is stripped from Context.serialize/1.
- Phase-0 bench showed per-iteration message marshalling is <4µs even at 100
messages, so the message-marshalling cache was gated out (not worth the
stale-history risk).
Persistence / OTP
- agent_server: context save on the response + clear_history paths is now
fire-and-forget via Task.Supervisor (off the GenServer mailbox); the
explicit :save_context call stays synchronous.
- teams/rate_limiter: running window counters replace per-acquire O(n) folds;
rate_limited?/2 is now O(1).
- teams/shared_state: ETS row-per-entry (discoveries + claims) replaces a
single growing list term copied on every insert; claim conflict checks use
a file-scoped matchspec; release/expire are O(1) deletes.
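The teams/rate_limiter change can be sketched roughly as follows, assuming a sliding window of `{timestamp_ms, tokens}` entries and two limits (requests and tokens). `WindowCounterSketch` and its functions are hypothetical, and the real `rate_limited?/2` signature is not shown in the PR. The idea is that the counters are adjusted on insert and on prune, so the check itself never walks the window.

```elixir
defmodule WindowCounterSketch do
  # Hypothetical sketch of running window counters. Entries sit in an Erlang
  # :queue (oldest first); `requests` and `tokens` are the running totals the
  # old code recomputed with two folds on every acquire.
  defstruct window_ms: 60_000, entries: :queue.new(), requests: 0, tokens: 0

  def new(window_ms), do: %__MODULE__{window_ms: window_ms}

  # O(1): enqueue the entry and bump both counters.
  def record(state, now_ms, tokens) do
    %{
      state
      | entries: :queue.in({now_ms, tokens}, state.entries),
        requests: state.requests + 1,
        tokens: state.tokens + tokens
    }
  end

  # Pop expired entries off the front, subtracting each from the counters.
  # Every entry is removed at most once, so pruning is amortized O(1).
  def prune(state, now_ms) do
    cutoff = now_ms - state.window_ms

    case :queue.peek(state.entries) do
      {:value, {ts, tokens}} when ts <= cutoff ->
        state = %{
          state
          | entries: :queue.drop(state.entries),
            requests: state.requests - 1,
            tokens: state.tokens - tokens
        }

        prune(state, now_ms)

      _ ->
        state
    end
  end

  # O(1): compares the counters instead of folding over the window.
  # Assumes prune/2 was called first with the current time.
  def rate_limited?(state, %{max_requests: max_req, max_tokens: max_tok}) do
    state.requests >= max_req or state.tokens >= max_tok
  end
end
```

A window-counter invariant test then only has to check that `requests` and `tokens` always equal a full fold over the queued entries.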
Context updates
- context_update_to_map + ContextUpdate.apply: O(n^2) `++ [item]` appends
replaced with prepend + per-key reverse (reversal-tracking preserves exact
order, including set-list-then-append).
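A minimal sketch of prepend + per-key reverse with reversal tracking, assuming updates arrive as `{:set, key, value}` / `{:append, key, item}` tuples (a hypothetical shape; `OrderedUpdateSketch` is not the PR's code). `apply_naive/1` is the `++ [item]` reference; `apply_fast/1` should return the same map, including when a list is set and then appended to.

```elixir
defmodule OrderedUpdateSketch do
  # Hypothetical sketch: ops are {:set, key, value} or {:append, key, item};
  # the real ContextUpdate shape is not shown in the PR.

  # Reference version: each `++ [item]` copies the existing list, so n
  # appends to one key cost O(n^2).
  def apply_naive(ops) do
    Enum.reduce(ops, %{}, fn
      {:set, key, value}, acc -> Map.put(acc, key, value)
      {:append, key, item}, acc -> Map.update(acc, key, [item], &(&1 ++ [item]))
    end)
  end

  # Prepend onto a reversed list and track which keys are currently reversed,
  # then reverse each of those keys exactly once at the end.
  def apply_fast(ops) do
    {map, reversed} =
      Enum.reduce(ops, {%{}, MapSet.new()}, fn
        {:set, key, value}, {acc, rev} ->
          # A set stores the value in normal order, so stop tracking the key.
          {Map.put(acc, key, value), MapSet.delete(rev, key)}

        {:append, key, item}, {acc, rev} ->
          current =
            if MapSet.member?(rev, key) do
              Map.fetch!(acc, key)
            else
              # First append after a set (or to a new key): flip once.
              acc |> Map.get(key, []) |> Enum.reverse()
            end

          {Map.put(acc, key, [item | current]), MapSet.put(rev, key)}
      end)

    Enum.reduce(reversed, map, fn key, acc -> Map.update!(acc, key, &Enum.reverse/1) end)
  end
end
```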
Memory / search
- memory/store/ets + knowledge_base + decisions stores: push scope/kb_id/type
filters into ETS via partial-map matchspecs instead of tab2list-copying the
whole table (scoped search ~2.7x faster at 10k entries).
- memory/search: single-pass filter + folded normalize_relevance.
- memory/store/sqlite: query L2 norm hoisted out of the cosine loop.
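A minimal sketch of pushing filters into ETS with a partial-map match head, assuming rows of the form `{id, entry_map}` and filters given as a keyword list; `ScopedSearchSketch` is hypothetical. One detail worth checking in a real port: match heads compare terms exactly, so a filter value of `1` will not match a stored `1.0` (Elixir's `==` would), and filter values must not be match-spec atoms such as `:_` or `:"$1"`.

```elixir
defmodule ScopedSearchSketch do
  # Hypothetical sketch: rows are {id, entry_map} in an ETS table and filters
  # are a keyword list such as [scope: :team, type: :note].

  # Before: copy every row out of ETS, then filter in Elixir.
  def search_tab2list(table, filters) do
    for {_id, entry} <- :ets.tab2list(table),
        Enum.all?(filters, fn {k, v} -> Map.get(entry, k) == v end),
        do: entry
  end

  # After: a map in the match head matches any map that contains those
  # key/value pairs, so ETS filters and only matching rows are copied.
  def search_select(table, filters) do
    match_spec = [{{:_, Map.new(filters)}, [], [:"$_"]}]

    for {_id, entry} <- :ets.select(table, match_spec), do: entry
  end
end
```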
Tests: +context_update_test (ordering), slow-backend non-blocking +
async-save tests, rate-limiter window-invariant + prune, shared_state
concurrency, tool-cache golden master.
Summary
Eliminates confirmed super-linear and serialization hot paths in the agent
runtime — core loop, persistence, team coordination, and memory search.
Every change is behavior-preserving (provider payloads, search results,
and claim semantics are byte-identical or semantically identical) and was
measured with Benchee before refactoring.
Full suite green: 1896 passed, 0 failed, stable across 3 seeds.
`mix compile --warnings-as-errors`, `mix format --check-formatted`, and `mix credo` (372 files) all clean.
Changes
Core loop — tool-schema conversion cache
Tool-schema conversion is memoized once per run via a runtime-only `Context.tool_schema_cache` keyed on `{provider, tool-name set}`, re-converting only when the set changes. Anthropic conversion alone was ~12.6µs + ~90KB allocated every iteration on an otherwise-static tool set.
The cache is stripped from `Context.serialize/1` (derived, never persisted).
Message-marshalling cache gated out: message marshalling is <4µs even at 100 messages (≪1ms), not worth the stale-history-to-LLM risk.
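A minimal sketch of this memoization, assuming the context is a plain map with a `:tool_schema_cache` field and that conversion is a function of `(provider, tool)`. `ToolSchemaCacheSketch`, `schemas_for/4`, `serializable/1`, and `convert_fun` are hypothetical names, not the PR's code. A miss covers both a changed tool-name set and a context without the field, so an absent field simply means converting as before.

```elixir
defmodule ToolSchemaCacheSketch do
  # Hypothetical sketch: the context is a plain map and `convert_fun` stands in
  # for the provider-specific conversion; neither is the PR's actual code.

  @doc "Returns {schemas, context}, converting only on a cache miss."
  def schemas_for(context, provider, tools, convert_fun) do
    key = {provider, MapSet.new(tools, & &1.name)}

    case Map.get(context, :tool_schema_cache) do
      # Hit: same provider and same set of tool names as last iteration.
      {^key, schemas} ->
        {schemas, context}

      # Miss: field absent or the provider/tool-name set changed.
      _ ->
        schemas = Enum.map(tools, &convert_fun.(provider, &1))
        {schemas, Map.put(context, :tool_schema_cache, {key, schemas})}
    end
  end

  @doc "Drops the derived cache before the context is persisted."
  def serializable(context), do: Map.delete(context, :tool_schema_cache)
end
```

Keying on names alone assumes a tool's definition does not change under the same name within a run; a golden-master check (cached payload == uncached payload) is what guards that assumption.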
Persistence / OTP
agent_server: context saves on the response and `clear_history` paths are now fire-and-forget via `Task.Supervisor` (off the GenServer mailbox), so a slow backend can't block the loop; the explicit `:save_context` call stays synchronous. Ordering tradeoff documented in code.
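A rough sketch of the fire-and-forget save, assuming a `Task.Supervisor` is started elsewhere and passed in. `AsyncSaveSketch`, the `{:response, context}` cast, and `save_fun` are hypothetical stand-ins for the real agent_server messages and backend; only `Task.Supervisor.start_child/2` and the standard GenServer callbacks are real APIs here.

```elixir
defmodule AsyncSaveSketch do
  # Hypothetical sketch: `save_fun` stands in for the persistence backend and
  # the message shapes are illustrative, not the real agent_server protocol.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      sup: Keyword.fetch!(opts, :task_sup),
      save_fun: Keyword.fetch!(opts, :save_fun),
      context: %{}
    }

    {:ok, state}
  end

  @impl true
  def handle_cast({:response, context}, state) do
    # Fire-and-forget: the save runs in a supervised task, so the server can
    # return to its mailbox immediately. Bind save_fun first so the closure
    # captures only the function and the context, not the whole state.
    save_fun = state.save_fun
    {:ok, _pid} = Task.Supervisor.start_child(state.sup, fn -> save_fun.(context) end)
    {:noreply, %{state | context: context}}
  end

  @impl true
  def handle_call(:save_context, _from, state) do
    # Explicit save stays synchronous: the caller waits for the backend result.
    {:reply, state.save_fun.(state.context), state}
  end
end
```

Because each save runs in its own task, two saves started back to back can finish in either order, which is the kind of ordering tradeoff that needs documenting.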
teams/rate_limiter: running window counters replace the two per-acquire O(n) folds; `rate_limited?/2` is now O(1).
teams/shared_state: ETS row-per-entry for discoveries and claims replaces a single growing list term that was copied on every `:ets.insert`; claim conflict checks use a file-scoped matchspec; release/expire are O(1) key deletes; dedup is automatic via the `{:claim, agent, file}` key.
Context updates
`context_update_to_map` + public `ContextUpdate.apply/2`: the O(n²) `existing ++ [item]` appends are replaced with prepend + per-key reverse, with reversal-tracking that preserves exact insertion order (including the `set [list]` → `append` case).
Memory / search
memory/store/ets, knowledge_base/store/ets, decisions/store/ets: push `scope`/`kb_id`/`type` + `status` filters into ETS via partial-map matchspecs instead of `tab2list`-copying the whole table and then filtering in Elixir.
memory/search: single-pass filter + folded `normalize_relevance` (single reduce for the max, no intermediate list).
memory/store/sqlite: query L2 norm hoisted out of the cosine loop (computed once, not per candidate).
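A small sketch of the hoisted query norm in cosine ranking, assuming embeddings are plain float lists; `CosineSketch` and its functions are hypothetical, and the real sqlite store loads candidate vectors from the database.

```elixir
defmodule CosineSketch do
  # Hypothetical sketch: vectors are plain lists of floats; the real SQLite
  # store reads candidate embeddings from the database.

  def l2_norm(vec), do: vec |> Enum.reduce(0.0, fn x, acc -> acc + x * x end) |> :math.sqrt()

  def dot(a, b), do: Enum.zip_reduce(a, b, 0.0, fn x, y, acc -> acc + x * y end)

  # The query norm is computed once, outside the per-candidate map; only the
  # candidate's own norm and the dot product are computed per row.
  def rank(query, candidates) do
    q_norm = l2_norm(query)

    candidates
    |> Enum.map(fn {id, vec} ->
      denom = q_norm * l2_norm(vec)
      score = if denom == 0.0, do: 0.0, else: dot(query, vec) / denom
      {id, score}
    end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end
```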
Benchmarks (dev env, M4 Max)
New dev-only scripts: `bench/marshalling_bench.exs`, `bench/memory_search_bench.exs`.
Tests
`test/nous/tool/context_update_test.exs`: 9 ordering cases + a differential check vs a reference `++` implementation.
agent_server: slow-backend non-blocking test + async-save awaits.
rate_limiter: window-counter invariant + prune-subtract tests.
shared_state: 2 concurrency tests (a race for one region → exactly one wins; non-overlapping concurrent claims all succeed).
agent_runner: tool-cache golden master (cached iteration payload == uncached iteration payload).
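The shared_state race test depends on claim checks and inserts being serialized. A minimal sketch of that shape, assuming a claim covers a whole file and agents are identified by strings; `ClaimTableSketch` and its messages are hypothetical. The conflict check is a file-scoped matchspec over `{:claim, agent, file}` keys, and because only the owning GenServer touches the `:private` table, concurrent callers cannot both pass the check.

```elixir
defmodule ClaimTableSketch do
  # Hypothetical sketch: one ETS row per claim keyed {:claim, agent, file},
  # in a :private table owned by a GenServer so check-then-insert is
  # serialized. A claim here covers a whole file (an assumption).
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)
  def claim(pid, agent, file), do: GenServer.call(pid, {:claim, agent, file})
  def release(pid, agent, file), do: GenServer.call(pid, {:release, agent, file})

  @impl true
  def init(_opts), do: {:ok, :ets.new(:claims, [:set, :private])}

  @impl true
  def handle_call({:claim, agent, file}, _from, table) do
    # File-scoped matchspec: any claim on `file` held by a different agent.
    conflict = [{{{:claim, :"$1", file}, :_}, [{:"=/=", :"$1", {:const, agent}}], [true]}]

    reply =
      case :ets.select(table, conflict, 1) do
        {[_ | _], _continuation} ->
          {:error, :conflict}

        _no_match ->
          # Re-claiming the same key overwrites one row, so dedup is free.
          true = :ets.insert(table, {{:claim, agent, file}, System.monotonic_time()})
          :ok
      end

    {:reply, reply, table}
  end

  def handle_call({:release, agent, file}, _from, table) do
    # O(1) key delete instead of rewriting a list term.
    true = :ets.delete(table, {:claim, agent, file})
    {:reply, :ok, table}
  end
end
```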
Deferred (documented)
needs an Exqlite schema migration + backfill; `exqlite` is an optional dep and
is the stub in this build, so it's unverifiable here. The query-norm hoist
(schema-free) is in.
SharedState `:public`/lock-free direct reads: `get_discoveries`/`get_claims` have no non-test callers and aren't hot; direct reads would need an API change. Kept the `(pid)` API + `:private` table.
index: not exercised by the baseline; graph sizes bounded.
Risk / rollback
Each phase is independent. The tool-schema cache is the only stateful addition
and falls back to per-iteration conversion if the field is absent. No public
API or schema changes.