perf: agent-runtime hot-path hardening (behavior-preserving) by nyo16 · Pull Request #62 · nyo16/nous

nyo16 · 2026-06-19T20:13:06Z

Summary

Eliminates confirmed super-linear and serialization hot paths in the agent
runtime: core loop, persistence, team coordination, and memory search.
Every change is behavior-preserving (provider payloads, search results,
and claim semantics are byte-identical or semantically identical), and each
hot path was measured with Benchee before it was refactored.

Full suite green: 1896 passed, 0 failed, stable across 3 seeds.
mix compile --warnings-as-errors, mix format --check-formatted, and
mix credo (372 files) are all clean.

Changes

Core loop — tool-schema conversion cache

Tool→provider schema conversion is now memoized once per run via a
runtime-only Context.tool_schema_cache keyed on {provider, tool-name set};
it re-converts only when the set changes. Anthropic conversion alone cost
~12.6µs and ~90KB of allocation on every iteration, even with a static tool set.
The cache is stripped from Context.serialize/1 (derived, never persisted).
Gated out: the incremental message-marshalling cache. The Phase-0 bench showed
message marshalling takes <4µs even at 100 messages (≪1 ms), which does not
justify the risk of sending stale history to the LLM.
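A minimal sketch of the memoization idea, assuming the context behaves like a plain map and each tool carries a :name field. The module ToolSchemaCacheSketch, its functions, and the convert callback are hypothetical stand-ins, not the Nous API. A missing cache field simply falls through to a fresh conversion, mirroring the fallback described under Risk / rollback.

```elixir
defmodule ToolSchemaCacheSketch do
  @moduledoc """
  Hypothetical sketch of per-run memoization of tool -> provider schema
  conversion. Not the actual Nous Context implementation.
  """

  # Key = provider + *set* of tool names: adding/removing a tool invalidates
  # the cache, but reordering tools reuses the cached list in its first order.
  defp cache_key(provider, tools), do: {provider, MapSet.new(tools, & &1.name)}

  @doc """
  Returns {schemas, ctx}. `convert` stands in for the real, relatively
  expensive per-provider conversion of a single tool.
  """
  def provider_schemas(ctx, provider, tools, convert) do
    key = cache_key(provider, tools)

    case Map.get(ctx, :tool_schema_cache) do
      {^key, schemas} ->
        # Hit: same provider and tool set as a previous iteration.
        {schemas, ctx}

      _miss_or_absent ->
        # Miss, or field absent: convert now and remember for later iterations.
        schemas = Enum.map(tools, convert)
        {schemas, Map.put(ctx, :tool_schema_cache, {key, schemas})}
    end
  end

  @doc "Drops the derived, runtime-only cache before persisting."
  def serialize(ctx), do: Map.delete(ctx, :tool_schema_cache)
end
```

Calling provider_schemas/4 twice with the same tools converts only once; adding a tool triggers one full re-conversion.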

Persistence / OTP

agent_server: context save on the response and clear_history paths is
now fire-and-forget via Task.Supervisor (off the GenServer mailbox) so a
slow backend can't block the loop; the explicit :save_context call stays
synchronous. Ordering tradeoff documented in code.
teams/rate_limiter: running window counters replace the two per-acquire
O(n) folds — rate_limited?/2 is now O(1).
teams/shared_state: ETS row-per-entry for discoveries and claims
replaces a single growing list term that was copied on every :ets.insert;
claim conflict checks use a file-scoped matchspec; release/expire are O(1)
key deletes; dedup is automatic via the {:claim, agent, file} key.
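A minimal sketch of a running window counter, assuming a sliding window over acquire timestamps kept in an Erlang :queue. WindowCounterSketch is hypothetical and is not the actual teams/rate_limiter code. Expired timestamps are popped from the front and subtracted from the count (the "prune-subtract" step), so the limit check itself is a single comparison instead of a fold over the whole window.

```elixir
defmodule WindowCounterSketch do
  @moduledoc """
  Hypothetical sliding-window limiter with a running counter.
  Not the actual Nous teams/rate_limiter module.
  """

  defstruct limit: 10, window_ms: 1_000, events: :queue.new(), count: 0

  def new(limit, window_ms), do: %__MODULE__{limit: limit, window_ms: window_ms}

  # Drops timestamps that fell out of the window, decrementing the running
  # count for each. Amortized O(1): every event is enqueued and dropped once.
  def prune(%__MODULE__{} = s, now), do: drop_before(s, now - s.window_ms)

  defp drop_before(s, cutoff) do
    case :queue.peek(s.events) do
      {:value, ts} when ts <= cutoff ->
        drop_before(%{s | events: :queue.drop(s.events), count: s.count - 1}, cutoff)

      _ ->
        s
    end
  end

  # After pruning, the check is one comparison against the running count.
  def rate_limited?(%__MODULE__{} = s, now), do: prune(s, now).count >= s.limit

  def acquire(%__MODULE__{} = s, now) do
    s = prune(s, now)

    if s.count < s.limit do
      {:ok, %{s | events: :queue.in(now, s.events), count: s.count + 1}}
    else
      {:error, :rate_limited, s}
    end
  end
end
```

The invariant a test can check is that count always equals :queue.len(events).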

Context updates

context_update_to_map + public ContextUpdate.apply/2: the repeated
existing ++ [item] appends (O(n) each, O(n²) overall) are replaced with
prepend + per-key reverse, with reversal-tracking that preserves exact
insertion order (including the case where a key is set to a list and then
appended to).
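The prepend + reverse technique can be sketched as follows, assuming updates arrive as {:set, key, value} and {:append, key, item} tuples. ContextUpdateSketch and this op format are hypothetical; the :rev/:raw tags are one possible form of reversal tracking, so that only lists built in reverse are flipped back at the end. apply_naive is the ++ reference, used here for a differential check in the spirit of the new test.

```elixir
defmodule ContextUpdateSketch do
  @moduledoc """
  Hypothetical sketch: fold {:set, key, value} / {:append, key, item} ops into
  a map without O(n^2) `acc ++ [item]` appends. Not Nous's ContextUpdate.
  """

  # Reference: correct, but each append copies the whole list (O(n^2) total).
  def apply_naive(ops) do
    Enum.reduce(ops, %{}, fn
      {:set, key, value}, acc -> Map.put(acc, key, value)
      {:append, key, item}, acc -> Map.update(acc, key, [item], &(&1 ++ [item]))
    end)
  end

  # Fast path: list values are kept reversed while folding and tagged {:rev, _};
  # non-list values are tagged {:raw, _} so they are never reversed.
  def apply_fast(ops) do
    ops
    |> Enum.reduce(%{}, fn
      {:set, key, value}, acc when is_list(value) ->
        Map.put(acc, key, {:rev, Enum.reverse(value)})

      {:set, key, value}, acc ->
        Map.put(acc, key, {:raw, value})

      {:append, key, item}, acc ->
        Map.update(acc, key, {:rev, [item]}, &push(&1, item))
    end)
    |> Map.new(fn
      {key, {:rev, list}} -> {key, Enum.reverse(list)}
      {key, {:raw, value}} -> {key, value}
    end)
  end

  # O(1) prepend. Appending to a non-list value is not supported in this sketch.
  defp push({:rev, list}, item), do: {:rev, [item | list]}
end
```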

Memory / search

memory/store/ets, knowledge_base/store/ets, decisions/store/ets:
push scope / kb_id / type+status filters into ETS via partial-map
matchspecs instead of tab2list-copying the whole table then filtering in
Elixir.
memory/search: single-pass filter, and normalize_relevance finds the max with a single reduce (no intermediate list).
memory/store/sqlite: query L2 norm hoisted out of the cosine loop
(computed once, not per candidate).
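A minimal sketch of the matchspec pushdown, assuming rows shaped {id, %{scope: ..., ...}}. EtsScopeFilterSketch is hypothetical and the real stores may key and shape rows differently. In an ETS match specification a map pattern matches any map that contains the given keys with the given values (extra keys are ignored), so :ets.select/2 copies only the matching rows out of the table instead of the whole table.

```elixir
defmodule EtsScopeFilterSketch do
  @moduledoc """
  Hypothetical comparison: filter after :ets.tab2list/1 versus pushing the
  filter into ETS with a partial-map match specification.
  """

  # Before: copy every row into the calling process, then filter in Elixir.
  def scoped_via_tab2list(table, scope) do
    table
    |> :ets.tab2list()
    |> Enum.filter(fn {_id, entry} -> entry.scope == scope end)
  end

  # After: match head {_, %{scope: scope}} is evaluated inside ETS;
  # :"$_" returns the whole matching object.
  def scoped_via_select(table, scope) do
    :ets.select(table, [{{:_, %{scope: scope}}, [], [:"$_"]}])
  end
end
```

Both functions return the same rows (order is unspecified for a :set table); only the amount of data copied differs.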

Benchmarks (dev env, M4 Max)

Path	Before	After
ETS scoped search @ 10k entries	17.5 ms	6.47 ms (2.7×)
ETS scoped search @ 1k entries	1.22 ms	0.62 ms (~2×)
Tool conversion / iteration (Anthropic, ×20)	every iteration	once per run
Cosine 2k×768 (qnorm hoisted + stored norms, isolated)	69 ms	32 ms (2.16×)
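The query-norm hoist behind the cosine row can be illustrated with plain Elixir lists, assuming non-zero vectors (a zero norm would raise on float division). CosineSketch is hypothetical and is not the memory/store/sqlite code; it only shows that moving norm(query) out of the per-candidate loop leaves every score unchanged. Candidate norms are still computed per candidate, since the stored-norm column is deferred.

```elixir
defmodule CosineSketch do
  @moduledoc """
  Hypothetical illustration of hoisting the query's L2 norm out of a cosine
  similarity loop. Vectors are plain lists of floats.
  """

  defp dot(a, b), do: Enum.zip_reduce(a, b, 0.0, fn x, y, acc -> acc + x * y end)
  defp norm(v), do: :math.sqrt(dot(v, v))

  # Naive: norm(query) is recomputed once per candidate.
  def scores_naive(query, candidates) do
    Enum.map(candidates, fn c -> dot(query, c) / (norm(query) * norm(c)) end)
  end

  # Hoisted: norm(query) is computed once, outside the loop.
  def scores_hoisted(query, candidates) do
    qnorm = norm(query)
    Enum.map(candidates, fn c -> dot(query, c) / (qnorm * norm(c)) end)
  end
end
```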

New dev-only scripts: bench/marshalling_bench.exs, bench/memory_search_bench.exs.

Tests

New test/nous/tool/context_update_test.exs — 9 ordering cases + differential
check vs a reference ++ implementation.
agent_server: slow-backend non-blocking test + async-save awaits.
rate_limiter: window-counter invariant + prune-subtract tests.
shared_state: 2 concurrency tests (race for one region → exactly one wins;
non-overlapping concurrent claims all succeed).
agent_runner: tool-cache golden-master (cached iteration payload ==
uncached iteration payload).

Deferred (documented)

SQLite stored-norm column (precompute candidate L2 norms at insert):
needs an Exqlite schema migration + backfill; exqlite is an optional dependency
and only a stub is present in this build, so the change can't be verified here.
The query-norm hoist (schema-free) is included.
SharedState :public/lock-free direct reads: get_discoveries/get_claims
have no non-test callers and aren't hot; direct reads would need an API
change. Kept the (pid) API + :private table.
Decisions edge-direction reads and the KB/Decisions backlink/outlink secondary
index: not exercised by the baseline, and graph sizes are bounded.

Risk / rollback

Each phase is independent. The tool-schema cache is the only stateful addition
and falls back to per-iteration conversion if the field is absent. No public
API or schema changes.

Eliminate confirmed super-linear and serialization hot paths in the agent runtime, measured with Benchee first (bench/marshalling_bench.exs, bench/memory_search_bench.exs). All changes preserve observable behavior; full suite green (1896 passed, stable across seeds).

Core loop
- Tool-schema conversion memoized once per run via a runtime-only Context.tool_schema_cache keyed on {provider, tool-name set} (Anthropic conversion was ~12.6us + ~90KB allocated every iteration on a static set). Stripped from Context.serialize/1.
- Phase-0 bench showed per-iteration message marshalling is <4us even at 100 msgs, so the message-marshalling cache was gated out (not worth the stale-history risk).

Persistence / OTP
- agent_server: context save on the response + clear_history paths is now fire-and-forget via Task.Supervisor (off the GenServer mailbox); the explicit :save_context call stays synchronous.
- teams/rate_limiter: running window counters replace per-acquire O(n) folds; rate_limited?/2 is now O(1).
- teams/shared_state: ETS row-per-entry (discoveries + claims) replaces a single growing list term copied on every insert; claim conflict checks use a file-scoped matchspec; release/expire are O(1) deletes.

Context updates
- context_update_to_map + ContextUpdate.apply: O(n^2) `++ [item]` appends replaced with prepend + per-key reverse (reversal-tracking preserves exact order, including set-list-then-append).

Memory / search
- memory/store/ets + knowledge_base + decisions stores: push scope/kb_id/type filters into ETS via partial-map matchspecs instead of tab2list-copying the whole table (scoped search ~2.7x faster at 10k entries).
- memory/search: single-pass filter + folded normalize_relevance.
- memory/store/sqlite: query L2 norm hoisted out of the cosine loop.

Tests: +context_update_test (ordering), slow-backend non-blocking + async-save tests, rate-limiter window-invariant + prune, shared_state concurrency, tool-cache golden master.

nyo16 merged commit 35603b1 into master Jun 19, 2026
6 checks passed

nyo16 deleted the perf/agent-runtime-hardening branch June 19, 2026 20:18

nyo16 merged 1 commit into master from perf/agent-runtime-hardening