perf: agent-runtime hot-path hardening (behavior-preserving) #62

Merged
nyo16 merged 1 commit into master from perf/agent-runtime-hardening
Jun 19, 2026
@nyo16 nyo16 commented Jun 19, 2026

Summary

Eliminates confirmed super-linear and serialization hot paths in the agent
runtime — core loop, persistence, team coordination, and memory search.
Every change is behavior-preserving (provider payloads, search results,
and claim semantics are byte-identical or semantically identical), and each
path was measured with Benchee before refactoring.

Full suite green: 1896 passed, 0 failed, stable across 3 seeds.
mix compile --warnings-as-errors, mix format --check-formatted, and
mix credo (372 files) all clean.

Changes

Core loop — tool-schema conversion cache

  • Tool→provider schema conversion is now memoized once per run via a
    runtime-only Context.tool_schema_cache keyed on {provider, tool-name set},
    re-converting only when the set changes. Anthropic conversion alone was
    ~12.6µs + ~90KB allocated every iteration on an otherwise-static tool set.
  • Stripped from Context.serialize/1 (derived, never persisted).
  • Gated out: the incremental message-marshalling cache — Phase-0 bench showed
    message marshalling is <4µs even at 100 messages (≪1ms), not worth the
    stale-history-to-LLM risk.
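
For reviewers, a minimal sketch of the memoization idea, not the code in this PR. The module name, the `fetch/4` function, the `%{key: ..., schemas: ...}` cache shape, and the `convert` callback are all hypothetical; only the `{provider, tool-name set}` key comes from the description above.

```elixir
defmodule ToolSchemaCacheSketch do
  # Hypothetical stand-in for a runtime-only cache field.
  # Returns {schemas, cache}; `convert` runs only on a cache miss.
  def fetch(cache, provider, tools, convert) when is_function(convert, 2) do
    # Key on the provider plus the *set* of tool names, so reordering
    # the same tools does not force a re-conversion.
    key = {provider, tools |> Enum.map(& &1.name) |> MapSet.new()}

    case cache do
      %{key: ^key, schemas: schemas} ->
        {schemas, cache}

      _miss_or_nil ->
        schemas = convert.(provider, tools)
        {schemas, %{key: key, schemas: schemas}}
    end
  end
end
```

Starting from `nil` (field absent) falls through to the conversion branch, which mirrors the fallback described under Risk / rollback.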

Persistence / OTP

  • agent_server: context save on the response and clear_history paths is
    now fire-and-forget via Task.Supervisor (off the GenServer mailbox) so a
    slow backend can't block the loop; the explicit :save_context call stays
    synchronous. Ordering tradeoff documented in code.
  • teams/rate_limiter: running window counters replace the two per-acquire
    O(n) folds — rate_limited?/2 is now O(1).
  • teams/shared_state: ETS row-per-entry for discoveries and claims
    replaces a single growing list term that was copied on every :ets.insert;
    claim conflict checks use a file-scoped matchspec; release/expire are O(1)
    key deletes; dedup is automatic via the {:claim, agent, file} key.
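
A sketch of the running-counter idea behind the rate_limiter change, under assumptions: the struct fields, `record/3`, `prune/2`, and the limits map are hypothetical, and the two folds are assumed to have summed requests and tokens over the window. The real module may differ.

```elixir
defmodule WindowCounterSketch do
  # Hypothetical state: a FIFO of {timestamp_ms, tokens} plus running totals.
  defstruct events: :queue.new(), requests: 0, tokens: 0, window_ms: 60_000

  # Record one request costing `tokens` at time `now` (ms); O(1).
  def record(%__MODULE__{} = s, now, tokens) do
    %{s | events: :queue.in({now, tokens}, s.events),
          requests: s.requests + 1,
          tokens: s.tokens + tokens}
  end

  # Drop events that fell out of the window, subtracting each from the totals.
  def prune(%__MODULE__{window_ms: window} = s, now) do
    case :queue.peek(s.events) do
      {:value, {ts, tokens}} when now - ts >= window ->
        prune(%{s | events: :queue.drop(s.events),
                    requests: s.requests - 1,
                    tokens: s.tokens - tokens}, now)

      _empty_or_fresh ->
        s
    end
  end

  # O(1): compares the running totals to the limits, no fold over events.
  def rate_limited?(%__MODULE__{} = s, %{max_requests: max_r, max_tokens: max_t}) do
    s.requests >= max_r or s.tokens >= max_t
  end
end
```

The invariant worth testing is that `requests` and `tokens` always equal what a fold over `events` would produce; pruning is amortized O(1) per recorded event.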

Context updates

  • context_update_to_map + public ContextUpdate.apply/2: the O(n²)
    existing ++ [item] appends are replaced with prepend + per-key reverse, with
    reversal-tracking that preserves exact insertion order (including the
    set-list-then-append case).
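
To illustrate the prepend + per-key reverse technique, a sketch under assumptions: the `{:set, key, value}` / `{:append, key, item}` op shapes and both function names are hypothetical, and appended-to values are assumed to be lists. It includes a reference `++` version of the kind a differential test could compare against.

```elixir
defmodule AppendSketch do
  # Fast path: prepend in O(1), remember which keys are held reversed,
  # and reverse each of those keys exactly once at the end.
  def apply_ops(map, ops) do
    {acc, reversed} =
      Enum.reduce(ops, {map, MapSet.new()}, fn
        {:set, key, value}, {acc, rev} ->
          # A set replaces the value, so the key is back in normal order.
          {Map.put(acc, key, value), MapSet.delete(rev, key)}

        {:append, key, item}, {acc, rev} ->
          if MapSet.member?(rev, key) do
            {Map.update!(acc, key, &[item | &1]), rev}
          else
            # First append since the last set: flip the existing list once.
            existing = acc |> Map.get(key, []) |> Enum.reverse()
            {Map.put(acc, key, [item | existing]), MapSet.put(rev, key)}
          end
      end)

    Enum.reduce(reversed, acc, fn key, m -> Map.update!(m, key, &Enum.reverse/1) end)
  end

  # Reference implementation with the quadratic `++ [item]` append.
  def apply_ops_reference(map, ops) do
    Enum.reduce(ops, map, fn
      {:set, key, value}, acc -> Map.put(acc, key, value)
      {:append, key, item}, acc -> Map.update(acc, key, [item], &(&1 ++ [item]))
    end)
  end
end
```

Each key's list is reversed at most once per set plus once at the end, so n appends cost O(n) overall instead of O(n²).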

Memory / search

  • memory/store/ets, knowledge_base/store/ets, decisions/store/ets:
    push scope / kb_id / type+status filters into ETS via partial-map
    matchspecs instead of tab2list-copying the whole table then filtering in
    Elixir.
  • memory/search: single-pass filter + folded normalize_relevance (single
    reduce for the max, no intermediate list).
  • memory/store/sqlite: query L2 norm hoisted out of the cosine loop
    (computed once, not per candidate).
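
A sketch of what pushing a filter into ETS can look like, assuming a hypothetical `{id, entry_map}` row shape and a `:scope` key; the actual stores' row layout is not shown in this PR description. A map in a match head acts as a partial pattern, so only matching rows are copied out of the table.

```elixir
defmodule ScopedSearchSketch do
  # Hypothetical row shape: {id, %{scope: term, ...other keys}}.

  # Filter inside ETS: the map in the match head matches any stored map
  # that contains the given scope, whatever other keys it has.
  def by_scope_select(table, scope) do
    :ets.select(table, [{{:_, %{scope: scope}}, [], [:"$_"]}])
  end

  # The replaced shape: copy every row out, then filter in Elixir.
  def by_scope_tab2list(table, scope) do
    table
    |> :ets.tab2list()
    |> Enum.filter(fn {_id, entry} -> entry.scope == scope end)
  end
end
```

Both functions return the same rows (order aside); the select version avoids copying non-matching entries into the caller's heap.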

Benchmarks (dev env, M4 Max)

Path                                                    | Before          | After
--------------------------------------------------------|-----------------|----------------
ETS scoped search @ 10k entries                         | 17.5 ms         | 6.47 ms (2.7×)
ETS scoped search @ 1k entries                          | 1.22 ms         | 0.62 ms (~2×)
Tool conversion / iteration (Anthropic, ×20)            | every iteration | once per run
Cosine 2k×768 (qnorm hoisted + stored norms, isolated)  | 69 ms           | 32 ms (2.16×)

New dev-only scripts: bench/marshalling_bench.exs, bench/memory_search_bench.exs.

Tests

  • New test/nous/tool/context_update_test.exs — 9 ordering cases + differential
    check vs a reference ++ implementation.
  • agent_server: slow-backend non-blocking test + async-save awaits.
  • rate_limiter: window-counter invariant + prune-subtract tests.
  • shared_state: 2 concurrency tests (race for one region → exactly one wins;
    non-overlapping concurrent claims all succeed).
  • agent_runner: tool-cache golden-master (cached iteration payload ==
    uncached iteration payload).

Deferred (documented)

  • SQLite stored-norm column (precompute candidate L2 norms at insert):
    needs an Exqlite schema migration + backfill; exqlite is an optional dep and
    is the stub in this build, so it's unverifiable here. The query-norm hoist
    (schema-free) is in.
  • SharedState :public/lock-free direct reads: get_discoveries/
    get_claims have no non-test callers and aren't hot; direct reads would need
    an API change. Kept the (pid) API + :private table.
  • Decisions edge-direction reads and KB/Decisions backlink/outlink secondary
    index — not exercised by the baseline; graph sizes bounded.

Risk / rollback

Each phase is independent. The tool-schema cache is the only stateful addition
and falls back to per-iteration conversion if the field is absent. No public
API or schema changes.

Eliminate confirmed super-linear and serialization hot paths in the agent
runtime, measured with Benchee first (bench/marshalling_bench.exs,
bench/memory_search_bench.exs). All changes preserve observable behavior;
full suite green (1896 passed, stable across seeds).

Core loop
- Tool-schema conversion memoized once per run via a runtime-only
  Context.tool_schema_cache keyed on {provider, tool-name set} (Anthropic
  conversion was ~12.6us + ~90KB allocated every iteration on a static set).
  Stripped from Context.serialize/1.
- Phase-0 bench showed per-iteration message marshalling is <4us even at 100
  msgs, so the message-marshalling cache was gated out (not worth the
  stale-history risk).

Persistence / OTP
- agent_server: context save on the response + clear_history paths is now
  fire-and-forget via Task.Supervisor (off the GenServer mailbox); the
  explicit :save_context call stays synchronous.
- teams/rate_limiter: running window counters replace per-acquire O(n) folds;
  rate_limited?/2 is now O(1).
- teams/shared_state: ETS row-per-entry (discoveries + claims) replaces a
  single growing list term copied on every insert; claim conflict checks use
  a file-scoped matchspec; release/expire are O(1) deletes.
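
A sketch of the row-per-entry claim layout, under assumptions: the module, the function names, and the stored value (a timestamp) are hypothetical; only the `{:claim, agent, file}` key and the file-scoped matchspec come from the notes above. The check-then-insert is safe here only because a single owner process is assumed to serialize calls on a :private table.

```elixir
defmodule ClaimsSketch do
  def new, do: :ets.new(:claims_sketch, [:set, :private])

  # :ok, or {:error, {:conflict, other_agent}} if another agent holds the file.
  def claim(table, agent, file) do
    # File-scoped matchspec: only rows keyed on this file, held by someone else.
    spec = [
      {{{:claim, :"$1", file}, :_}, [{:"=/=", :"$1", {:const, agent}}], [:"$1"]}
    ]

    case :ets.select(table, spec) do
      [other | _] ->
        {:error, {:conflict, other}}

      [] ->
        # Re-claiming overwrites the same key, so dedup is automatic.
        :ets.insert(table, {{:claim, agent, file}, System.monotonic_time(:millisecond)})
        :ok
    end
  end

  # Release is a single key delete.
  def release(table, agent, file), do: :ets.delete(table, {:claim, agent, file})
end
```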

Context updates
- context_update_to_map + ContextUpdate.apply: O(n^2) `++ [item]` appends
  replaced with prepend + per-key reverse (reversal-tracking preserves exact
  order, including set-list-then-append).

Memory / search
- memory/store/ets + knowledge_base + decisions stores: push scope/kb_id/type
  filters into ETS via partial-map matchspecs instead of tab2list-copying the
  whole table (scoped search ~2.7x faster at 10k entries).
- memory/search: single-pass filter + folded normalize_relevance.
- memory/store/sqlite: query L2 norm hoisted out of the cosine loop.
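
A sketch of the query-norm hoist, with hypothetical names and plain lists standing in for stored embeddings. The point is only that the query's L2 norm is computed once outside the per-candidate loop; the candidate norm is still computed per row, since stored norms are deferred.

```elixir
defmodule CosineSketch do
  def norm(v), do: :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end))

  def dot(a, b) do
    a |> Enum.zip(b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
  end

  # Score each {id, vector} candidate against `query`, best first.
  def rank(query, candidates) do
    # Hoisted: computed once, not once per candidate.
    qnorm = norm(query)

    candidates
    |> Enum.map(fn {id, vec} ->
      denom = qnorm * norm(vec)
      # Treat a zero-length vector as having no similarity.
      score = if denom == 0.0, do: 0.0, else: dot(query, vec) / denom
      {id, score}
    end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end
```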

Tests: +context_update_test (ordering), slow-backend non-blocking +
async-save tests, rate-limiter window-invariant + prune, shared_state
concurrency, tool-cache golden master.
@nyo16 nyo16 merged commit 35603b1 into master Jun 19, 2026
6 checks passed
@nyo16 nyo16 deleted the perf/agent-runtime-hardening branch June 19, 2026 20:18