Skip to content

Audit-driven fixes: provider marshalling, ETS lifecycle, OTP hygiene, telemetry#58

Merged
nyo16 merged 11 commits into
masterfrom
worktree-audit-fixes
May 17, 2026
Merged

Audit-driven fixes: provider marshalling, ETS lifecycle, OTP hygiene, telemetry#58
nyo16 merged 11 commits into
masterfrom
worktree-audit-fixes

Conversation

@nyo16

@nyo16 nyo16 commented May 17, 2026

Copy link
Copy Markdown
Owner

Summary

Comprehensive bug-fix pass driven by a parallel multi-agent code audit (dead code / leaks / broken code / OTP concurrency lenses). Eleven logically-grouped commits, all
individually verified with TDD where applicable.

1754 tests passing (was 1737 — 17 new regression tests). mix compile --warnings-as-errors is clean in both dev and test envs.

# Commit Theme
1 1c1253c Gemini tool name, OpenAI bad-JSON, Anthropic/Gemini usage, streaming tool-call id
2 3a8cc9e Workflow checkpoint TableOwner, Memory plugin re-init, scratch cleanup test
3 d75dd6b Migrate Task.asyncTask.Supervisor.async_nolink; clear compile warnings
4 ea99c5d AgentServer PubSub topic mismatch + terminate cancellation
5 993af69 Hook fail_closed: opt-in + tool validator enum/type composition
6 3fba62e LLM.stream_text_with_tools emits {:error, _} instead of silent halt
7 4410fd5 Telemetry: emit documented events, document existing ones
8 37002c3 Bumblebee serving holder, AgentRegistry partitions, async context load
9 23fca73 Req streaming backend backpressure guard (M-12)
10 65b7648 @deprecated on orphan public API + SQLite FTS5 escape
11 9baa035 CHANGELOG

See CHANGELOG.md "Unreleased" for the full per-fix breakdown.

Critical bugs fixed

  • Every Gemini/Vertex tool roundtrip was malformedfunctionResponse.name carried the tool_call_id instead of the function name. Fixed by threading :name
    through Message.tool/3.
  • OpenAI tool-call malformed JSON was passed to tools as bogus args — now surfaced as a clean tool-error result the LLM can retry.
  • Workflow checkpoints vanished when the saving process exited — ETS table now owned by a supervised TableOwner.
  • Memory plugin re-initialized its ETS store every agent run — losing all memories across runs and leaking tables. Now reuses existing store_state.
  • AgentServer subscribed to wrong PubSub topic — anyone publishing via Nous.PubSub.agent_topic/1 never reached it.
  • AgentServer didn't cancel in-flight LLM streams on shutdown — streams kept consuming tokens after the server was gone.

Behavior-changing decisions

  • Stream backend default: added backpressure guard to Req (chosen over flipping the default to Hackney).
  • Public API: kept four orphaned functions, marked @deprecated (no removal in 0.x).
  • Telemetry: emit the missing events AND document the existing ones (reconcile both ways).

Test plan

  • mix test — 1754 / 1754 passing (3 doctests + 1751 tests, 101 excluded), ~4.2s
  • mix compile --warnings-as-errors — clean in dev env
  • MIX_ENV=test mix compile --warnings-as-errors — clean in test env
  • 17 new regression tests written via TDD (Gemini name, OpenAI bad-JSON, checkpoint survival, memory re-init, Gemini stream id, validator enum, hook fail_closed,
    scratch cleanup, llm error emit)
  • Manual smoke test against a real Gemini API (the functionResponse.name fix is the highest-impact item)
  • Dialyzer (not run — no cached PLT; would take 10+ min to rebuild)

nyo16 added 11 commits May 16, 2026 21:12
…, usage parsing

- Gemini functionResponse.name must equal the original functionCall.name;
  was leaking tool_call_id (e.g. "gemini_abc123") and breaking every
  Gemini/Vertex tool roundtrip. Now uses Message.name with tool_call_id
  fallback; agent_runner + llm + context all thread name through
  Message.tool/3.
- OpenAI tool-call arguments JSON-decode failure no longer injects a fake
  args map (%{"error" => ..., "raw" => ...}) into the tool. decode_arguments/1
  now returns {:ok, map()} | {:error, {:invalid_json, raw}}; parsers tag the
  call with "_invalid_arguments" and agent_runner short-circuits with a
  proper tool-error result so the LLM can retry.
- Streaming Gemini tool calls now synthesize a "gemini_<base64>" id matching
  the non-streaming parser instead of always emitting id: nil.
- Anthropic + Gemini usage parsing set requests: 1 (was 0) so final usage
  metrics aren't undercounted. Anthropic captures cache_creation_input_tokens
  and cache_read_input_tokens; Gemini captures cachedContentTokenCount. New
  Usage struct fields propagate through add/2.
- Comment why two Gemini parsers exist (parse_content vs parse_parts) to
  prevent future "consolidate these" refactors that would break callers.

1743 tests passing (was 1737, +6 new tests).
…n re-init

- Nous.Workflow.Checkpoint.ETS gains a supervised TableOwner GenServer
  started under Nous.Application. Previously the :nous_workflow_checkpoints
  table was owned by whichever caller first invoked save/load — when that
  process died, the table died and init/0 silently recreated an empty one,
  losing every suspended workflow that depended on resume. Added a regression
  test that saves from a transient Task and asserts the data survives.
- Nous.Plugins.Memory.init/2 is called on every agent run by
  AgentRunner.run_init. Previously it called store_mod.init/1 unconditionally,
  creating a fresh ETS table per run and silently discarding the prior one
  (under load: ets_too_many_tables). Now reuses store_state when already set;
  extracted apply_defaults/1 so the per-run defaults still get refreshed.
  Added a regression test asserting the second init returns the same store_state.
- Added a regression test that runs a workflow with scratch: true where a
  node raises, and asserts the :nous_scratch_* table is cleaned up. The
  executor's existing exception catch already handles this, but the test
  pins the contract.

1746 tests passing (was 1743, +3 new tests).
Migrate bare Task.async / Task.async_stream callsites to the application's
Nous.TaskSupervisor with the *_nolink variants — so a research/eval/scrape
crash no longer crashes the calling process (and vice versa), and the
supervisor can send graceful exits on app shutdown.

- research/coordinator.ex: top-level research task + parallel search stream
- eval/runner.ex: parallel suite run + per-test-case timeout task
- tools/search_scrape.ex: URL fan-out
- plugins/input_guard.ex: parallel strategy run
- http/stream_backend/req.ex: SSE producer task (the default streaming
  backend)

Task.yield now handles {:exit, reason} as {:error, {:task_exit, reason}}
instead of crashing the caller with CaseClauseError (coordinator.ex,
eval/runner.ex).

Compile warnings cleared:
- providers/vertex_ai.ex: drop unreachable validate_project_id(nil)
  clause (caller already guards with `if project`).
- research/planner.ex: tighten @SPEC to {:ok, plan()}; the LLM-error
  branch falls back to a single-step plan and never returns {:error, _}.
- research/coordinator.ex: drop dead {:error, _} clause in research_loop/1
  now that plan_phase/1 only returns {:ok, _, _}.

mix compile --warnings-as-errors is clean in both dev and test envs.
1746 tests passing.
- AgentServer.init/1 subscribed to "agent:#{session_id}" while the public
  helper Nous.PubSub.agent_topic/1 returns "nous:agent:#{session_id}".
  Anyone publishing via the helper never reached the server. Use the
  helper.
- AgentServer.terminate/2 now cancels state.current_task on shutdown
  (set the cancelled atomic + Task.shutdown). Previously the in-flight
  LLM stream would keep consuming tokens and HTTP connections after the
  server was already gone, until the runner hit max_iterations.
- Replace `_ -> :ok` swallow on Task.shutdown with explicit clauses;
  log {:exit, reason} crashes during reset instead of silently
  discarding them.
…tor constraint composition

- Nous.Hook gains a fail_closed: boolean() field. When set on a hook bound
  to a blocking event (:pre_tool_use, :pre_request), runtime errors (raised
  exceptions, timeouts, non-0/2 command exit codes) now :deny instead of
  silently failing open. Default remains false for backward compatibility;
  set fail_closed: true on hooks that gate security-sensitive operations.
- Nous.Tool.Validator.validate_types/2 previously matched the property
  schema's "type" clause before "enum", silently dropping any declared
  enum constraint when both keys were present (e.g. %{"type" => "string",
  "enum" => ["a","b"]} accepted any string). Now every constraint runs
  via maybe_check_type/4 + maybe_check_enum/4 so a value that violates
  both produces two errors and a value that violates only one still
  fails.

1753 tests passing (was 1746, +7 new tests).
Nous.LLM.stream_text_with_tools/6 silently :halt'd its Stream.resource when
Fallback.with_fallback returned {:error, _}. Consumers iterating the stream
saw a clean empty stream with no signal that the LLM call had failed,
making error handling impossible for the streaming + tools path.

Now emits an {:error, reason} event before halting, matching the contract
consumers expect from any Stream-based provider. Regression test added.
…ve stale doc

Documented but never emitted (now emitted):
- [:nous, :agent, :iteration, :start/:stop] — fires around each do_iteration
  in AgentRunner with iteration, max_iterations, tool_calls, needs_response.
- [:nous, :context, :update] — fires from Tool.ContextUpdate.apply/2 with
  keys_updated count and the list of keys that changed.
- [:nous, :callback, :execute] — fires from Nous.Agent.Callbacks.execute/3
  with the callback_type and agent_name.

Emitted but undocumented (now documented under their own sections):
- [:nous, :agent, :fallback, :used], [:nous, :fallback, :activated]
- [:nous, :hook, :execute, :start/:stop], [:nous, :hook, :denied]
- [:nous, :skill, :activate/:deactivate]
- [:nous, :workflow, :run, :*], [:nous, :workflow, :node, :*]

Dropped [:nous, :provider, :stream, :chunk] from docs and attach_default_handler
— it was never emitted, and a per-chunk telemetry call would be high-overhead
on the hot streaming path. Use [:nous, :provider, :stream, :start]/:connected/
:exception for stream lifecycle.

1754 tests passing.
…context load

- memory/embedding/bumblebee.ex: stop serializing every embedding through
  the one ServingHolder GenServer. handle_call/3 no longer runs
  Nx.Serving.run/2; it returns the serving struct via :get_serving, and
  callers run inference themselves. Nx.Serving is designed to batch
  concurrent calls, so this restores the parallelism the prior design
  bottlenecked.
- agent_registry.ex: bump Registry partitions to System.schedulers_online()
  (was the default :1). High-concurrency LiveView fan-ins called
  lookup/1 from many sockets and serialized on a single partition.
- agent_server.ex: defer maybe_load_context to a handle_continue so init/1
  returns immediately. Persistence I/O no longer blocks
  DynamicSupervisor.start_child — which means Teams.Coordinator's
  spawn_agent handle_call no longer wedges the team coordinator for
  seconds when the persistence backend is slow (S3, Postgres).

1754 tests passing, clean compile.
Req's :into callback pushed chunks to the consumer via send/2, so a fast
LLM + slow consumer (LiveView fan-out, persistence-per-chunk, slow IO)
grew the consumer's mailbox without bound — the M-12 risk called out in
mix.exs.

The producing Task now polls the consumer's message_queue_len before each
send:
- below @backpressure_high_water (1_000): forward chunk normally
- above: busy-wait in 5ms increments until queue drops below
  @backpressure_low_water (100), then resume
- still backed up after @backpressure_max_wait_ms (30s): surface
  {:error, %{reason: :backpressure_overflow, queue_len: n}} and halt
  rather than wedging forever

This pauses Req's :into callback while we wait, which transitively pauses
the producing socket — natural pull-based backpressure on top of the
push-based default. Users with reliably slow consumers can still opt
into Hackney's strict :async-once mode.

1754 tests passing.
- Mark @deprecated on public functions with no internal callers, kept for
  backward compat per user direction (not removed since this is the 0.x
  hex line and downstream consumers may rely on them):
    Nous.ToolSchema.to_openai/1 — use Nous.Tool.to_openai_schema/1
    Nous.Agent.tool/3 — use Agent.new/2 with :tools or build %Tool{} directly
    Nous.Eval.run!/2 — match Nous.Eval.run/2's {:ok, _} | {:error, _}
    Nous.Decisions.path_between/4, descendants/3, ancestors/3 — call
      store_mod.query directly
- memory/store/sqlite.ex: FTS5 query escaping now doubles embedded `"`
  per FTS5 inside-quotes rules. A search term like `say "hi"` previously
  produced invalid FTS5 syntax and errored.

1754 tests passing, --warnings-as-errors clean.
@nyo16 nyo16 merged commit 46f26bf into master May 17, 2026
6 checks passed
@nyo16 nyo16 deleted the worktree-audit-fixes branch May 17, 2026 01:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant