Audit-driven fixes: provider marshalling, ETS lifecycle, OTP hygiene, telemetry by nyo16 · Pull Request #58 · nyo16/nous

nyo16 · 2026-05-17T01:45:40Z

Summary

Comprehensive bug-fix pass driven by a parallel multi-agent code audit (dead code / leaks / broken code / OTP concurrency lenses). Eleven logically-grouped commits, all
individually verified with TDD where applicable.

1754 tests passing (was 1737 — 17 new regression tests). mix compile --warnings-as-errors is clean in both dev and test envs.

#	Commit	Theme
1	`1c1253c`	Gemini tool name, OpenAI bad-JSON, Anthropic/Gemini usage, streaming tool-call id
2	`3a8cc9e`	Workflow checkpoint TableOwner, Memory plugin re-init, scratch cleanup test
3	`d75dd6b`	Migrate `Task.async` → `Task.Supervisor.async_nolink`; clear compile warnings
4	`ea99c5d`	AgentServer PubSub topic mismatch + terminate cancellation
5	`993af69`	Hook `fail_closed:` opt-in + tool validator enum/type composition
6	`3fba62e`	`LLM.stream_text_with_tools` emits `{:error, _}` instead of silent halt
7	`4410fd5`	Telemetry: emit documented events, document existing ones
8	`37002c3`	Bumblebee serving holder, AgentRegistry partitions, async context load
9	`23fca73`	Req streaming backend backpressure guard (M-12)
10	`65b7648`	`@deprecated` on orphan public API + SQLite FTS5 escape
11	`9baa035`	CHANGELOG

See CHANGELOG.md "Unreleased" for the full per-fix breakdown.

Critical bugs fixed

Every Gemini/Vertex tool roundtrip was malformed — functionResponse.name carried the tool_call_id instead of the function name. Fixed by threading :name
through Message.tool/3.
OpenAI tool-call malformed JSON was passed to tools as bogus args — now surfaced as a clean tool-error result the LLM can retry.
Workflow checkpoints vanished when the saving process exited — ETS table now owned by a supervised TableOwner.
Memory plugin re-initialized its ETS store every agent run — losing all memories across runs and leaking tables. Now reuses existing store_state.
AgentServer subscribed to wrong PubSub topic — anyone publishing via Nous.PubSub.agent_topic/1 never reached it.
AgentServer didn't cancel in-flight LLM streams on shutdown — streams kept consuming tokens after the server was gone.

Behavior-changing decisions

Stream backend default: added backpressure guard to Req (chosen over flipping the default to Hackney).
Public API: kept four orphaned functions, marked @deprecated (no removal in 0.x).
Telemetry: emit the missing events AND document the existing ones (reconcile both ways).

Test plan

mix test — 1754 / 1754 passing (3 doctests + 1751 tests, 101 excluded), ~4.2s
mix compile --warnings-as-errors — clean in dev env
MIX_ENV=test mix compile --warnings-as-errors — clean in test env
17 new regression tests written via TDD (Gemini name, OpenAI bad-JSON, checkpoint survival, memory re-init, Gemini stream id, validator enum, hook fail_closed,
scratch cleanup, llm error emit)
Manual smoke test against a real Gemini API (the functionResponse.name fix is the highest-impact item)
Dialyzer (not run — no cached PLT; would take 10+ min to rebuild)

…, usage parsing - Gemini functionResponse.name must equal the original functionCall.name; was leaking tool_call_id (e.g. "gemini_abc123") and breaking every Gemini/Vertex tool roundtrip. Now uses Message.name with tool_call_id fallback; agent_runner + llm + context all thread name through Message.tool/3. - OpenAI tool-call arguments JSON-decode failure no longer injects a fake args map (%{"error" => ..., "raw" => ...}) into the tool. decode_arguments/1 now returns {:ok, map()} | {:error, {:invalid_json, raw}}; parsers tag the call with "_invalid_arguments" and agent_runner short-circuits with a proper tool-error result so the LLM can retry. - Streaming Gemini tool calls now synthesize a "gemini_<base64>" id matching the non-streaming parser instead of always emitting id: nil. - Anthropic + Gemini usage parsing set requests: 1 (was 0) so final usage metrics aren't undercounted. Anthropic captures cache_creation_input_tokens and cache_read_input_tokens; Gemini captures cachedContentTokenCount. New Usage struct fields propagate through add/2. - Comment why two Gemini parsers exist (parse_content vs parse_parts) to prevent future "consolidate these" refactors that would break callers. 1743 tests passing (was 1737, +6 new tests).

…n re-init - Nous.Workflow.Checkpoint.ETS gains a supervised TableOwner GenServer started under Nous.Application. Previously the :nous_workflow_checkpoints table was owned by whichever caller first invoked save/load — when that process died, the table died and init/0 silently recreated an empty one, losing every suspended workflow that depended on resume. Added a regression test that saves from a transient Task and asserts the data survives. - Nous.Plugins.Memory.init/2 is called on every agent run by AgentRunner.run_init. Previously it called store_mod.init/1 unconditionally, creating a fresh ETS table per run and silently discarding the prior one (under load: ets_too_many_tables). Now reuses store_state when already set; extracted apply_defaults/1 so the per-run defaults still get refreshed. Added a regression test asserting the second init returns the same store_state. - Added a regression test that runs a workflow with scratch: true where a node raises, and asserts the :nous_scratch_* table is cleaned up. The executor's existing exception catch already handles this, but the test pins the contract. 1746 tests passing (was 1743, +3 new tests).

Migrate bare Task.async / Task.async_stream callsites to the application's Nous.TaskSupervisor with the *_nolink variants — so a research/eval/scrape crash no longer crashes the calling process (and vice versa), and the supervisor can send graceful exits on app shutdown. - research/coordinator.ex: top-level research task + parallel search stream - eval/runner.ex: parallel suite run + per-test-case timeout task - tools/search_scrape.ex: URL fan-out - plugins/input_guard.ex: parallel strategy run - http/stream_backend/req.ex: SSE producer task (the default streaming backend) Task.yield now handles {:exit, reason} as {:error, {:task_exit, reason}} instead of crashing the caller with CaseClauseError (coordinator.ex, eval/runner.ex). Compile warnings cleared: - providers/vertex_ai.ex: drop unreachable validate_project_id(nil) clause (caller already guards with `if project`). - research/planner.ex: tighten @SPEC to {:ok, plan()}; the LLM-error branch falls back to a single-step plan and never returns {:error, _}. - research/coordinator.ex: drop dead {:error, _} clause in research_loop/1 now that plan_phase/1 only returns {:ok, _, _}. mix compile --warnings-as-errors is clean in both dev and test envs. 1746 tests passing.

- AgentServer.init/1 subscribed to "agent:#{session_id}" while the public helper Nous.PubSub.agent_topic/1 returns "nous:agent:#{session_id}". Anyone publishing via the helper never reached the server. Use the helper. - AgentServer.terminate/2 now cancels state.current_task on shutdown (set the cancelled atomic + Task.shutdown). Previously the in-flight LLM stream would keep consuming tokens and HTTP connections after the server was already gone, until the runner hit max_iterations. - Replace `_ -> :ok` swallow on Task.shutdown with explicit clauses; log {:exit, reason} crashes during reset instead of silently discarding them.

…tor constraint composition - Nous.Hook gains a fail_closed: boolean() field. When set on a hook bound to a blocking event (:pre_tool_use, :pre_request), runtime errors (raised exceptions, timeouts, non-0/2 command exit codes) now :deny instead of silently failing open. Default remains false for backward compatibility; set fail_closed: true on hooks that gate security-sensitive operations. - Nous.Tool.Validator.validate_types/2 previously matched the property schema's "type" clause before "enum", silently dropping any declared enum constraint when both keys were present (e.g. %{"type" => "string", "enum" => ["a","b"]} accepted any string). Now every constraint runs via maybe_check_type/4 + maybe_check_enum/4 so a value that violates both produces two errors and a value that violates only one still fails. 1753 tests passing (was 1746, +7 new tests).

Nous.LLM.stream_text_with_tools/6 silently :halt'd its Stream.resource when Fallback.with_fallback returned {:error, _}. Consumers iterating the stream saw a clean empty stream with no signal that the LLM call had failed, making error handling impossible for the streaming + tools path. Now emits an {:error, reason} event before halting, matching the contract consumers expect from any Stream-based provider. Regression test added.

…ve stale doc Documented but never emitted (now emitted): - [:nous, :agent, :iteration, :start/:stop] — fires around each do_iteration in AgentRunner with iteration, max_iterations, tool_calls, needs_response. - [:nous, :context, :update] — fires from Tool.ContextUpdate.apply/2 with keys_updated count and the list of keys that changed. - [:nous, :callback, :execute] — fires from Nous.Agent.Callbacks.execute/3 with the callback_type and agent_name. Emitted but undocumented (now documented under their own sections): - [:nous, :agent, :fallback, :used], [:nous, :fallback, :activated] - [:nous, :hook, :execute, :start/:stop], [:nous, :hook, :denied] - [:nous, :skill, :activate/:deactivate] - [:nous, :workflow, :run, :*], [:nous, :workflow, :node, :*] Dropped [:nous, :provider, :stream, :chunk] from docs and attach_default_handler — it was never emitted, and a per-chunk telemetry call would be high-overhead on the hot streaming path. Use [:nous, :provider, :stream, :start]/:connected/ :exception for stream lifecycle. 1754 tests passing.

…context load - memory/embedding/bumblebee.ex: stop serializing every embedding through the one ServingHolder GenServer. handle_call/3 no longer runs Nx.Serving.run/2; it returns the serving struct via :get_serving, and callers run inference themselves. Nx.Serving is designed to batch concurrent calls, so this restores the parallelism the prior design bottlenecked. - agent_registry.ex: bump Registry partitions to System.schedulers_online() (was the default :1). High-concurrency LiveView fan-ins called lookup/1 from many sockets and serialized on a single partition. - agent_server.ex: defer maybe_load_context to a handle_continue so init/1 returns immediately. Persistence I/O no longer blocks DynamicSupervisor.start_child — which means Teams.Coordinator's spawn_agent handle_call no longer wedges the team coordinator for seconds when the persistence backend is slow (S3, Postgres). 1754 tests passing, clean compile.

Req's :into callback pushed chunks to the consumer via send/2, so a fast LLM + slow consumer (LiveView fan-out, persistence-per-chunk, slow IO) grew the consumer's mailbox without bound — the M-12 risk called out in mix.exs. The producing Task now polls the consumer's message_queue_len before each send: - below @backpressure_high_water (1_000): forward chunk normally - above: busy-wait in 5ms increments until queue drops below @backpressure_low_water (100), then resume - still backed up after @backpressure_max_wait_ms (30s): surface {:error, %{reason: :backpressure_overflow, queue_len: n}} and halt rather than wedging forever This pauses Req's :into callback while we wait, which transitively pauses the producing socket — natural pull-based backpressure on top of the push-based default. Users with reliably slow consumers can still opt into Hackney's strict :async-once mode. 1754 tests passing.

@deprecated

- Mark @deprecated on public functions with no internal callers, kept for backward compat per user direction (not removed since this is the 0.x hex line and downstream consumers may rely on them): Nous.ToolSchema.to_openai/1 — use Nous.Tool.to_openai_schema/1 Nous.Agent.tool/3 — use Agent.new/2 with :tools or build %Tool{} directly Nous.Eval.run!/2 — match Nous.Eval.run/2's {:ok, _} | {:error, _} Nous.Decisions.path_between/4, descendants/3, ancestors/3 — call store_mod.query directly - memory/store/sqlite.ex: FTS5 query escaping now doubles embedded `"` per FTS5 inside-quotes rules. A search term like `say "hi"` previously produced invalid FTS5 syntax and errored. 1754 tests passing, --warnings-as-errors clean.

nyo16 added 11 commits May 16, 2026 21:12

docs(CHANGELOG): document audit-driven fixes under Unreleased

9baa035

nyo16 merged commit 46f26bf into master May 17, 2026
6 checks passed

nyo16 deleted the worktree-audit-fixes branch May 17, 2026 01:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit-driven fixes: provider marshalling, ETS lifecycle, OTP hygiene, telemetry#58

Audit-driven fixes: provider marshalling, ETS lifecycle, OTP hygiene, telemetry#58
nyo16 merged 11 commits into
masterfrom
worktree-audit-fixes

nyo16 commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nyo16 commented May 17, 2026

Summary

Critical bugs fixed

Behavior-changing decisions

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant