Audit-driven fixes: provider marshalling, ETS lifecycle, OTP hygiene, telemetry #58
Merged
…, usage parsing
- Gemini functionResponse.name must equal the original functionCall.name;
was leaking tool_call_id (e.g. "gemini_abc123") and breaking every
Gemini/Vertex tool roundtrip. Now uses Message.name with tool_call_id
fallback; agent_runner + llm + context all thread name through
Message.tool/3.
- OpenAI tool-call arguments JSON-decode failure no longer injects a fake
args map (%{"error" => ..., "raw" => ...}) into the tool. decode_arguments/1
now returns {:ok, map()} | {:error, {:invalid_json, raw}}; parsers tag the
call with "_invalid_arguments" and agent_runner short-circuits with a
proper tool-error result so the LLM can retry.
- Streaming Gemini tool calls now synthesize a "gemini_<base64>" id matching
the non-streaming parser instead of always emitting id: nil.
- Anthropic + Gemini usage parsing set requests: 1 (was 0) so final usage
metrics aren't undercounted. Anthropic captures cache_creation_input_tokens
and cache_read_input_tokens; Gemini captures cachedContentTokenCount. New
Usage struct fields propagate through add/2.
- Comment why two Gemini parsers exist (parse_content vs parse_parts) to
prevent future "consolidate these" refactors that would break callers.
1743 tests passing (was 1737, +6 new tests).
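The argument-decoding change above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the module name, `parse_call/1`, `run_tool/2`, and the exact shape of the `"_invalid_arguments"` tag are assumptions, and it uses the built-in `JSON` module (Elixir 1.18+) where the project may use a different decoder.

```elixir
defmodule ToolArgsSketch do
  # Hypothetical stand-in for the parser helper described above.
  @spec decode_arguments(String.t()) ::
          {:ok, map()} | {:error, {:invalid_json, String.t()}}
  def decode_arguments(raw) when is_binary(raw) do
    case JSON.decode(raw) do
      {:ok, %{} = args} -> {:ok, args}
      # Decode failures, and valid JSON that is not an object, are both rejected.
      _other -> {:error, {:invalid_json, raw}}
    end
  end

  # Tag the call instead of handing the tool a fabricated args map.
  def parse_call(%{"name" => name, "arguments" => raw}) do
    case decode_arguments(raw) do
      {:ok, args} -> %{name: name, arguments: args}
      {:error, {:invalid_json, bad}} -> %{name: name, arguments: %{"_invalid_arguments" => bad}}
    end
  end

  # The runner short-circuits tagged calls with a tool error the LLM can react to.
  def run_tool(%{arguments: %{"_invalid_arguments" => raw}}, _fun) do
    {:error, "tool arguments were not valid JSON: " <> raw}
  end

  def run_tool(%{arguments: args}, fun), do: {:ok, fun.(args)}
end
```

The point of the tagged tuple is that the tool function is never invoked with arguments it did not ask for; the failure travels back as an ordinary tool error.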
…n re-init
- Nous.Workflow.Checkpoint.ETS gains a supervised TableOwner GenServer
started under Nous.Application. Previously the :nous_workflow_checkpoints
table was owned by whichever caller first invoked save/load; when that
process died, the table died and init/0 silently recreated an empty one,
losing every suspended workflow that depended on resume. Added a
regression test that saves from a transient Task and asserts the data
survives.
- Nous.Plugins.Memory.init/2 is called on every agent run by
AgentRunner.run_init. Previously it called store_mod.init/1
unconditionally, creating a fresh ETS table per run and silently
discarding the prior one (under load: ets_too_many_tables). Now reuses
store_state when already set; extracted apply_defaults/1 so the per-run
defaults still get refreshed. Added a regression test asserting the
second init returns the same store_state.
- Added a regression test that runs a workflow with scratch: true where a
node raises, and asserts the :nous_scratch_* table is cleaned up. The
executor's existing exception catch already handles this, but the test
pins the contract.
1746 tests passing (was 1743, +3 new tests).
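The table-ownership fix follows a common OTP pattern: a long-lived process creates the ETS table so it outlives short-lived callers. A minimal sketch, assuming a hypothetical module and table name (the real TableOwner and its supervision wiring are not shown in this PR text):

```elixir
defmodule TableOwnerSketch do
  use GenServer

  # Hypothetical table name, chosen for this sketch only.
  @table :sketch_checkpoints

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # The owner is this GenServer, not whichever caller touches the table first.
    # :public lets any process read and write without going through the owner.
    table = :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    {:ok, table}
  end

  def save(key, value), do: :ets.insert(@table, {key, value})

  def load(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end
end

{:ok, _pid} = TableOwnerSketch.start_link()

# Save from a transient task; the data remains readable after the task exits.
Task.async(fn -> TableOwnerSketch.save("wf-1", %{step: 3}) end) |> Task.await()
```

In an application the owner would sit in the supervision tree rather than being started by hand as it is here.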
Migrate bare Task.async / Task.async_stream callsites to the application's
Nous.TaskSupervisor with the *_nolink variants — so a research/eval/scrape
crash no longer crashes the calling process (and vice versa), and the
supervisor can send graceful exits on app shutdown.
- research/coordinator.ex: top-level research task + parallel search stream
- eval/runner.ex: parallel suite run + per-test-case timeout task
- tools/search_scrape.ex: URL fan-out
- plugins/input_guard.ex: parallel strategy run
- http/stream_backend/req.ex: SSE producer task (the default streaming
backend)
Task.yield now handles {:exit, reason} as {:error, {:task_exit, reason}}
instead of crashing the caller with CaseClauseError (coordinator.ex,
eval/runner.ex).
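As an illustration of the pattern described above, here is a small sketch of running work under a `Task.Supervisor` with `async_nolink` and mapping every `Task.yield/2` outcome explicitly. The module and function names are hypothetical, and the supervisor is passed in rather than assumed to be `Nous.TaskSupervisor`.

```elixir
defmodule SupervisedTaskSketch do
  # Runs `fun` under the given Task.Supervisor without linking it to the caller.
  def run(supervisor, fun, timeout \\ 5_000) do
    task = Task.Supervisor.async_nolink(supervisor, fun)

    # If the task has not replied within `timeout`, shut it down; Task.shutdown/1
    # may still return a reply that arrived in the meantime.
    case Task.yield(task, timeout) || Task.shutdown(task) do
      {:ok, result} -> {:ok, result}
      # A crash in the task arrives as {:exit, reason} instead of taking down the caller.
      {:exit, reason} -> {:error, {:task_exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```

A caller would start a supervisor once (for example `{:ok, sup} = Task.Supervisor.start_link()`) and then call `SupervisedTaskSketch.run(sup, fn -> ... end, 1_000)`.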
Compile warnings cleared:
- providers/vertex_ai.ex: drop unreachable validate_project_id(nil)
clause (caller already guards with `if project`).
- research/planner.ex: tighten @spec to {:ok, plan()}; the LLM-error
branch falls back to a single-step plan and never returns {:error, _}.
- research/coordinator.ex: drop dead {:error, _} clause in research_loop/1
now that plan_phase/1 only returns {:ok, _, _}.
mix compile --warnings-as-errors is clean in both dev and test envs.
1746 tests passing.
- AgentServer.init/1 subscribed to "agent:#{session_id}" while the public
helper Nous.PubSub.agent_topic/1 returns "nous:agent:#{session_id}".
Anyone publishing via the helper never reached the server. Use the
helper.
- AgentServer.terminate/2 now cancels state.current_task on shutdown
(set the cancelled atomic + Task.shutdown). Previously the in-flight
LLM stream would keep consuming tokens and HTTP connections after the
server was already gone, until the runner hit max_iterations.
- Replace `_ -> :ok` swallow on Task.shutdown with explicit clauses;
log {:exit, reason} crashes during reset instead of silently
discarding them.
…tor constraint composition
- Nous.Hook gains a fail_closed: boolean() field. When set on a hook bound
to a blocking event (:pre_tool_use, :pre_request), runtime errors (raised
exceptions, timeouts, non-0/2 command exit codes) now :deny instead of
silently failing open. Default remains false for backward compatibility;
set fail_closed: true on hooks that gate security-sensitive operations.
- Nous.Tool.Validator.validate_types/2 previously matched the property
schema's "type" clause before "enum", silently dropping any declared
enum constraint when both keys were present (e.g. %{"type" => "string",
"enum" => ["a","b"]} accepted any string). Now every constraint runs
via maybe_check_type/4 + maybe_check_enum/4 so a value that violates
both produces two errors and a value that violates only one still
fails.
1753 tests passing (was 1746, +7 new tests).
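The validator fix amounts to running each constraint independently and accumulating errors, rather than letting the first matching clause win. A simplified sketch under that assumption; the helper names echo the ones mentioned above, but the module, error strings, and supported types are made up for illustration:

```elixir
defmodule ValidatorSketch do
  # Returns a list of error strings; an empty list means the value is valid.
  def validate_property(key, value, schema) do
    []
    |> maybe_check_type(key, value, schema)
    |> maybe_check_enum(key, value, schema)
    |> Enum.reverse()
  end

  # Each check only fires when its key is present in the schema.
  defp maybe_check_type(errors, key, value, %{"type" => type}) do
    if type_ok?(type, value), do: errors, else: ["#{key}: expected #{type}" | errors]
  end

  defp maybe_check_type(errors, _key, _value, _schema), do: errors

  defp maybe_check_enum(errors, key, value, %{"enum" => allowed}) when is_list(allowed) do
    if value in allowed, do: errors, else: ["#{key}: not in enum" | errors]
  end

  defp maybe_check_enum(errors, _key, _value, _schema), do: errors

  defp type_ok?("string", v), do: is_binary(v)
  defp type_ok?("integer", v), do: is_integer(v)
  defp type_ok?("boolean", v), do: is_boolean(v)
  # Unknown types are not checked in this sketch.
  defp type_ok?(_other, _v), do: true
end
```

With a schema of `%{"type" => "string", "enum" => ["a", "b"]}`, the value `"c"` fails only the enum check, while `1` fails both checks and yields two errors.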
Nous.LLM.stream_text_with_tools/6 silently :halt'd its Stream.resource when
Fallback.with_fallback returned {:error, _}. Consumers iterating the stream
saw a clean empty stream with no signal that the LLM call had failed,
making error handling impossible for the streaming + tools path.
Now emits an {:error, reason} event before halting, matching the contract
consumers expect from any Stream-based provider. Regression test added.
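One way to surface a failed start from a `Stream.resource/3` is to emit a single error event and halt on the next pull. The sketch below shows that shape with a hypothetical `start_fun` standing in for the fallback call and a made-up `:text_delta` event name; it is not the library's actual event vocabulary.

```elixir
defmodule StreamErrorSketch do
  # `start_fun` returns {:ok, chunks} or {:error, reason}; both are assumptions
  # standing in for the real provider call.
  def stream(start_fun) do
    Stream.resource(
      fn -> start_fun.() end,
      fn
        {:ok, [chunk | rest]} -> {[{:text_delta, chunk}], {:ok, rest}}
        {:ok, []} -> {:halt, :done}
        # Emit the error as an event first, then halt on the following pull.
        {:error, reason} -> {[{:error, reason}], :halted_after_error}
        :halted_after_error -> {:halt, :done}
      end,
      fn _acc -> :ok end
    )
  end
end
```

A consumer that pattern-matches on `{:error, reason}` now sees the failure instead of an empty stream.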
…ve stale doc
Documented but never emitted (now emitted):
- [:nous, :agent, :iteration, :start/:stop]: fires around each
do_iteration in AgentRunner with iteration, max_iterations, tool_calls,
needs_response.
- [:nous, :context, :update]: fires from Tool.ContextUpdate.apply/2 with
keys_updated count and the list of keys that changed.
- [:nous, :callback, :execute]: fires from Nous.Agent.Callbacks.execute/3
with the callback_type and agent_name.
Emitted but undocumented (now documented under their own sections):
- [:nous, :agent, :fallback, :used], [:nous, :fallback, :activated]
- [:nous, :hook, :execute, :start/:stop], [:nous, :hook, :denied]
- [:nous, :skill, :activate/:deactivate]
- [:nous, :workflow, :run, :*], [:nous, :workflow, :node, :*]
Dropped [:nous, :provider, :stream, :chunk] from docs and
attach_default_handler: it was never emitted, and a per-chunk telemetry
call would be high-overhead on the hot streaming path. Use
[:nous, :provider, :stream, :start]/:connected/:exception for stream
lifecycle.
1754 tests passing.
…context load
- memory/embedding/bumblebee.ex: stop serializing every embedding through
the one ServingHolder GenServer. handle_call/3 no longer runs
Nx.Serving.run/2; it returns the serving struct via :get_serving, and
callers run inference themselves. Nx.Serving is designed to batch
concurrent calls, so this restores the parallelism the prior design
bottlenecked.
- agent_registry.ex: bump Registry partitions to
System.schedulers_online() (was the default of 1). High-concurrency
LiveView fan-ins called lookup/1 from many sockets and serialized on a
single partition.
- agent_server.ex: defer maybe_load_context to a handle_continue so init/1
returns immediately. Persistence I/O no longer blocks
DynamicSupervisor.start_child, which means Teams.Coordinator's
spawn_agent handle_call no longer wedges the team coordinator for seconds
when the persistence backend is slow (S3, Postgres).
1754 tests passing, clean compile.
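Deferring slow start-up work with `handle_continue/2` looks roughly like the sketch below. The module, the `loader` function, and the state shape are assumptions for illustration; only the `{:continue, ...}` mechanism itself is standard GenServer behaviour.

```elixir
defmodule DeferredLoadSketch do
  use GenServer

  # `loader` is a hypothetical zero-arity function standing in for persistence I/O.
  def start_link(loader), do: GenServer.start_link(__MODULE__, loader)
  def context(pid), do: GenServer.call(pid, :context)

  @impl true
  def init(loader) do
    # Return right away so the process that called start_link is not blocked;
    # the slow load runs in handle_continue/2.
    {:ok, %{loader: loader, context: nil}, {:continue, :load_context}}
  end

  @impl true
  def handle_continue(:load_context, state) do
    {:noreply, %{state | context: state.loader.()}}
  end

  @impl true
  def handle_call(:context, _from, state), do: {:reply, state.context, state}
end
```

Because the continue callback runs before the server handles any other message, a call issued immediately after `start_link/1` still observes the loaded context; only the starter is unblocked earlier.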
Req's :into callback pushed chunks to the consumer via send/2, so a fast
LLM + slow consumer (LiveView fan-out, persistence-per-chunk, slow IO)
grew the consumer's mailbox without bound — the M-12 risk called out in
mix.exs.
The producing Task now polls the consumer's message_queue_len before each
send:
- below @backpressure_high_water (1_000): forward chunk normally
- above: busy-wait in 5ms increments until queue drops below
@backpressure_low_water (100), then resume
- still backed up after @backpressure_max_wait_ms (30s): surface
{:error, %{reason: :backpressure_overflow, queue_len: n}} and halt
rather than wedging forever
This pauses Req's :into callback while we wait, which transitively pauses
the producing socket — natural pull-based backpressure on top of the
push-based default. Users with reliably slow consumers can still opt
into Hackney's strict :async-once mode.
1754 tests passing.
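The high-water / low-water polling described above can be sketched as follows. This is an approximation under stated assumptions: the module and function names, the `{:chunk, _}` message shape, and the `:consumer_down` case are invented for the example; the thresholds mirror the values quoted in the commit message and are overridable so the behaviour can be exercised quickly.

```elixir
defmodule BackpressureSketch do
  @defaults [high_water: 1_000, low_water: 100, poll_ms: 5, max_wait_ms: 30_000]

  # Forwards `chunk` to `consumer`, pausing while the consumer's mailbox is too deep.
  def push(consumer, chunk, opts \\ []) do
    cfg = Map.new(Keyword.merge(@defaults, opts))
    high = cfg.high_water

    case queue_len(consumer) do
      nil -> {:error, %{reason: :consumer_down}}
      n when n > high -> wait_for_drain(consumer, chunk, cfg, 0)
      _n -> forward(consumer, chunk)
    end
  end

  # Polls until the queue drops below the low-water mark or the wait budget runs out.
  defp wait_for_drain(consumer, chunk, cfg, waited_ms) do
    %{low_water: low, poll_ms: poll, max_wait_ms: max_wait} = cfg

    case queue_len(consumer) do
      nil ->
        {:error, %{reason: :consumer_down}}

      n when n < low ->
        forward(consumer, chunk)

      n when waited_ms >= max_wait ->
        {:error, %{reason: :backpressure_overflow, queue_len: n}}

      _n ->
        Process.sleep(poll)
        wait_for_drain(consumer, chunk, cfg, waited_ms + poll)
    end
  end

  defp forward(consumer, chunk) do
    send(consumer, {:chunk, chunk})
    :ok
  end

  # Process.info/2 returns nil when the process is no longer alive.
  defp queue_len(pid) do
    case Process.info(pid, :message_queue_len) do
      {:message_queue_len, n} -> n
      nil -> nil
    end
  end
end
```

Using two thresholds gives hysteresis: once the producer pauses, it waits for the queue to drain well below the trigger point instead of flapping around a single limit.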
- Mark @deprecated on public functions with no internal callers, kept for
backward compat per user direction (not removed since this is the 0.x hex
line and downstream consumers may rely on them):
Nous.ToolSchema.to_openai/1: use Nous.Tool.to_openai_schema/1
Nous.Agent.tool/3: use Agent.new/2 with :tools or build %Tool{} directly
Nous.Eval.run!/2: match Nous.Eval.run/2's {:ok, _} | {:error, _}
Nous.Decisions.path_between/4, descendants/3, ancestors/3: call
store_mod.query directly
- memory/store/sqlite.ex: FTS5 query escaping now doubles embedded `"`
per FTS5 inside-quotes rules. A search term like `say "hi"` previously
produced invalid FTS5 syntax and errored.
1754 tests passing, --warnings-as-errors clean.
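The FTS5 quoting rule mentioned above is small enough to show directly. A minimal sketch with a hypothetical helper name; how the project assembles the rest of its MATCH expression is not shown here:

```elixir
defmodule Fts5EscapeSketch do
  # Wraps a term in double quotes for an FTS5 MATCH expression, doubling any
  # embedded double quote so it is read as a literal character.
  def quote_term(term) when is_binary(term) do
    ~s(") <> String.replace(term, ~s("), ~s("")) <> ~s(")
  end
end
```

For the search term `say "hi"`, this produces the FTS5 string `"say ""hi"""`.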
Summary
Comprehensive bug-fix pass driven by a parallel multi-agent code audit (dead code / leaks / broken code / OTP concurrency lenses). Eleven logically-grouped commits, all individually verified with TDD where applicable.
- 1754 tests passing (was 1737; 17 new regression tests).
- mix compile --warnings-as-errors is clean in both dev and test envs.
1c1253c …
3a8cc9e …
d75dd6b Task.async → Task.Supervisor.async_nolink; clear compile warnings
ea99c5d …
993af69 fail_closed: opt-in + tool validator enum/type composition
3fba62e LLM.stream_text_with_tools emits {:error, _} instead of silent halt
4410fd5 …
37002c3 …
23fca73 …
65b7648 @deprecated on orphan public API + SQLite FTS5 escape
9baa035 …
See CHANGELOG.md "Unreleased" for the full per-fix breakdown.
Critical bugs fixed
- functionResponse.name carried the tool_call_id instead of the function name. Fixed by threading :name through Message.tool/3.
- … TableOwner.
- … store_state.
- … Nous.PubSub.agent_topic/1 never reached it.
Behavior-changing decisions
- … @deprecated (no removal in 0.x).
Test plan
- mix test: 1754 / 1754 passing (3 doctests + 1751 tests, 101 excluded), ~4.2s
- mix compile --warnings-as-errors: clean in dev env
- MIX_ENV=test mix compile --warnings-as-errors: clean in test env
- … fail_closed, scratch cleanup, llm error emit)
- … functionResponse.name fix is the highest-impact item)