Security, bug, performance & docs audit pass (RCE gate, SSRF, sandbox + 90 more findings)#59
Merged
Merged
Conversation
Critical/high security findings from the audit: - Tool.from_module/2 dropped requires_approval from metadata, silently disabling the agent_runner approval gate for Bash/FileWrite (one prompt injection from RCE). Fall back to metadata like name/description do. - Tool.Behaviour.implements?/1 used function_exported?/3 without loading the module first, so from_module spuriously rejected built-in tools by load order. Guard with Code.ensure_loaded?. - Agent.parse_tools/1 now accepts bare behaviour modules (tools: [Tools.Bash]), routing through from_module (preserves metadata flags; matches the docs). - FileGrep passed LLM-controlled pattern/glob into ripgrep with no '--' terminator -> rg flag injection (-f/--pre) escaping the workspace. Use --regexp/--glob and terminate options before the positional path. Also re-validate each matched file in the Elixir fallback (mirrors FileGlob). - PathGuard only lstat'd the final path component, so an intermediate directory symlink escaped the jail. Resolve symlinks across every existing component (realpath) and compare canonical path vs canonical root. - UrlGuard: normalize IPv4-mapped IPv6 + NAT64 to the v4 blocklist, block fe80::/10 and ::, and resolve BOTH A and AAAA (dual-stack bypass). Add validate_pinned/2 returning a validated IP; web_fetch now pins the connection to that IP (original hostname kept for Host/SNI/cert), closing the DNS-rebinding TOCTOU. - Req provider backends (one-shot + streaming) set redirect: false so a compromised upstream can't bounce requests to internal/metadata addresses. Regression tests added for every fix.
Authorization / guard enforcement (audit medium findings): - Nous.Permissions was dead code. Add an optional :permissions Policy to Agent, filter blocked tools out of the model's tool set in agent_runner, and force the approval gate for policy-approval-required tools. Fix blocked?/2 to honor allow lists in ALL modes (deny-by-default), validate Policy.mode, and fail closed (block / require-approval) on unknown modes. - InputGuard never ran on the streaming path: run_stream now executes the plugin init + before_request pipeline and short-circuits to a terminal 'blocked' stream without calling the model. - LLMJudge: fence untrusted input in a random boundary, parse only the first VERDICT line, and honor on_error (fail-closed) on unparseable responses; truncate the stored raw_response. - InputGuard :majority/:all aggregation now uses the configured strategy count as the denominator so killing strategies can't flip the vote. - Pattern strategy NFKC-normalizes and strips zero-width/bidi chars before matching (defeats homoglyph/zero-width evasion). - Policy :warn/:block sanitize and fence the (possibly LLM-derived) reason before embedding it in a system/assistant message. Secret & data-exposure hygiene: - Context.serialize_deps redacts credential-shaped keys (api_key/token/secret/ password/authorization/...) so persistence never writes them. - Gemini sends the API key via the x-goog-api-key header, not the URL query string (URLs leak into logs/proxies/spans). - Default telemetry handler logs a bounded status+body summary, not the raw error term (full upstream body/headers). - Streaming Req backend caps non-2xx error-body buffering at max_buffer_size. - Hook.new/2 accepts :fail_closed so security-gating hooks can opt in. - Persistence.ETS + Workflow.Checkpoint.ETS tables are now :protected (owner writes via GenServer, any process reads) instead of :public; add ETS.clear/0. - Summarization omits raw tool-result content from the durable summary. Regression tests added across these paths.
…rkflow
Crash / wrong-behavior fixes (audit high/medium):
- Anthropic from_response: join multiple text/thinking blocks instead of
returning a list into the :string content field (was Ecto.InvalidChangesetError
on common multi-block responses). Mirrors the Gemini fix.
- OpenAI parse_tool_call: default a missing "function" wrapper to %{} (was
BadMapError aborting the whole response parse on non-conformant backends).
- AgentServer no longer adds the user message to context twice per turn.
- Nous.LLM streaming-with-tools reassembles fragments via ToolCallAccumulator
(was: Access crash on OpenAI list shape, nil-arg tool calls on Anthropic).
- Streaming accumulator carries Gemini/Vertex thought_signature metadata.
- ToolExecutor catches throw/non-timeout exit -> retryable ToolError instead of
crashing the whole agent run.
- TeamTools resolves shared_state/coordinator passed as a registered NAME to a
live pid (region locking & discovery sharing were silently disabled).
- SQLite memory search_text scope placeholder off-by-one fixed.
- Hybrid store over-fetches a larger candidate pool when scoped.
- Memory search normalizes RRF scores to 0-1 (consistent min_score behavior).
- Per-run :model_settings override is now applied.
- RateLimiter wired into the agent request path (rpm/tpm/request enforced).
- Parallel executor preserves branch_id/index on crash/timeout.
Lower-severity robustness: SearchScrape processes all URLs (capped/throttled) +
clamps opts; eval/config env_integer via Integer.parse and estimate_cost deep
merge; tool validator recurses into nested objects/arrays.
Regression tests added across all of the above.
- RateLimiter.apply_delta prepends to the sliding window instead of `++ [entry]` (O(1) vs O(n); window is order-independent). - SharedState.share_discovery prepends discoveries (get_discoveries reverses to keep insertion order), removing the O(n^2) tail-append. - KnowledgeBase ETS store keeps a slug->id index so fetch_entry_by_slug is O(1) instead of a full tab2list scan + struct rebuild on every kb_read/backlinks/ link tool call. Index maintained on store/update/delete. - Decisions ETS store builds an edge adjacency index ONCE per BFS traversal (descendants/ancestors/path_between) instead of scanning the whole edge table per visited node — O(V+E) instead of O(V*E). Regression tests added for slug-index consistency on update/delete.
- mix.exs: licenses ["MIT"] -> ["Apache-2.0"] to match the bundled LICENSE and
README badge.
- README + Memory plugin moduledoc: move plugin :deps from Nous.new/2 (where it
is silently dropped — the Agent struct has no :deps field) to Nous.run/3, for
the Memory, Knowledge Base, and Sub-Agent examples.
- README: scope the streaming-backpressure claim (Req default; Hackney opt-in).
- docs/getting-started.md:
- Nous.ProviderError/ModelError -> Nous.Errors.* (the bare names don't exist
and the retry example would not compile).
- AgentDynamicSupervisor.start_agent("id", agent_config_map, opts) — correct
arity/types; drop the ignored :name option.
- restore via Nous.Agent.Context.deserialize/1 (load returns a serialized map).
- ChatBot GenServer example uses %Nous.Message{} structs + the :messages key
(a bare list of role/content maps returns {:error, :invalid_input}).
- AGENTS.md:
- custom-tool example uses @behaviour + metadata/0 + execute(ctx, args)
(the documented `use Nous.Tool` / reversed args don't compile); bare tool
modules in tools: now work (parse_tools converts them).
- correct the "streaming always uses hackney pull mode / not configurable"
claim (Req is the default since 0.15.4; Hackney is opt-in).
…& blocked stream Message.extract_text/1 has no nil-content clause (it would raise on a tool-call-only assistant message) and is spec'd String.t(), so the `|| ""` fallbacks were both dead code (dialyzer guard_fail) and unsafe. Match binary content directly in blocked_stream/1 and estimate_request_tokens/1.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A multi-agent audit of the framework surfaced 98 confirmed findings (after an adversarial-verification pass dropped 7 false positives). This PR fixes them across
security, correctness, performance, and docs — in 5 themed commits plus a dialyzer-clean follow-up — with a regression test for every fix.
Verification (all green):
mix test— 1806 passed, 0 failures (+52 new regression tests over the 1754 baseline)mix compile --warnings-as-errors— cleanmix credo --strict— no issues on changed filesmix dialyzer— 0 errorsSecurity
Tool.from_module/2droppedrequires_approvalfrom metadata, soBash/FileWriteran without the human-approval gate. Now defaultsfrom metadata. (Also:
Tool.Behaviour.implements?/1ensures the module is loaded;Agent.new/2accepts bare tool modules.)-f/--pre) — now uses--regexp/--glob+--terminator; Elixir fallback re-validates each matched file.fe80::/10/::, resolves A and AAAA, andWebFetchpins the validated IP (Host/SNI preserved) to closethe DNS-rebinding TOCTOU. Provider Req backends set
redirect: false.:permissionsoption onAgent.new/2filters blocked tools and forces approval;blocked?/2allow-lists are deny-by-default in allmodes; unknown modes fail closed.
run_stream/3runs the guard pipeline and blocks before any LLM call; LLMJudge fences input + can fail closed; Pattern normalizesUnicode; aggregation counts configured strategies.
depskeys not persisted; Gemini key viax-goog-api-keyheader; telemetry logs a bounded summary; persistence/checkpoint ETStables are
:protected.Correctness
functioncrash, AgentServer double-message,Nous.LLMstreaming-tools reassembly, Geminithought_signature, ToolExecutorthrow/exit, teams atom-vs-pid region locking, SQLite scope off-by-one, Hybrid scope-after-limit, RRF normalization,
:model_settingsoverride, RateLimiter wiring,parallel branch-id attribution, nested-schema validation, eval-config robustness, SearchScrape URL handling.
Performance
++appends (RateLimiter window, SharedState discoveries); KBslug → idindex (O(1) lookup); Decisions BFS adjacency index (O(V·E) → O(V+E)).Docs
deps:→run/3, getting-startedNous.Errors.*/start_agent/3/Context.deserialize/1/ chatbotmessages:, AGENTS.mdcustom-tool + streaming claims),
mix.exslicenseMIT→Apache-2.0, and aCHANGELOGentry under[Unreleased].Notes for reviewers
Permissions.blocked?/2allow-list is now deny-by-default in every mode; persistence/checkpoint ETS tables are:protected(writesroute through the owner — added
Nous.Persistence.ETS.clear/0); Gemini auth moved from URL query to header.moduledoc.
0.16.1; entries land inCHANGELOG's[Unreleased]per the existing release convention.