Security, bug, performance & docs audit pass (RCE gate, SSRF, sandbox + 90 more findings) by nyo16 · Pull Request #59 · nyo16/nous

nyo16 · 2026-05-29T23:20:47Z

Summary

A multi-agent audit of the framework surfaced 98 confirmed findings (after an adversarial-verification pass dropped 7 false positives). This PR fixes them across
security, correctness, performance, and docs — in 5 themed commits plus a dialyzer-clean follow-up — with a regression test for every fix.

Verification (all green):

mix test — 1806 passed, 0 failures (+52 new regression tests over the 1754 baseline)
mix compile --warnings-as-errors — clean
mix credo --strict — no issues on changed files
mix dialyzer — 0 errors

Security

RCE approval gate bypass — Tool.from_module/2 dropped requires_approval from metadata, so Bash/FileWrite ran without the human-approval gate. Now defaults
from metadata. (Also: Tool.Behaviour.implements?/1 ensures the module is loaded; Agent.new/2 accepts bare tool modules.)
FileGrep ripgrep flag injection (-f/--pre) — now uses --regexp/--glob + -- terminator; Elixir fallback re-validates each matched file.
PathGuard intermediate-symlink escape — only the leaf was checked; now resolves symlinks across every component (realpath) and compares canonical paths.
UrlGuard SSRF — blocks IPv4-mapped IPv6 / NAT64 / fe80::/10 / ::, resolves A and AAAA, and WebFetch pins the validated IP (Host/SNI preserved) to close
the DNS-rebinding TOCTOU. Provider Req backends set redirect: false.
Permissions enforced — new :permissions option on Agent.new/2 filters blocked tools and forces approval; blocked?/2 allow-lists are deny-by-default in all
modes; unknown modes fail closed.
InputGuard on the streaming path — run_stream/3 runs the guard pipeline and blocks before any LLM call; LLMJudge fences input + can fail closed; Pattern normalizes
Unicode; aggregation counts configured strategies.
Secret hygiene — credential-shaped deps keys not persisted; Gemini key via x-goog-api-key header; telemetry logs a bounded summary; persistence/checkpoint ETS
tables are :protected.

Correctness

Anthropic multi-block crash, OpenAI nil-function crash, AgentServer double-message, Nous.LLM streaming-tools reassembly, Gemini thought_signature, ToolExecutor
throw/exit, teams atom-vs-pid region locking, SQLite scope off-by-one, Hybrid scope-after-limit, RRF normalization, :model_settings override, RateLimiter wiring,
parallel branch-id attribution, nested-schema validation, eval-config robustness, SearchScrape URL handling.

Performance

Removed O(n²) ++ appends (RateLimiter window, SharedState discoveries); KB slug → id index (O(1) lookup); Decisions BFS adjacency index (O(V·E) → O(V+E)).

Docs

Fixed non-compiling examples (README deps: → run/3, getting-started Nous.Errors.* / start_agent/3 / Context.deserialize/1 / chatbot messages:, AGENTS.md
custom-tool + streaming claims), mix.exs license MIT → Apache-2.0, and a CHANGELOG entry under [Unreleased].

Notes for reviewers

Behavior changes worth a look: Permissions.blocked?/2 allow-list is now deny-by-default in every mode; persistence/checkpoint ETS tables are :protected (writes
route through the owner — added Nous.Persistence.ETS.clear/0); Gemini auth moved from URL query to header.
Rate limiter: rpm/tpm/request limits are now enforced; the cost budget is reconciled post-hoc (the runtime has no per-token cost model) — documented in the
moduledoc.
Version left at 0.16.1; entries land in CHANGELOG's [Unreleased] per the existing release convention.

Critical/high security findings from the audit: - Tool.from_module/2 dropped requires_approval from metadata, silently disabling the agent_runner approval gate for Bash/FileWrite (one prompt injection from RCE). Fall back to metadata like name/description do. - Tool.Behaviour.implements?/1 used function_exported?/3 without loading the module first, so from_module spuriously rejected built-in tools by load order. Guard with Code.ensure_loaded?. - Agent.parse_tools/1 now accepts bare behaviour modules (tools: [Tools.Bash]), routing through from_module (preserves metadata flags; matches the docs). - FileGrep passed LLM-controlled pattern/glob into ripgrep with no '--' terminator -> rg flag injection (-f/--pre) escaping the workspace. Use --regexp/--glob and terminate options before the positional path. Also re-validate each matched file in the Elixir fallback (mirrors FileGlob). - PathGuard only lstat'd the final path component, so an intermediate directory symlink escaped the jail. Resolve symlinks across every existing component (realpath) and compare canonical path vs canonical root. - UrlGuard: normalize IPv4-mapped IPv6 + NAT64 to the v4 blocklist, block fe80::/10 and ::, and resolve BOTH A and AAAA (dual-stack bypass). Add validate_pinned/2 returning a validated IP; web_fetch now pins the connection to that IP (original hostname kept for Host/SNI/cert), closing the DNS-rebinding TOCTOU. - Req provider backends (one-shot + streaming) set redirect: false so a compromised upstream can't bounce requests to internal/metadata addresses. Regression tests added for every fix.

Authorization / guard enforcement (audit medium findings): - Nous.Permissions was dead code. Add an optional :permissions Policy to Agent, filter blocked tools out of the model's tool set in agent_runner, and force the approval gate for policy-approval-required tools. Fix blocked?/2 to honor allow lists in ALL modes (deny-by-default), validate Policy.mode, and fail closed (block / require-approval) on unknown modes. - InputGuard never ran on the streaming path: run_stream now executes the plugin init + before_request pipeline and short-circuits to a terminal 'blocked' stream without calling the model. - LLMJudge: fence untrusted input in a random boundary, parse only the first VERDICT line, and honor on_error (fail-closed) on unparseable responses; truncate the stored raw_response. - InputGuard :majority/:all aggregation now uses the configured strategy count as the denominator so killing strategies can't flip the vote. - Pattern strategy NFKC-normalizes and strips zero-width/bidi chars before matching (defeats homoglyph/zero-width evasion). - Policy :warn/:block sanitize and fence the (possibly LLM-derived) reason before embedding it in a system/assistant message. Secret & data-exposure hygiene: - Context.serialize_deps redacts credential-shaped keys (api_key/token/secret/ password/authorization/...) so persistence never writes them. - Gemini sends the API key via the x-goog-api-key header, not the URL query string (URLs leak into logs/proxies/spans). - Default telemetry handler logs a bounded status+body summary, not the raw error term (full upstream body/headers). - Streaming Req backend caps non-2xx error-body buffering at max_buffer_size. - Hook.new/2 accepts :fail_closed so security-gating hooks can opt in. - Persistence.ETS + Workflow.Checkpoint.ETS tables are now :protected (owner writes via GenServer, any process reads) instead of :public; add ETS.clear/0. - Summarization omits raw tool-result content from the durable summary. Regression tests added across these paths.

…rkflow Crash / wrong-behavior fixes (audit high/medium): - Anthropic from_response: join multiple text/thinking blocks instead of returning a list into the :string content field (was Ecto.InvalidChangesetError on common multi-block responses). Mirrors the Gemini fix. - OpenAI parse_tool_call: default a missing "function" wrapper to %{} (was BadMapError aborting the whole response parse on non-conformant backends). - AgentServer no longer adds the user message to context twice per turn. - Nous.LLM streaming-with-tools reassembles fragments via ToolCallAccumulator (was: Access crash on OpenAI list shape, nil-arg tool calls on Anthropic). - Streaming accumulator carries Gemini/Vertex thought_signature metadata. - ToolExecutor catches throw/non-timeout exit -> retryable ToolError instead of crashing the whole agent run. - TeamTools resolves shared_state/coordinator passed as a registered NAME to a live pid (region locking & discovery sharing were silently disabled). - SQLite memory search_text scope placeholder off-by-one fixed. - Hybrid store over-fetches a larger candidate pool when scoped. - Memory search normalizes RRF scores to 0-1 (consistent min_score behavior). - Per-run :model_settings override is now applied. - RateLimiter wired into the agent request path (rpm/tpm/request enforced). - Parallel executor preserves branch_id/index on crash/timeout. Lower-severity robustness: SearchScrape processes all URLs (capped/throttled) + clamps opts; eval/config env_integer via Integer.parse and estimate_cost deep merge; tool validator recurses into nested objects/arrays. Regression tests added across all of the above.

- RateLimiter.apply_delta prepends to the sliding window instead of `++ [entry]` (O(1) vs O(n); window is order-independent). - SharedState.share_discovery prepends discoveries (get_discoveries reverses to keep insertion order), removing the O(n^2) tail-append. - KnowledgeBase ETS store keeps a slug->id index so fetch_entry_by_slug is O(1) instead of a full tab2list scan + struct rebuild on every kb_read/backlinks/ link tool call. Index maintained on store/update/delete. - Decisions ETS store builds an edge adjacency index ONCE per BFS traversal (descendants/ancestors/path_between) instead of scanning the whole edge table per visited node — O(V+E) instead of O(V*E). Regression tests added for slug-index consistency on update/delete.

@behaviour

- mix.exs: licenses ["MIT"] -> ["Apache-2.0"] to match the bundled LICENSE and README badge. - README + Memory plugin moduledoc: move plugin :deps from Nous.new/2 (where it is silently dropped — the Agent struct has no :deps field) to Nous.run/3, for the Memory, Knowledge Base, and Sub-Agent examples. - README: scope the streaming-backpressure claim (Req default; Hackney opt-in). - docs/getting-started.md: - Nous.ProviderError/ModelError -> Nous.Errors.* (the bare names don't exist and the retry example would not compile). - AgentDynamicSupervisor.start_agent("id", agent_config_map, opts) — correct arity/types; drop the ignored :name option. - restore via Nous.Agent.Context.deserialize/1 (load returns a serialized map). - ChatBot GenServer example uses %Nous.Message{} structs + the :messages key (a bare list of role/content maps returns {:error, :invalid_input}). - AGENTS.md: - custom-tool example uses @behaviour + metadata/0 + execute(ctx, args) (the documented `use Nous.Tool` / reversed args don't compile); bare tool modules in tools: now work (parse_tools converts them). - correct the "streaming always uses hackney pull mode / not configurable" claim (Req is the default since 0.15.4; Hackney is opt-in).

…& blocked stream Message.extract_text/1 has no nil-content clause (it would raise on a tool-call-only assistant message) and is spec'd String.t(), so the `|| ""` fallbacks were both dead code (dialyzer guard_fail) and unsafe. Match binary content directly in blocked_stream/1 and estimate_request_tokens/1.

nyo16 added 7 commits May 29, 2026 16:51

docs: changelog entries for the security/bug/perf/docs audit pass

c46ee72

nyo16 merged commit 4ef8d70 into master May 29, 2026
6 checks passed

nyo16 deleted the audit/security-bugs-perf-docs branch May 29, 2026 23:34

nyo16 mentioned this pull request Jun 5, 2026

Audit-pass follow-up: security/OTP/test hardening #60

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security, bug, performance & docs audit pass (RCE gate, SSRF, sandbox + 90 more findings)#59

Security, bug, performance & docs audit pass (RCE gate, SSRF, sandbox + 90 more findings)#59
nyo16 merged 7 commits into
masterfrom
audit/security-bugs-perf-docs

nyo16 commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nyo16 commented May 29, 2026

Summary

Security

Correctness

Performance

Docs

Notes for reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant