Skip to content

Security, bug, performance & docs audit pass (RCE gate, SSRF, sandbox + 90 more findings)#59

Merged
nyo16 merged 7 commits into
masterfrom
audit/security-bugs-perf-docs
May 29, 2026
Merged

Security, bug, performance & docs audit pass (RCE gate, SSRF, sandbox + 90 more findings)#59
nyo16 merged 7 commits into
masterfrom
audit/security-bugs-perf-docs

Conversation

@nyo16

@nyo16 nyo16 commented May 29, 2026

Copy link
Copy Markdown
Owner

Summary

A multi-agent audit of the framework surfaced 98 confirmed findings (after an adversarial-verification pass dropped 7 false positives). This PR fixes them across
security, correctness, performance, and docs — in 5 themed commits plus a dialyzer-clean follow-up — with a regression test for every fix.

Verification (all green):

  • mix test1806 passed, 0 failures (+52 new regression tests over the 1754 baseline)
  • mix compile --warnings-as-errors — clean
  • mix credo --strict — no issues on changed files
  • mix dialyzer — 0 errors

Security

  • RCE approval gate bypassTool.from_module/2 dropped requires_approval from metadata, so Bash/FileWrite ran without the human-approval gate. Now defaults
    from metadata. (Also: Tool.Behaviour.implements?/1 ensures the module is loaded; Agent.new/2 accepts bare tool modules.)
  • FileGrep ripgrep flag injection (-f/--pre) — now uses --regexp/--glob + -- terminator; Elixir fallback re-validates each matched file.
  • PathGuard intermediate-symlink escape — only the leaf was checked; now resolves symlinks across every component (realpath) and compares canonical paths.
  • UrlGuard SSRF — blocks IPv4-mapped IPv6 / NAT64 / fe80::/10 / ::, resolves A and AAAA, and WebFetch pins the validated IP (Host/SNI preserved) to close
    the DNS-rebinding TOCTOU. Provider Req backends set redirect: false.
  • Permissions enforced — new :permissions option on Agent.new/2 filters blocked tools and forces approval; blocked?/2 allow-lists are deny-by-default in all
    modes; unknown modes fail closed.
  • InputGuard on the streaming pathrun_stream/3 runs the guard pipeline and blocks before any LLM call; LLMJudge fences input + can fail closed; Pattern normalizes
    Unicode; aggregation counts configured strategies.
  • Secret hygiene — credential-shaped deps keys not persisted; Gemini key via x-goog-api-key header; telemetry logs a bounded summary; persistence/checkpoint ETS
    tables are :protected.

Correctness

  • Anthropic multi-block crash, OpenAI nil-function crash, AgentServer double-message, Nous.LLM streaming-tools reassembly, Gemini thought_signature, ToolExecutor
    throw/exit, teams atom-vs-pid region locking, SQLite scope off-by-one, Hybrid scope-after-limit, RRF normalization, :model_settings override, RateLimiter wiring,
    parallel branch-id attribution, nested-schema validation, eval-config robustness, SearchScrape URL handling.

Performance

  • Removed O(n²) ++ appends (RateLimiter window, SharedState discoveries); KB slug → id index (O(1) lookup); Decisions BFS adjacency index (O(V·E) → O(V+E)).

Docs

  • Fixed non-compiling examples (README deps:run/3, getting-started Nous.Errors.* / start_agent/3 / Context.deserialize/1 / chatbot messages:, AGENTS.md
    custom-tool + streaming claims), mix.exs license MITApache-2.0, and a CHANGELOG entry under [Unreleased].

Notes for reviewers

  • Behavior changes worth a look: Permissions.blocked?/2 allow-list is now deny-by-default in every mode; persistence/checkpoint ETS tables are :protected (writes
    route through the owner — added Nous.Persistence.ETS.clear/0); Gemini auth moved from URL query to header.
  • Rate limiter: rpm/tpm/request limits are now enforced; the cost budget is reconciled post-hoc (the runtime has no per-token cost model) — documented in the
    moduledoc.
  • Version left at 0.16.1; entries land in CHANGELOG's [Unreleased] per the existing release convention.

nyo16 added 7 commits May 29, 2026 16:51
Critical/high security findings from the audit:

- Tool.from_module/2 dropped requires_approval from metadata, silently
  disabling the agent_runner approval gate for Bash/FileWrite (one prompt
  injection from RCE). Fall back to metadata like name/description do.
- Tool.Behaviour.implements?/1 used function_exported?/3 without loading the
  module first, so from_module spuriously rejected built-in tools by load
  order. Guard with Code.ensure_loaded?.
- Agent.parse_tools/1 now accepts bare behaviour modules (tools: [Tools.Bash]),
  routing through from_module (preserves metadata flags; matches the docs).
- FileGrep passed LLM-controlled pattern/glob into ripgrep with no '--'
  terminator -> rg flag injection (-f/--pre) escaping the workspace. Use
  --regexp/--glob and terminate options before the positional path. Also
  re-validate each matched file in the Elixir fallback (mirrors FileGlob).
- PathGuard only lstat'd the final path component, so an intermediate
  directory symlink escaped the jail. Resolve symlinks across every existing
  component (realpath) and compare canonical path vs canonical root.
- UrlGuard: normalize IPv4-mapped IPv6 + NAT64 to the v4 blocklist, block
  fe80::/10 and ::, and resolve BOTH A and AAAA (dual-stack bypass). Add
  validate_pinned/2 returning a validated IP; web_fetch now pins the
  connection to that IP (original hostname kept for Host/SNI/cert), closing
  the DNS-rebinding TOCTOU.
- Req provider backends (one-shot + streaming) set redirect: false so a
  compromised upstream can't bounce requests to internal/metadata addresses.

Regression tests added for every fix.
Authorization / guard enforcement (audit medium findings):

- Nous.Permissions was dead code. Add an optional :permissions Policy to Agent,
  filter blocked tools out of the model's tool set in agent_runner, and force
  the approval gate for policy-approval-required tools. Fix blocked?/2 to honor
  allow lists in ALL modes (deny-by-default), validate Policy.mode, and fail
  closed (block / require-approval) on unknown modes.
- InputGuard never ran on the streaming path: run_stream now executes the
  plugin init + before_request pipeline and short-circuits to a terminal
  'blocked' stream without calling the model.
- LLMJudge: fence untrusted input in a random boundary, parse only the first
  VERDICT line, and honor on_error (fail-closed) on unparseable responses;
  truncate the stored raw_response.
- InputGuard :majority/:all aggregation now uses the configured strategy count
  as the denominator so killing strategies can't flip the vote.
- Pattern strategy NFKC-normalizes and strips zero-width/bidi chars before
  matching (defeats homoglyph/zero-width evasion).
- Policy :warn/:block sanitize and fence the (possibly LLM-derived) reason
  before embedding it in a system/assistant message.

Secret & data-exposure hygiene:

- Context.serialize_deps redacts credential-shaped keys (api_key/token/secret/
  password/authorization/...) so persistence never writes them.
- Gemini sends the API key via the x-goog-api-key header, not the URL query
  string (URLs leak into logs/proxies/spans).
- Default telemetry handler logs a bounded status+body summary, not the raw
  error term (full upstream body/headers).
- Streaming Req backend caps non-2xx error-body buffering at max_buffer_size.
- Hook.new/2 accepts :fail_closed so security-gating hooks can opt in.
- Persistence.ETS + Workflow.Checkpoint.ETS tables are now :protected (owner
  writes via GenServer, any process reads) instead of :public; add ETS.clear/0.
- Summarization omits raw tool-result content from the durable summary.

Regression tests added across these paths.
…rkflow

Crash / wrong-behavior fixes (audit high/medium):

- Anthropic from_response: join multiple text/thinking blocks instead of
  returning a list into the :string content field (was Ecto.InvalidChangesetError
  on common multi-block responses). Mirrors the Gemini fix.
- OpenAI parse_tool_call: default a missing "function" wrapper to %{} (was
  BadMapError aborting the whole response parse on non-conformant backends).
- AgentServer no longer adds the user message to context twice per turn.
- Nous.LLM streaming-with-tools reassembles fragments via ToolCallAccumulator
  (was: Access crash on OpenAI list shape, nil-arg tool calls on Anthropic).
- Streaming accumulator carries Gemini/Vertex thought_signature metadata.
- ToolExecutor catches throw/non-timeout exit -> retryable ToolError instead of
  crashing the whole agent run.
- TeamTools resolves shared_state/coordinator passed as a registered NAME to a
  live pid (region locking & discovery sharing were silently disabled).
- SQLite memory search_text scope placeholder off-by-one fixed.
- Hybrid store over-fetches a larger candidate pool when scoped.
- Memory search normalizes RRF scores to 0-1 (consistent min_score behavior).
- Per-run :model_settings override is now applied.
- RateLimiter wired into the agent request path (rpm/tpm/request enforced).
- Parallel executor preserves branch_id/index on crash/timeout.

Lower-severity robustness: SearchScrape processes all URLs (capped/throttled) +
clamps opts; eval/config env_integer via Integer.parse and estimate_cost deep
merge; tool validator recurses into nested objects/arrays.

Regression tests added across all of the above.
- RateLimiter.apply_delta prepends to the sliding window instead of `++ [entry]`
  (O(1) vs O(n); window is order-independent).
- SharedState.share_discovery prepends discoveries (get_discoveries reverses to
  keep insertion order), removing the O(n^2) tail-append.
- KnowledgeBase ETS store keeps a slug->id index so fetch_entry_by_slug is O(1)
  instead of a full tab2list scan + struct rebuild on every kb_read/backlinks/
  link tool call. Index maintained on store/update/delete.
- Decisions ETS store builds an edge adjacency index ONCE per BFS traversal
  (descendants/ancestors/path_between) instead of scanning the whole edge table
  per visited node — O(V+E) instead of O(V*E).

Regression tests added for slug-index consistency on update/delete.
- mix.exs: licenses ["MIT"] -> ["Apache-2.0"] to match the bundled LICENSE and
  README badge.
- README + Memory plugin moduledoc: move plugin :deps from Nous.new/2 (where it
  is silently dropped — the Agent struct has no :deps field) to Nous.run/3, for
  the Memory, Knowledge Base, and Sub-Agent examples.
- README: scope the streaming-backpressure claim (Req default; Hackney opt-in).
- docs/getting-started.md:
  - Nous.ProviderError/ModelError -> Nous.Errors.* (the bare names don't exist
    and the retry example would not compile).
  - AgentDynamicSupervisor.start_agent("id", agent_config_map, opts) — correct
    arity/types; drop the ignored :name option.
  - restore via Nous.Agent.Context.deserialize/1 (load returns a serialized map).
  - ChatBot GenServer example uses %Nous.Message{} structs + the :messages key
    (a bare list of role/content maps returns {:error, :invalid_input}).
- AGENTS.md:
  - custom-tool example uses @behaviour + metadata/0 + execute(ctx, args)
    (the documented `use Nous.Tool` / reversed args don't compile); bare tool
    modules in tools: now work (parse_tools converts them).
  - correct the "streaming always uses hackney pull mode / not configurable"
    claim (Req is the default since 0.15.4; Hackney is opt-in).
…& blocked stream

Message.extract_text/1 has no nil-content clause (it would raise on a
tool-call-only assistant message) and is spec'd String.t(), so the `|| ""`
fallbacks were both dead code (dialyzer guard_fail) and unsafe. Match binary
content directly in blocked_stream/1 and estimate_request_tokens/1.
@nyo16 nyo16 merged commit 4ef8d70 into master May 29, 2026
6 checks passed
@nyo16 nyo16 deleted the audit/security-bugs-perf-docs branch May 29, 2026 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant