Audit-driven hardening: security fixes, dep hygiene, perf, +75 tests by nyo16 · Pull Request #61 · nyo16/nous

nyo16 · 2026-06-13T01:33:42Z

Summary

Implements the fix plan derived from a full-codebase health audit, plus a
streaming/tool-call hardening review. One HIGH security finding, several
medium/low fixes, BEAM-level performance work, dependency hygiene, and a large
test-coverage expansion.

Full suite green: 1881 passed (+75 over baseline), 0 failures. mix format

mix credo --strict clean.

Security

Fix HIGH approval-gate bypass. In AgentRunner, the pre_tool_use
{:modify, …} hook branch skipped enforce_policy_approval/2, so a tool gated
only by the permission policy (:strict / approval_required /
execute-category) executed ungated whenever any hook rewrote its arguments.
The modify branch now applies policy approval identically to the normal path.
(+ regression test)
InputGuard fails closed on dropped strategies. A timed-out/errored strategy
under the default :any aggregation no longer passes as :safe — it upgrades
to :suspicious (configurable fail_closed, telemetry + log, new
strategy_timeout).
:permissive policy no longer auto-approves execute-class tools.
category: :execute tools (e.g. bash) keep their approval gate under
:permissive unless allow_unattended_execute: true (new
requires_approval?/3).
SQL identifier-injection primitive removed. field_to_column/1 is now a
strict column allowlist in the SQLite/DuckDB memory stores.
mix nous.optimize stops eval’ing --params. Accepts safe YAML/JSON data;
Code.eval_file is now an explicit, warned .exs fallback only.

Dependencies

Purge 36 stale mix.lock entries; add mix deps.unlock --check-unused to CI.
Loosen net_runner (~> 1.0) and req (~> 0.5 or ~> 0.6); make
phoenix_pubsub optional: true so its constraint reaches downstream.
Guard the optional hackney crash paths (backend selection + pool config).

Performance (BEAM hot paths)

Context.add_messages/2: single concat, O(n+m) instead of O(n·m).
PubSub: memoize Code.ensure_loaded?(Phoenix.PubSub) in :persistent_term
(was running per broadcast — i.e. per streamed token).
Hybrid memory search: run the embedding round-trip concurrently with the text
scan (store access stays single-process — safe for SQL backends).
SSE parsing: :binary.split instead of a per-chunk regex split.

Tests (+75)

New coverage: OpenAI request marshalling, InputGuard fail-closed,
permissions category-gate, optimizer data params, Context.add_messages
equivalence, output_schema one_of error path, semantic input-guard
strategy, ParallelExecutor, teams Supervisor, research Planner +
Synthesizer, and a reusable Nous.MemoryStoreConformance harness (wired to
ETS; native backends adopt it behind a tag in CI).
Removed flaky/redundant Process.sleep calls in rate_limiter, workflow
state, and phase3 tests; fixed the self-contradicting one_of test.

See CHANGELOG.md for the user-facing security entries.

Deferred (intentionally out of scope — each documented)

Duplicate PubSub delivery (AgentServer): real, but the server subscribes
to the topic it broadcasts on — a "pick one path" change risks a latent
self-broadcast loop. Needs a reproduce-first test with real Phoenix.PubSub.
Native-dep work: llamacpp provider/normalizer tests, native optional-dep
declarations, and dependency version bumps belong in a CI job where the
C++/Rust/NIF/XLA builds actually run.
Low-severity perf tweaks (SharedState reads, checkpoint keying, concurrent
tool execution, etc.): real tradeoffs; a focused perf session.
Research Searcher unit tests: web-tool-bound (needs tool mocking).

Notes for reviewers

mix dialyzer was not run (PLT rebuild); worth running before merge given the
type-touching changes (agent_runner, permissions, the new policy field).
Three behavior changes ship with CHANGELOG entries: the approval-bypass fix,
InputGuard fail-closed default, and the :permissive execute-gate.

Implements the fix plan derived from a full-codebase health audit, plus a streaming/tool-call hardening review. Full suite green (1881 passed, 0 failures); format + credo --strict clean. Security - Fix HIGH approval-gate bypass: the pre_tool_use `{:modify}` hook branch in AgentRunner skipped `enforce_policy_approval`, so a tool gated only by the permission policy (strict / approval_required / execute-category) ran ungated whenever a hook rewrote its arguments. Now applies policy approval on that path too. (+regression test) - InputGuard fails closed on dropped strategies: a timed-out/errored strategy under the default `:any` aggregation no longer passes as `:safe`; it upgrades to `:suspicious` (configurable `fail_closed`, telemetry + log, new `strategy_timeout`). - Permissions: `:permissive` mode no longer auto-approves `category: :execute` tools unless `allow_unattended_execute: true` (new `requires_approval?/3`). - Memory stores: `field_to_column/1` is now a strict column allowlist in the SQLite/DuckDB backends (removes a SQL-identifier injection primitive). - mix nous.optimize: `--params` accepts safe YAML/JSON data; `Code.eval_file` is now an explicit, warned `.exs` fallback only. Dependencies - Purge 36 stale mix.lock entries; add `mix deps.unlock --check-unused` to CI. - Loosen net_runner (`~> 1.0`) and req (`~> 0.5 or ~> 0.6`); make phoenix_pubsub `optional: true`. - Guard the hackney optional-dep crash paths (backend selection + pool config). Performance (BEAM hot paths) - Context.add_messages/2: single concat, O(n+m) instead of O(n*m). - PubSub: memoize `Code.ensure_loaded?(Phoenix.PubSub)` in :persistent_term (was per-broadcast / per-streamed-token). - Hybrid memory search: run the embedding round-trip concurrently with the text scan (store access stays single-process). - SSE parsing: `:binary.split` instead of a per-chunk regex split. Tests (+75) - New: OpenAI request marshalling, InputGuard fail-closed, permissions category-gate, optimizer data params, Context.add_messages equivalence, output_schema one_of error path, semantic input-guard strategy, ParallelExecutor, teams Supervisor, research Planner + Synthesizer, and a reusable Nous.MemoryStoreConformance harness (wired to the ETS backend; native backends adopt it behind a tag in CI). - Remove flaky/redundant sleeps in rate_limiter, workflow state, and phase3 tests; fix the self-contradicting one_of test. See CHANGELOG.md for the user-facing security entries.

Enables the LlamaCpp NIF provider and adds an end-to-end smoke test against real GGUF models. Picks up the previously-deferred P4-T4 (llamacpp coverage). - mix.exs: add {:llama_cpp_ex, "~> 0.8", optional: true} (optional so it stays out of downstream builds unless opted in, but available for Nous's dev/test). mix.lock adds only llama_cpp_ex 0.8.22 + fine; unrelated transitive bumps (ecto 3.14, req 0.6, telemetry 1.4) that `deps.update` tried to drag in were isolated out — ecto 3.14 trims whitespace in cast :empty_values and breaks the ContentPart "\n\n\n" regression test, so that update belongs in the deliberate dependency-bump pass. - providers/llamacpp.ex: fix two latent warnings exposed now that the module actually compiles (it was compiled-out without the NIF): * do_request_stream/3 had dead {:ok,_}/{:error,_} clauses on stream_chat_completion, which is spec'd `:: Enumerable.t()` (raw stream, errors surface during enumeration). Simplified to match the real contract. * dropped the unused hand-written build_request_params/3 stub — the macro-generated default fills the overridable slot and is exempt from the unused-function warning. Compiles clean under --warnings-as-errors. - test/nous/providers/llamacpp_smoke_test.exs (@moduletag :llama, excluded by default; test_helper excludes it): chat completion (generate_text + agent loop), enable_thinking:false suppresses <think>, json_schema structured output (tolerant JSON extraction — small models wrap output in fences), an agent-with-tool completes (documents that llama_cpp_ex has no native tools API; grammar/json_schema is its structured mechanism), and embeddings via embed/3. Reads NOUS_LLAMACPP_TEST_MODEL / NOUS_LLAMACPP_TEST_EMBED_MODEL; skips cleanly (single placeholder) when the model/NIF is absent so the default suite and CI never fail. Verified 7/7 passing across repeated runs against Qwen3.5-0.8B + Qwen3-Embedding-0.6B with Metal.

@compile

`use Nous.Provider` injects `@dialyzer {:nowarn_function, build_request_params: 3}`, but the provider overrides request/3 + request_stream/3, so the macro's default build_request_params/3 is dead-code-eliminated when unused — leaving the dialyzer directive dangling ("Unknown function build_request_params/3"). Defining it as a private stub instead trips the compiler's unused-function warning (and @compile nowarn_unused_function doesn't cover Elixir's own check). Resolve the catch-22 with a `@doc false` public stub: public functions aren't unused-warned and stay in the BEAM, so the @dialyzer directive resolves. `mix dialyzer` now passes (0 errors). Note: the local PLT under priv/plts is gitignored and was rebuilt for OTP 29 (the committed-era PLT was OTP-incompatible and raised "Old PLT file"); no repo artifact changes.

nyo16 added 3 commits June 12, 2026 08:09

nyo16 merged commit 96d9826 into master Jun 13, 2026
6 checks passed

nyo16 deleted the fix/audit-fixes-2026-06 branch June 13, 2026 03:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit-driven hardening: security fixes, dep hygiene, perf, +75 tests#61

Audit-driven hardening: security fixes, dep hygiene, perf, +75 tests#61
nyo16 merged 3 commits into
masterfrom
fix/audit-fixes-2026-06

nyo16 commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nyo16 commented Jun 13, 2026

Summary

Security

Dependencies

Performance (BEAM hot paths)

Tests (+75)

Deferred (intentionally out of scope — each documented)

Notes for reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant