Audit-driven hardening: security fixes, dep hygiene, perf, +75 tests #61
Merged
Implements the fix plan derived from a full-codebase health audit, plus a
streaming/tool-call hardening review. Full suite green (1881 passed, 0
failures); format + credo --strict clean.
Security
- Fix HIGH approval-gate bypass: the pre_tool_use `{:modify}` hook branch in
AgentRunner skipped `enforce_policy_approval`, so a tool gated only by the
permission policy (strict / approval_required / execute-category) ran ungated
whenever a hook rewrote its arguments. Now applies policy approval on that
path too. (+regression test)
- InputGuard fails closed on dropped strategies: a timed-out/errored strategy
under the default `:any` aggregation no longer passes as `:safe`; it upgrades
to `:suspicious` (configurable `fail_closed`, telemetry + log, new
`strategy_timeout`).
- Permissions: `:permissive` mode no longer auto-approves `category: :execute`
tools unless `allow_unattended_execute: true` (new `requires_approval?/3`).
- Memory stores: `field_to_column/1` is now a strict column allowlist in the
SQLite/DuckDB backends (removes a SQL-identifier injection primitive).
- mix nous.optimize: `--params` accepts safe YAML/JSON data; `Code.eval_file`
is now an explicit, warned `.exs` fallback only.
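A minimal sketch of the shape of the approval-gate fix in the first Security item. It assumes a simplified tool map and a policy function. `GateSketch`, `run_tool/4`, and `policy_gate/2` are hypothetical names; `policy_gate/2` only stands in for `enforce_policy_approval/2`, whose real signature is not shown here. The point is structural: the hook outcome only decides the arguments, and every path then passes through the same gate.

```elixir
defmodule GateSketch do
  # Hypothetical stand-in for enforce_policy_approval/2: the policy function
  # returns true when the tool needs approval. This sketch has no approver,
  # so such tools are refused.
  def policy_gate(tool, policy) do
    if policy.(tool), do: {:error, :approval_required}, else: :ok
  end

  # Hook outcomes only pick the arguments. The gate runs afterwards on
  # every path, so a hook that rewrites arguments cannot skip it.
  def run_tool(tool, args, hook_result, policy) do
    args =
      case hook_result do
        :continue -> args
        {:modify, new_args} -> new_args
      end

    with :ok <- policy_gate(tool, policy) do
      {:ok, tool.run.(args)}
    end
  end
end
```

With a policy that gates `category: :execute` tools, an execute tool is refused both when a hook returns `{:modify, ...}` and when it does not.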
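The InputGuard fail-closed rule can be sketched as a pure aggregation function. This assumes three verdict atoms, a `fail_closed` keyword option defaulting to true, and the hypothetical module name `FailClosedSketch`. The telemetry/log emission and the `strategy_timeout` handling mentioned in the PR are left out.

```elixir
defmodule FailClosedSketch do
  # Hypothetical `:any`-style aggregation. Strategy verdicts are :safe,
  # :suspicious or :blocked; anything else (e.g. :timeout or
  # {:error, reason}) marks a strategy that was dropped.
  @severity %{safe: 0, suspicious: 1, blocked: 2}

  def aggregate(results, opts \\ []) do
    fail_closed = Keyword.get(opts, :fail_closed, true)
    {verdicts, dropped} = Enum.split_with(results, &Map.has_key?(@severity, &1))

    # Worst verdict among the strategies that actually answered.
    worst =
      Enum.reduce(verdicts, :safe, fn v, acc ->
        if @severity[v] > @severity[acc], do: v, else: acc
      end)

    # A dropped strategy must not let the input pass as :safe.
    if dropped != [] and fail_closed and worst == :safe, do: :suspicious, else: worst
  end
end
```

A stronger verdict from a strategy that did answer (e.g. `:blocked`) still wins. The upgrade only affects the case where the surviving strategies all said `:safe`.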
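The permissions item introduces `requires_approval?/3`. Here is a sketch of how such a function could express the new `:permissive` rule. The mode names and the `allow_unattended_execute` option come from the PR text. The argument order, the tool map shape, and the fallback clause are assumptions for illustration only.

```elixir
defmodule PermissionSketch do
  # Hypothetical requires_approval?/3: mode, tool, options.
  def requires_approval?(:strict, _tool, _opts), do: true

  # :permissive still gates execute-class tools unless explicitly opted out.
  def requires_approval?(:permissive, %{category: :execute}, opts),
    do: not Keyword.get(opts, :allow_unattended_execute, false)

  def requires_approval?(:permissive, _tool, _opts), do: false

  # Any other mode defers to an assumed per-tool flag.
  def requires_approval?(_mode, tool, _opts), do: Map.get(tool, :approval_required, false)
end
```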
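A strict column allowlist means caller-supplied field names are looked up in a fixed map, never interpolated into SQL. A sketch under assumed column names, with a hypothetical `{:ok, column} | {:error, ...}` return shape; the real `field_to_column/1` may raise or return differently.

```elixir
defmodule ColumnAllowlistSketch do
  # Illustrative columns; only these identifiers can reach query text.
  @columns %{
    "id" => "id",
    "content" => "content",
    "importance" => "importance",
    "inserted_at" => "inserted_at"
  }

  def field_to_column(field) when is_atom(field), do: field_to_column(Atom.to_string(field))

  def field_to_column(field) when is_binary(field) do
    case Map.fetch(@columns, field) do
      {:ok, column} -> {:ok, column}
      :error -> {:error, {:unknown_field, field}}
    end
  end

  # Identifiers come from the allowlist; values would still use placeholders.
  def order_by_clause(field) do
    with {:ok, column} <- field_to_column(field), do: {:ok, "ORDER BY " <> column}
  end
end
```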
Dependencies
- Purge 36 stale mix.lock entries; add `mix deps.unlock --check-unused` to CI.
- Loosen net_runner (`~> 1.0`) and req (`~> 0.5 or ~> 0.6`); make phoenix_pubsub
`optional: true`.
- Guard the hackney optional-dep crash paths (backend selection + pool config).
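For the `hackney` item, the usual guard for an optional dependency is to check `Code.ensure_loaded?/1` before touching its modules. A minimal sketch, assuming a `:req` fallback and the hypothetical names `select_backend/1` and `ensure_pool/2`. `:hackney_pool.start_pool/2` is hackney's own pool API, but how Nous actually configures its pool is not shown in the PR.

```elixir
defmodule HttpBackendSketch do
  # Backend selection: only pick hackney if its modules are really present.
  def select_backend(preferred \\ :hackney)
  def select_backend(:hackney), do: if(Code.ensure_loaded?(:hackney), do: :hackney, else: :req)
  def select_backend(other), do: other

  # Pool config: never call into :hackney_pool when it is not compiled in.
  def ensure_pool(name, opts) do
    if Code.ensure_loaded?(:hackney_pool) do
      :hackney_pool.start_pool(name, opts)
    else
      {:error, :hackney_not_available}
    end
  end
end
```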
Performance (BEAM hot paths)
- Context.add_messages/2: single concat, O(n+m) instead of O(n*m).
- PubSub: memoize `Code.ensure_loaded?(Phoenix.PubSub)` in :persistent_term
(was per-broadcast / per-streamed-token).
- Hybrid memory search: run the embedding round-trip concurrently with the text
scan (store access stays single-process).
- SSE parsing: `:binary.split` instead of a per-chunk regex split.
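The `Context.add_messages/2` change comes down to how `++` works: it copies its left operand and shares the right one as the tail. A sketch contrasting the two shapes, using a hypothetical `AddMessagesSketch` struct rather than Nous's real `Context`. The "before" version is an assumption about what the O(n*m) path looked like.

```elixir
defmodule AddMessagesSketch do
  # Hypothetical context struct; only the message list matters here.
  defstruct messages: []

  # Before (illustrative): one ++ per new message re-copies the growing
  # list each time, so adding m messages re-copies the n existing ones m times.
  def add_messages_one_by_one(%__MODULE__{} = ctx, new) do
    Enum.reduce(new, ctx, fn msg, acc -> %{acc | messages: acc.messages ++ [msg]} end)
  end

  # After: a single ++ copies the existing list once and shares `new` as
  # the tail, O(n + m) overall.
  def add_messages(%__MODULE__{} = ctx, new) when is_list(new) do
    %{ctx | messages: ctx.messages ++ new}
  end
end
```

Both functions return the same struct. This is the property an equivalence test like the one listed under Tests would check.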
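The PubSub memoization can be sketched with `:persistent_term.get/2` and `:persistent_term.put/2`. The module name and cache key are hypothetical. `:persistent_term` writes are expensive compared with its reads, so this pattern suits a value computed once and read on every broadcast. Two processes racing on the first call would both compute and store the same value, which is harmless.

```elixir
defmodule PubSubCheckSketch do
  # Illustrative cache key.
  @key {__MODULE__, :pubsub_available?}

  # First call pays for Code.ensure_loaded?/1; later calls are a
  # constant-time :persistent_term read.
  def pubsub_available? do
    case :persistent_term.get(@key, :unset) do
      :unset ->
        available = Code.ensure_loaded?(Phoenix.PubSub)
        :persistent_term.put(@key, available)
        available

      cached ->
        cached
    end
  end
end
```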
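The hybrid-search item overlaps two waits. A sketch using `Task.async/1` and `Task.await/2`, with the following assumptions:

- The embedding call, text scan, and vector lookup are passed in as functions (hypothetical shapes).
- Results are combined with a naive max-score merge. Nous's actual ranking is not described in the PR.

Only the network-bound embedding call moves into the task. Both store-touching functions run in the calling process.

```elixir
defmodule HybridSearchSketch do
  def search(query, embed_fun, text_scan_fun, vector_search_fun) do
    # Network round-trip runs in a separate process.
    embedding_task = Task.async(fn -> embed_fun.(query) end)

    # Meanwhile the caller, the only process touching the store, scans text.
    text_hits = text_scan_fun.(query)

    vector_hits =
      case Task.await(embedding_task, 30_000) do
        {:ok, embedding} -> vector_search_fun.(embedding)
        {:error, _reason} -> []
      end

    merge(text_hits, vector_hits)
  end

  # Naive merge: best score per id, highest first.
  defp merge(a, b) do
    (a ++ b)
    |> Enum.group_by(fn {id, _score} -> id end, fn {_id, score} -> score end)
    |> Enum.map(fn {id, scores} -> {id, Enum.max(scores)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end
```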
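The SSE item swaps a regex split for `:binary.split/3`, which splits on a literal pattern. A minimal sketch of a buffer-carrying parser built on it. It assumes LF-delimited events and reads only `data:` fields. `SSESketch`, `parse_chunk/2`, and `data_of/1` are hypothetical names.

```elixir
defmodule SSESketch do
  # Events are separated by a blank line. This sketch handles "\n\n" only;
  # a full parser also accepts "\r\n\r\n" and "\r\r".
  def parse_chunk(buffer, chunk) do
    parts = :binary.split(buffer <> chunk, "\n\n", [:global])
    # The last part is an incomplete event (or "") and becomes the new buffer.
    {complete, [rest]} = Enum.split(parts, -1)
    events = for raw <- complete, raw != "", do: data_of(raw)
    {events, rest}
  end

  # Join the "data:" lines of one event, ignoring other fields.
  defp data_of(raw) do
    raw
    |> :binary.split("\n", [:global])
    |> Enum.flat_map(fn
      "data: " <> data -> [data]
      "data:" <> data -> [data]
      _other -> []
    end)
    |> Enum.join("\n")
  end
end
```

Feeding `"data: he"` then `"llo\n\ndata: wor"` yields the event `"hello"` and carries `"data: wor"` forward as the buffer.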
Tests (+75)
- New: OpenAI request marshalling, InputGuard fail-closed, permissions
category-gate, optimizer data params, Context.add_messages equivalence,
output_schema one_of error path, semantic input-guard strategy,
ParallelExecutor, teams Supervisor, research Planner + Synthesizer, and a
reusable Nous.MemoryStoreConformance harness (wired to the ETS backend;
native backends adopt it behind a tag in CI).
- Remove flaky/redundant sleeps in rate_limiter, workflow state, and phase3
tests; fix the self-contradicting one_of test.
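One common way to drop a fixed `Process.sleep` from a test is to wait for an explicit completion message with an upper bound. ExUnit's `assert_receive/3` gives tests exactly that shape. A plain-Elixir sketch of the idea, with the hypothetical helper `NoSleepSketch.run_and_wait/2`. The PR does not say which technique replaced each removed sleep.

```elixir
defmodule NoSleepSketch do
  # Run work in another process and block until it reports back, or give
  # up after `timeout` ms, instead of sleeping and hoping it finished.
  def run_and_wait(work, timeout \\ 1_000) do
    parent = self()
    ref = make_ref()
    spawn(fn -> send(parent, {ref, work.()}) end)

    receive do
      {^ref, result} -> {:ok, result}
    after
      timeout -> {:error, :timeout}
    end
  end
end
```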
See CHANGELOG.md for the user-facing security entries.
Enables the LlamaCpp NIF provider and adds an end-to-end smoke test against
real GGUF models. Picks up the previously-deferred P4-T4 (llamacpp coverage).
- mix.exs: add {:llama_cpp_ex, "~> 0.8", optional: true} (optional so it stays
out of downstream builds unless opted in, but available for Nous's dev/test).
mix.lock adds only llama_cpp_ex 0.8.22 and the `fine` package; unrelated transitive bumps
(ecto 3.14, req 0.6, telemetry 1.4) that `deps.update` tried to drag in were
isolated out — ecto 3.14 trims whitespace in cast :empty_values and breaks the
ContentPart "\n\n\n" regression test, so that update belongs in the deliberate
dependency-bump pass.
- providers/llamacpp.ex: fix two latent warnings exposed now that the module
actually compiles (it was compiled-out without the NIF):
* do_request_stream/3 had dead {:ok,_}/{:error,_} clauses on
stream_chat_completion, which is spec'd `:: Enumerable.t()` (raw stream,
errors surface during enumeration). Simplified to match the real contract.
* dropped the unused hand-written build_request_params/3 stub — the
macro-generated default fills the overridable slot and is exempt from the
unused-function warning. Compiles clean under --warnings-as-errors.
- test/nous/providers/llamacpp_smoke_test.exs (@moduletag :llama; test_helper
excludes the tag by default): chat completion (generate_text + agent
loop), enable_thinking:false suppresses <think>, json_schema structured output
(tolerant JSON extraction — small models wrap output in fences), an
agent-with-tool completes (documents that llama_cpp_ex has no native tools
API; grammar/json_schema is its structured mechanism), and embeddings via
embed/3. Reads NOUS_LLAMACPP_TEST_MODEL / NOUS_LLAMACPP_TEST_EMBED_MODEL;
skips cleanly (single placeholder) when the model/NIF is absent so the default
suite and CI never fail. Verified 7/7 passing across repeated runs against
Qwen3.5-0.8B + Qwen3-Embedding-0.6B with Metal.
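The smoke test's "tolerant JSON extraction" can be sketched as follows. If a ``` or ```json fence is present, take its contents. Otherwise slice from the first `{` to the last `}`. This is an assumed approach with the hypothetical name `JsonExtractSketch.extract/1`, not the test's actual helper. The result still has to go through a real JSON decoder.

```elixir
defmodule JsonExtractSketch do
  # Fenced block: ``` or ```json, a newline, then the body up to the
  # closing fence.
  @fence ~r/```(?:json)?\s*\n(.*?)\n?```/s

  def extract(text) do
    case Regex.run(@fence, text, capture: :all_but_first) do
      [inner] -> String.trim(inner)
      nil -> slice_braces(text)
    end
  end

  # Fallback: first "{" through last "}", if they appear in that order.
  defp slice_braces(text) do
    with {start, _} <- :binary.match(text, "{"),
         [_ | _] = ends <- :binary.matches(text, "}"),
         {stop, _} when stop > start <- List.last(ends) do
      binary_part(text, start, stop - start + 1)
    else
      _ -> String.trim(text)
    end
  end
end
```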
`use Nous.Provider` injects `@dialyzer {:nowarn_function, build_request_params: 3}`,
but the provider overrides request/3 + request_stream/3, so the macro's default
build_request_params/3 is dead-code-eliminated when unused — leaving the dialyzer
directive dangling ("Unknown function build_request_params/3"). Defining it as a
private stub instead trips the compiler's unused-function warning (and @compile
nowarn_unused_function doesn't cover Elixir's own check). Resolve the catch-22
with a `@doc false` public stub: public functions aren't unused-warned and stay
in the BEAM, so the @dialyzer directive resolves.
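The provider-side shape of the workaround can be illustrated with a small `__using__` macro that injects an overridable default and a `@dialyzer` directive naming it. `ProviderSketch` and `StubProviderSketch` are hypothetical. In this simplified stand-in the injected default is public, so it would not be eliminated anyway. The sketch only shows the `@doc false` public override, not the dead-code elimination itself.

```elixir
defmodule ProviderSketch do
  defmacro __using__(_opts) do
    quote do
      # Directive naming the injected default, as described above.
      @dialyzer {:nowarn_function, build_request_params: 3}

      def build_request_params(_model, messages, settings) do
        Map.put(settings, :messages, messages)
      end

      defoverridable build_request_params: 3
    end
  end
end

defmodule StubProviderSketch do
  use ProviderSketch

  # Public and @doc false: never reported as unused and always present in
  # the compiled module, so the @dialyzer directive names a real function.
  @doc false
  def build_request_params(_model, _messages, _settings), do: %{}
end
```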
`mix dialyzer` now passes (0 errors). Note: the local PLT under priv/plts is
gitignored and was rebuilt for OTP 29 (the committed-era PLT was OTP-incompatible
and raised "Old PLT file"); no repo artifact changes.
Summary
Implements the fix plan derived from a full-codebase health audit, plus a
streaming/tool-call hardening review. One HIGH security finding, several
medium/low fixes, BEAM-level performance work, dependency hygiene, and a large
test-coverage expansion.
Full suite green: 1881 passed (+75 over baseline), 0 failures.
`mix format` and `mix credo --strict` clean.
Security
- Fix HIGH approval-gate bypass. In `AgentRunner`, the `pre_tool_use`
`{:modify, …}` hook branch skipped `enforce_policy_approval/2`, so a tool gated
only by the permission policy (`:strict` / `approval_required` /
execute-category) executed ungated whenever any hook rewrote its arguments.
The modify branch now applies policy approval identically to the normal path.
(+ regression test)
- InputGuard fails closed on dropped strategies. A timed-out/errored strategy
under the default `:any` aggregation no longer passes as `:safe`; it upgrades
to `:suspicious` (configurable `fail_closed`, telemetry + log, new
`strategy_timeout`).
- `:permissive` policy no longer auto-approves execute-class tools.
`category: :execute` tools (e.g. `bash`) keep their approval gate under
`:permissive` unless `allow_unattended_execute: true` (new
`requires_approval?/3`).
- `field_to_column/1` is now a strict column allowlist in the SQLite/DuckDB
memory stores.
- `mix nous.optimize` stops eval’ing `--params`. It accepts safe YAML/JSON
data; `Code.eval_file` is now an explicit, warned `.exs` fallback only.
Dependencies
- Purge 36 stale `mix.lock` entries; add `mix deps.unlock --check-unused` to CI.
- Loosen `net_runner` (`~> 1.0`) and `req` (`~> 0.5 or ~> 0.6`); make
`phoenix_pubsub` `optional: true` so its constraint reaches downstream.
- Guard the optional-dep `hackney` crash paths (backend selection + pool config).
Performance (BEAM hot paths)
- `Context.add_messages/2`: single concat, O(n+m) instead of O(n·m).
- PubSub: memoize `Code.ensure_loaded?(Phoenix.PubSub)` in `:persistent_term`
(it was running per broadcast, i.e. per streamed token).
- Hybrid memory search: run the embedding round-trip concurrently with the text
scan (store access stays single-process, which is safe for SQL backends).
- SSE parsing: `:binary.split` instead of a per-chunk regex split.
Tests (+75)
- New: OpenAI request marshalling, InputGuard fail-closed, permissions
category-gate, optimizer data params, `Context.add_messages` equivalence,
`output_schema` `one_of` error path, semantic input-guard strategy,
`ParallelExecutor`, teams `Supervisor`, research `Planner` + `Synthesizer`,
and a reusable `Nous.MemoryStoreConformance` harness (wired to ETS; native
backends adopt it behind a tag in CI).
- Removed flaky/redundant `Process.sleep` calls in `rate_limiter`, workflow
`state`, and `phase3` tests; fixed the self-contradicting `one_of` test.
See `CHANGELOG.md` for the user-facing security entries.
Deferred (intentionally out of scope; each documented)
- … (`AgentServer`): real, but the server subscribes to the topic it
broadcasts on, so a "pick one path" change risks a latent self-broadcast loop.
Needs a reproduce-first test with real `Phoenix.PubSub`.
- … declarations, and dependency version bumps belong in a CI job where the
C++/Rust/NIF/XLA builds actually run.
- … tool execution, etc.): real tradeoffs that belong in a focused perf session.
- `Searcher` unit tests: web-tool-bound (needs tool mocking).
Notes for reviewers
- `mix dialyzer` was not run (PLT rebuild); worth running before merge given
the type-touching changes (`agent_runner`, `permissions`, the new policy field).
- … `InputGuard` fail-closed default, and the `:permissive` execute-gate.