Skip to content

Audit-driven hardening: security fixes, dep hygiene, perf, +75 tests#61

Merged
nyo16 merged 3 commits into
masterfrom
fix/audit-fixes-2026-06
Jun 13, 2026
Merged

Audit-driven hardening: security fixes, dep hygiene, perf, +75 tests#61
nyo16 merged 3 commits into
masterfrom
fix/audit-fixes-2026-06

Conversation

@nyo16

@nyo16 nyo16 commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Summary

Implements the fix plan derived from a full-codebase health audit, plus a
streaming/tool-call hardening review. One HIGH security finding, several
medium/low fixes, BEAM-level performance work, dependency hygiene, and a large
test-coverage expansion.

Full suite green: 1881 passed (+75 over baseline), 0 failures. mix format

  • mix credo --strict clean.

Security

  • Fix HIGH approval-gate bypass. In AgentRunner, the pre_tool_use
    {:modify, …} hook branch skipped enforce_policy_approval/2, so a tool gated
    only by the permission policy (:strict / approval_required /
    execute-category) executed ungated whenever any hook rewrote its arguments.
    The modify branch now applies policy approval identically to the normal path.
    (+ regression test)
  • InputGuard fails closed on dropped strategies. A timed-out/errored strategy
    under the default :any aggregation no longer passes as :safe — it upgrades
    to :suspicious (configurable fail_closed, telemetry + log, new
    strategy_timeout).
  • :permissive policy no longer auto-approves execute-class tools.
    category: :execute tools (e.g. bash) keep their approval gate under
    :permissive unless allow_unattended_execute: true (new
    requires_approval?/3).
  • SQL identifier-injection primitive removed. field_to_column/1 is now a
    strict column allowlist in the SQLite/DuckDB memory stores.
  • mix nous.optimize stops eval’ing --params. Accepts safe YAML/JSON data;
    Code.eval_file is now an explicit, warned .exs fallback only.

Dependencies

  • Purge 36 stale mix.lock entries; add mix deps.unlock --check-unused to CI.
  • Loosen net_runner (~> 1.0) and req (~> 0.5 or ~> 0.6); make
    phoenix_pubsub optional: true so its constraint reaches downstream.
  • Guard the optional hackney crash paths (backend selection + pool config).

Performance (BEAM hot paths)

  • Context.add_messages/2: single concat, O(n+m) instead of O(n·m).
  • PubSub: memoize Code.ensure_loaded?(Phoenix.PubSub) in :persistent_term
    (was running per broadcast — i.e. per streamed token).
  • Hybrid memory search: run the embedding round-trip concurrently with the text
    scan (store access stays single-process — safe for SQL backends).
  • SSE parsing: :binary.split instead of a per-chunk regex split.

Tests (+75)

  • New coverage: OpenAI request marshalling, InputGuard fail-closed,
    permissions category-gate, optimizer data params, Context.add_messages
    equivalence, output_schema one_of error path, semantic input-guard
    strategy, ParallelExecutor, teams Supervisor, research Planner +
    Synthesizer, and a reusable Nous.MemoryStoreConformance harness (wired to
    ETS; native backends adopt it behind a tag in CI).
  • Removed flaky/redundant Process.sleep calls in rate_limiter, workflow
    state, and phase3 tests; fixed the self-contradicting one_of test.

See CHANGELOG.md for the user-facing security entries.

Deferred (intentionally out of scope — each documented)

  • Duplicate PubSub delivery (AgentServer): real, but the server subscribes
    to the topic it broadcasts on — a "pick one path" change risks a latent
    self-broadcast loop. Needs a reproduce-first test with real Phoenix.PubSub.
  • Native-dep work: llamacpp provider/normalizer tests, native optional-dep
    declarations, and dependency version bumps belong in a CI job where the
    C++/Rust/NIF/XLA builds actually run.
  • Low-severity perf tweaks (SharedState reads, checkpoint keying, concurrent
    tool execution, etc.): real tradeoffs; a focused perf session.
  • Research Searcher unit tests: web-tool-bound (needs tool mocking).

Notes for reviewers

  • mix dialyzer was not run (PLT rebuild); worth running before merge given the
    type-touching changes (agent_runner, permissions, the new policy field).
  • Three behavior changes ship with CHANGELOG entries: the approval-bypass fix,
    InputGuard fail-closed default, and the :permissive execute-gate.

nyo16 added 3 commits June 12, 2026 08:09
Implements the fix plan derived from a full-codebase health audit, plus a
streaming/tool-call hardening review. Full suite green (1881 passed, 0
failures); format + credo --strict clean.

Security
- Fix HIGH approval-gate bypass: the pre_tool_use `{:modify}` hook branch in
  AgentRunner skipped `enforce_policy_approval`, so a tool gated only by the
  permission policy (strict / approval_required / execute-category) ran ungated
  whenever a hook rewrote its arguments. Now applies policy approval on that
  path too. (+regression test)
- InputGuard fails closed on dropped strategies: a timed-out/errored strategy
  under the default `:any` aggregation no longer passes as `:safe`; it upgrades
  to `:suspicious` (configurable `fail_closed`, telemetry + log, new
  `strategy_timeout`).
- Permissions: `:permissive` mode no longer auto-approves `category: :execute`
  tools unless `allow_unattended_execute: true` (new `requires_approval?/3`).
- Memory stores: `field_to_column/1` is now a strict column allowlist in the
  SQLite/DuckDB backends (removes a SQL-identifier injection primitive).
- mix nous.optimize: `--params` accepts safe YAML/JSON data; `Code.eval_file`
  is now an explicit, warned `.exs` fallback only.

Dependencies
- Purge 36 stale mix.lock entries; add `mix deps.unlock --check-unused` to CI.
- Loosen net_runner (`~> 1.0`) and req (`~> 0.5 or ~> 0.6`); make phoenix_pubsub
  `optional: true`.
- Guard the hackney optional-dep crash paths (backend selection + pool config).

Performance (BEAM hot paths)
- Context.add_messages/2: single concat, O(n+m) instead of O(n*m).
- PubSub: memoize `Code.ensure_loaded?(Phoenix.PubSub)` in :persistent_term
  (was per-broadcast / per-streamed-token).
- Hybrid memory search: run the embedding round-trip concurrently with the text
  scan (store access stays single-process).
- SSE parsing: `:binary.split` instead of a per-chunk regex split.

Tests (+75)
- New: OpenAI request marshalling, InputGuard fail-closed, permissions
  category-gate, optimizer data params, Context.add_messages equivalence,
  output_schema one_of error path, semantic input-guard strategy,
  ParallelExecutor, teams Supervisor, research Planner + Synthesizer, and a
  reusable Nous.MemoryStoreConformance harness (wired to the ETS backend;
  native backends adopt it behind a tag in CI).
- Remove flaky/redundant sleeps in rate_limiter, workflow state, and phase3
  tests; fix the self-contradicting one_of test.

See CHANGELOG.md for the user-facing security entries.
Enables the LlamaCpp NIF provider and adds an end-to-end smoke test against
real GGUF models. Picks up the previously-deferred P4-T4 (llamacpp coverage).

- mix.exs: add {:llama_cpp_ex, "~> 0.8", optional: true} (optional so it stays
  out of downstream builds unless opted in, but available for Nous's dev/test).
  mix.lock adds only llama_cpp_ex 0.8.22 + fine; unrelated transitive bumps
  (ecto 3.14, req 0.6, telemetry 1.4) that `deps.update` tried to drag in were
  isolated out — ecto 3.14 trims whitespace in cast :empty_values and breaks the
  ContentPart "\n\n\n" regression test, so that update belongs in the deliberate
  dependency-bump pass.

- providers/llamacpp.ex: fix two latent warnings exposed now that the module
  actually compiles (it was compiled-out without the NIF):
  * do_request_stream/3 had dead {:ok,_}/{:error,_} clauses on
    stream_chat_completion, which is spec'd `:: Enumerable.t()` (raw stream,
    errors surface during enumeration). Simplified to match the real contract.
  * dropped the unused hand-written build_request_params/3 stub — the
    macro-generated default fills the overridable slot and is exempt from the
    unused-function warning. Compiles clean under --warnings-as-errors.

- test/nous/providers/llamacpp_smoke_test.exs (@moduletag :llama, excluded by
  default; test_helper excludes it): chat completion (generate_text + agent
  loop), enable_thinking:false suppresses <think>, json_schema structured output
  (tolerant JSON extraction — small models wrap output in fences), an
  agent-with-tool completes (documents that llama_cpp_ex has no native tools
  API; grammar/json_schema is its structured mechanism), and embeddings via
  embed/3. Reads NOUS_LLAMACPP_TEST_MODEL / NOUS_LLAMACPP_TEST_EMBED_MODEL;
  skips cleanly (single placeholder) when the model/NIF is absent so the default
  suite and CI never fail. Verified 7/7 passing across repeated runs against
  Qwen3.5-0.8B + Qwen3-Embedding-0.6B with Metal.
`use Nous.Provider` injects `@dialyzer {:nowarn_function, build_request_params: 3}`,
but the provider overrides request/3 + request_stream/3, so the macro's default
build_request_params/3 is dead-code-eliminated when unused — leaving the dialyzer
directive dangling ("Unknown function build_request_params/3"). Defining it as a
private stub instead trips the compiler's unused-function warning (and @compile
nowarn_unused_function doesn't cover Elixir's own check). Resolve the catch-22
with a `@doc false` public stub: public functions aren't unused-warned and stay
in the BEAM, so the @dialyzer directive resolves.

`mix dialyzer` now passes (0 errors). Note: the local PLT under priv/plts is
gitignored and was rebuilt for OTP 29 (the committed-era PLT was OTP-incompatible
and raised "Old PLT file"); no repo artifact changes.
@nyo16 nyo16 merged commit 96d9826 into master Jun 13, 2026
6 checks passed
@nyo16 nyo16 deleted the fix/audit-fixes-2026-06 branch June 13, 2026 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant