Claude/fix glasswally agent glitch #38
Merged
Behavioral fingerprint probes lived only in source, so an adversary reading the repo could craft a model that passes exactly those probes. Adds two controls, both pinned per-baseline so drift detection stays comparable and existing baselines need no migration:

- MODEL_GUARD_PROBES_FILE: a validated out-of-band probe set that fully replaces the built-ins. Fails closed in production if unloadable; it never silently degrades to the publicly known probes.
- MODEL_GUARD_PROBE_COUNT: fingerprint with a crypto-random k-of-n probe subset; the selection is recorded in the baseline and reused on every subsequent check.

A baseline whose pinned probes are absent from the active pool fails closed rather than emitting an incomparable fingerprint.

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
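To make the pinning behavior concrete, here is a minimal sketch of a k-of-n selection that is recorded once and then resolved against the active pool. It is not the code from this PR: `selectProbeIds`, `resolvePinnedProbes`, the `{ id, prompt }` probe shape, and the `probeIds` baseline field are all hypothetical names chosen for illustration.

```javascript
const crypto = require('node:crypto');

// Pick k distinct probe ids from the pool with a CSPRNG-driven partial
// Fisher-Yates shuffle. `pool` is a hypothetical array of { id, prompt }.
function selectProbeIds(pool, k) {
  if (!Number.isInteger(k) || k < 1 || k > pool.length) {
    throw new RangeError(`probe count must be between 1 and ${pool.length}`);
  }
  const ids = pool.map((probe) => probe.id);
  for (let i = 0; i < k; i++) {
    const j = crypto.randomInt(i, ids.length); // i <= j < ids.length
    [ids[i], ids[j]] = [ids[j], ids[i]];
  }
  return ids.slice(0, k).sort(); // sorted so the stored selection is stable
}

// Resolve the ids pinned in a baseline against the active pool.
// Fails closed: a missing probe throws instead of shrinking the probe set.
function resolvePinnedProbes(baseline, pool) {
  const byId = new Map(pool.map((probe) => [probe.id, probe]));
  const missing = baseline.probeIds.filter((id) => !byId.has(id));
  if (missing.length > 0) {
    throw new Error(`pinned probes missing from active pool: ${missing.join(', ')}`);
  }
  return baseline.probeIds.map((id) => byId.get(id));
}
```

In this sketch the selection happens only when a baseline is created; every later check calls `resolvePinnedProbes` with the stored ids, so two fingerprints of the same baseline are always computed over the same probes.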
The swarm mesh let any node on the segment inject as a peer (discovery announcements were entirely unauthenticated), and the message HMAC covered only id:from:to:timestamp, not type or payload, so a captured signed message could be tampered with and still verify. There was no replay defense.

With meshSecret set:

- Discovery announcements must carry a fresh, non-replayed HMAC proof of the deployment secret (domain-separated from message signatures); unauthenticated/forged/stale/replayed enrollments are rejected and surfaced via mesh:peer:rejected.
- Message signatures now cover a canonical envelope including type and a hash of payload, removing the ':'-join ambiguity.
- Per-message and per-announcement nonces with a freshness window give bounded replay protection.

Backward compatible: without meshSecret the mesh stays open for dev but logs a loud warning. Full X.509 mTLS remains the documented ideal.

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
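The message-signing change can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the function names, the `mesh-message-v1` domain label, the 30-second window, and the use of `JSON.stringify` as the payload serialization (which presumes sender and receiver serialize identically) are all made up for the example.

```javascript
const crypto = require('node:crypto');

const FRESHNESS_MS = 30_000;  // assumed freshness window
const seenNonces = new Map(); // nonce -> time after which it can be forgotten

// A JSON array has unambiguous field boundaries, unlike a ':'-joined string.
// The leading label separates message signatures from other HMAC uses
// (an announcement proof would use a different label).
function canonicalEnvelope(msg) {
  const payloadHash = crypto.createHash('sha256')
    .update(JSON.stringify(msg.payload))
    .digest('hex');
  return JSON.stringify(['mesh-message-v1', msg.id, msg.from, msg.to,
    msg.type, msg.timestamp, msg.nonce, payloadHash]);
}

function signMessage(msg, secret) {
  return crypto.createHmac('sha256', secret).update(canonicalEnvelope(msg)).digest('hex');
}

// Returns 'ok', 'bad-signature', 'stale', or 'replayed'.
function verifyMessage(msg, signature, secret, now = Date.now()) {
  const expected = Buffer.from(signMessage(msg, secret), 'hex');
  const given = Buffer.from(String(signature), 'hex');
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return 'bad-signature';
  }
  if (Math.abs(now - msg.timestamp) > FRESHNESS_MS) return 'stale';
  // Forget nonces whose messages would now be rejected as stale anyway.
  for (const [nonce, expiry] of seenNonces) {
    if (expiry < now) seenNonces.delete(nonce);
  }
  if (seenNonces.has(msg.nonce)) return 'replayed';
  seenNonces.set(msg.nonce, msg.timestamp + FRESHNESS_MS);
  return 'ok';
}
```

Because the nonce is only remembered for as long as its message could still pass the freshness check, the replay cache stays bounded, which is the sense in which the protection is "bounded".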
…m memory

Long-term memory ranked purely by keyword overlap with no notion of trust, so crafted content could score top relevance on legitimate queries and be injected into prompts (only output was sanitized, after retrieval).

- Store-time injection detection: content with injection patterns is written with low trust and flagged (not refused), emitting memory:longterm:poisoning_suspected.
- Trust-weighted retrieval: relevance is multiplied by per-entry trust; flagged or sub-floor entries are excluded. Centralized in LongTermMemory.search so it covers keyword and semantic paths; legacy entries without trust default above the floor (back-compat).
- Bounded keyword-stuffing breadth heuristic: an entry that near-exactly matches more than N distinct queries is flagged and excluded, emitting memory:longterm:poisoning_detected.

Security trust/flag fields cannot be overridden via caller metadata. Semantic provenance scoring remains the documented open problem.

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
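As a rough illustration of trust-weighted retrieval, the sketch below multiplies a simple keyword-overlap score by per-entry trust and drops flagged or sub-floor entries. The floor of 0.3, the legacy default of 0.5, the entry shape, and the function names are assumptions for the example; the actual LongTermMemory.search may differ.

```javascript
const TRUST_FLOOR = 0.3;          // assumed floor
const DEFAULT_LEGACY_TRUST = 0.5; // assumed default for entries stored before trust existed

// Fraction of distinct query words that appear in the content.
function keywordRelevance(query, content) {
  const queryWords = new Set(query.toLowerCase().split(/\s+/).filter(Boolean));
  const contentWords = new Set(content.toLowerCase().split(/\s+/).filter(Boolean));
  if (queryWords.size === 0) return 0;
  let hits = 0;
  for (const word of queryWords) {
    if (contentWords.has(word)) hits++;
  }
  return hits / queryWords.size;
}

// Exclude flagged and sub-floor entries first, then rank by relevance * trust.
function search(entries, query) {
  return entries
    .map((entry) => ({ entry, trust: entry.trust ?? DEFAULT_LEGACY_TRUST }))
    .filter(({ entry, trust }) => !entry.flagged && trust >= TRUST_FLOOR)
    .map(({ entry, trust }) => ({
      entry,
      score: keywordRelevance(query, entry.content) * trust,
    }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}
```

With this shape, an entry stuffed with the query's keywords but written with low trust cannot outrank a trusted entry, and a flagged entry never reaches the prompt regardless of its relevance.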
API keys lived in process.env readable by any in-process code including plugins. Adds a sealed in-memory secrets provider: at finalizeStartup (prod-gated via NODE_ENV=production or EOS_SEAL_SECRETS=1) credentials are captured and deleted from process.env, then served only through the gated getSecret()/requireSecret().

Non-sealed keys fall through to the prior provider; dev/test are unaffected (no-op unless enabled).

LLM provider constructors, the Glasswally IOC secret, and the Discord LLM fallbacks now resolve via getSecret() so they keep working after the environment is sealed (HMAC secrets are already captured into module consts at import, before finalization).

External secrets manager / HSM remains the documented ideal.

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
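The capture-then-delete idea can be sketched like this. Only the names getSecret, requireSecret, NODE_ENV, and EOS_SEAL_SECRETS come from the commit message; `createSealedSecrets`, `shouldSeal`, the explicit `env` parameter, and the list of names to seal are hypothetical, and the sketch does not attempt to model how access to getSecret() is gated.

```javascript
// Sealing is opt-in: production, or an explicit override for other environments.
function shouldSeal(env) {
  return env.NODE_ENV === 'production' || env.EOS_SEAL_SECRETS === '1';
}

// `env` would be process.env in a real process; it is a parameter here so the
// sketch can be exercised without touching the real environment.
function createSealedSecrets(env, namesToSeal) {
  const sealed = new Map(); // reachable only through the closures below

  return {
    // Copy each credential into the closure, then remove it from env so
    // other in-process code can no longer read it there.
    seal() {
      for (const name of namesToSeal) {
        if (env[name] !== undefined) {
          sealed.set(name, env[name]);
          delete env[name];
        }
      }
    },
    // Sealed keys come from the closure; everything else falls through.
    getSecret(name) {
      return sealed.has(name) ? sealed.get(name) : env[name];
    },
    requireSecret(name) {
      const value = this.getSecret(name);
      if (value === undefined || value === '') {
        throw new Error(`missing required secret: ${name}`);
      }
      return value;
    },
  };
}
```

A caller would run `if (shouldSeal(env)) secrets.seal()` once at the end of startup. Any module that read the environment directly after that point would see `undefined`, which is why the commit routes the LLM providers and fallbacks through getSecret().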
README Known Limitations now reflects reality: formal threat model exists (docs/STRIDE.md, resolved); single-process honestly described (HIGH-tier already worker-thread isolated, residual gap noted); Glasswally bullet notes the D-6 line-buffer DoS is now actually enforced. The mesh/credentials/probes/memory bullets were updated alongside their fixes in prior commits.

Also isolates AUDIT_LOG_PATH / AGENT_REVOCATION_LOG / DECISION_LEDGER_PATH / MODEL_GUARD_DIR per jest worker via a setupFiles script. Parallel workers previously appended to the same on-disk audit log, intermittently corrupting the hash chain that the e2e suite verifies. Per-worker temp paths remove the race without serializing tests.

No production code change.

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
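A per-worker isolation script along these lines would be one way to get that effect. It is a sketch, not the file added in this PR: the function name, the temp directory prefix, and the file names inside it are assumptions. JEST_WORKER_ID is the environment variable Jest sets for each worker process.

```javascript
// Hypothetical setup script, listed under "setupFiles" in the Jest config.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Point every on-disk artifact at a fresh directory unique to this worker,
// so parallel workers never append to the same hash-chained log.
function isolateWorkerPaths(env = process.env) {
  const workerId = env.JEST_WORKER_ID || '0';
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), `jest-worker-${workerId}-`));

  env.AUDIT_LOG_PATH = path.join(dir, 'audit.log');
  env.AGENT_REVOCATION_LOG = path.join(dir, 'revocations.log');
  env.DECISION_LEDGER_PATH = path.join(dir, 'decision-ledger.log');
  env.MODEL_GUARD_DIR = path.join(dir, 'model-guard');
  fs.mkdirSync(env.MODEL_GUARD_DIR, { recursive: true });

  return dir;
}

isolateWorkerPaths();
```

Using `mkdtempSync` rather than a fixed per-worker name also keeps repeated runs from inheriting a previous run's log, at the cost of leaving temp directories behind unless something cleans them up.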
No description provided.