Claude/fix glasswally agent glitch #38

Merged
noisyloop merged 5 commits into main from claude/fix-glasswally-agent-UVhSV
May 18, 2026

@noisyloop (Owner)

No description provided.

claude added 5 commits May 18, 2026 03:05
Behavioral fingerprint probes lived only in source, so an adversary
reading the repo could craft a model that passes exactly those probes.
Adds two controls, both pinned per baseline so that drift detection stays
comparable and existing baselines need no migration:

- MODEL_GUARD_PROBES_FILE: a validated out-of-band probe set that fully
  replaces the built-ins. Fails closed in production if unloadable —
  never silently degrades to the publicly-known probes.
- MODEL_GUARD_PROBE_COUNT: fingerprint with a crypto-random k-of-n probe
  subset; the selection is recorded in the baseline and reused on every
  subsequent check.
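Here is a minimal TypeScript sketch of both controls. It assumes the probe file holds a JSON array of `{ id, prompt }` objects. `Probe`, `parseProbes`, `loadProbePool`, and `selectProbeSubset` are hypothetical names, not the project's API, and the dev-mode fallback is an assumption. The k-of-n subset comes from a partial Fisher-Yates shuffle driven by Node's `crypto.randomInt`. That makes every k-subset equally likely, and the choice cannot be predicted from source.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical probe shape; the project's real probe records may carry more fields.
interface Probe {
  id: string;
  prompt: string;
}

// Strictly validate an out-of-band probe set: a non-empty JSON array of
// { id, prompt } objects with unique ids.
function parseProbes(text: string): Probe[] {
  const data: unknown = JSON.parse(text);
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error("probe file must be a non-empty JSON array");
  }
  const seen = new Set<string>();
  return data.map((p, i) => {
    if (typeof p?.id !== "string" || typeof p?.prompt !== "string") {
      throw new Error(`probe ${i} needs a string id and prompt`);
    }
    if (seen.has(p.id)) throw new Error(`duplicate probe id ${p.id}`);
    seen.add(p.id);
    return { id: p.id, prompt: p.prompt };
  });
}

// readFile is undefined when no probe file is configured. A configured file
// fully replaces the built-ins; if it cannot be loaded, production fails
// closed instead of reverting to the publicly known probes.
// (The loud dev fallback below is an assumption of this sketch.)
function loadProbePool(
  readFile: (() => string) | undefined,
  builtins: Probe[],
  production: boolean,
): Probe[] {
  if (readFile === undefined) return builtins;
  try {
    return parseProbes(readFile());
  } catch (err) {
    if (production) {
      throw new Error(`probe file unloadable, refusing to fall back: ${(err as Error).message}`);
    }
    console.warn("WARNING: probe file unloadable; using built-in probes (dev only)");
    return builtins;
  }
}

// Crypto-random k-of-n selection via a partial Fisher-Yates shuffle:
// after step i, ids[0..i] is a uniformly random subset of the pool.
function selectProbeSubset(pool: Probe[], k: number): string[] {
  if (!Number.isInteger(k) || k < 1 || k > pool.length) {
    throw new RangeError(`probe count must be in 1..${pool.length}`);
  }
  const ids = pool.map((p) => p.id);
  for (let i = 0; i < k; i++) {
    const j = randomInt(i, ids.length); // uniform in [i, ids.length)
    [ids[i], ids[j]] = [ids[j], ids[i]];
  }
  return ids.slice(0, k);
}
```

The returned ids are what a baseline would record. Later checks reuse them instead of drawing a new subset.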

A baseline whose pinned probes are absent from the active pool fails
closed rather than emitting an incomparable fingerprint.
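The fail-closed rule for pinned probes could look like the following. It assumes a baseline stores the ids of the probes it was built from. `Baseline` and `resolvePinnedProbes` are hypothetical.

```typescript
// Hypothetical shapes; the real baseline record likely holds more.
interface Probe {
  id: string;
  prompt: string;
}
interface Baseline {
  pinnedProbeIds: string[];
  fingerprint: string;
}

// Resolve a baseline's pinned probes against the active pool, in pinned order.
// Any missing id means a new fingerprint would not be comparable, so throw
// instead of silently fingerprinting a different probe set.
function resolvePinnedProbes(baseline: Baseline, pool: Probe[]): Probe[] {
  const byId = new Map(pool.map((p) => [p.id, p] as const));
  const missing = baseline.pinnedProbeIds.filter((id) => !byId.has(id));
  if (missing.length > 0) {
    throw new Error(`baseline pins probes absent from active pool: ${missing.join(", ")}`);
  }
  return baseline.pinnedProbeIds.map((id) => byId.get(id)!);
}
```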

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
The swarm mesh let any node on the segment inject itself as a peer
(discovery announcements were entirely unauthenticated), and the message
HMAC covered only id:from:to:timestamp, not type or payload, so a captured
signed message could be tampered with and still verify. There was no
replay defense.

With meshSecret set:
- Discovery announcements must carry a fresh, non-replayed HMAC proof of
  knowledge of the deployment secret (domain-separated from message
  signatures); unauthenticated, forged, stale, or replayed enrollments are
  rejected and surfaced via mesh:peer:rejected.
- Message signatures now cover a canonical envelope that includes type and
  a hash of the payload, removing the ':'-join ambiguity.
- Per-message and per-announcement nonces with a freshness window give
  bounded replay protection.
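Here is a sketch of a canonical, domain-separated message envelope. It assumes HMAC-SHA256 with hex signatures and a hypothetical `mesh-msg-v1` domain tag, and the field names and helpers are illustrative. Encoding the envelope as a JSON array removes the ':'-join ambiguity, because JSON escaping keeps field boundaries fixed whatever the fields contain. The payload hash assumes both ends serialize the payload with `JSON.stringify`. A production scheme might prefer a formal canonicalization such as RFC 8785 (JCS).

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical wire shape of a mesh message.
interface MeshMessage {
  id: string;
  from: string;
  to: string;
  type: string;
  timestamp: number;
  nonce: string;
  payload: unknown;
  sig?: string;
}

// Domain tag: a message signature can never double as an announcement
// proof, which would be computed under a different tag.
const MSG_DOMAIN = "mesh-msg-v1";

// Canonical envelope as a JSON array in fixed field order. Unlike a
// ':'-join, ("a:b", "c") and ("a", "b:c") encode differently here.
function canonicalEnvelope(m: MeshMessage): string {
  const payloadHash = createHash("sha256")
    .update(JSON.stringify(m.payload ?? null))
    .digest("hex");
  return JSON.stringify([MSG_DOMAIN, m.id, m.from, m.to, m.type, m.timestamp, m.nonce, payloadHash]);
}

function signMessage(m: MeshMessage, secret: string): MeshMessage {
  const sig = createHmac("sha256", secret).update(canonicalEnvelope(m)).digest("hex");
  return { ...m, sig };
}

function verifyMessage(m: MeshMessage, secret: string): boolean {
  if (typeof m.sig !== "string") return false;
  const expected = createHmac("sha256", secret).update(canonicalEnvelope(m)).digest();
  const given = Buffer.from(m.sig, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

A receiver would still apply the nonce and freshness checks before accepting a message whose signature verifies.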

Backward compatible: without meshSecret the mesh stays open for dev but
logs a loud warning. Full X.509 mTLS remains the documented ideal.
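Peer admission with replay protection might be sketched as below. The announcement fields, the `mesh-announce-v1` tag, the 30-second window, and the `PeerAdmission` class are assumptions. Only the `mesh:peer:rejected` event name comes from the commit. Nonces are remembered only until the announcement would be stale anyway, which keeps the replay cache bounded. Without a secret, the gate stays open and warns, mirroring the dev-mode behavior described above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { EventEmitter } from "node:events";

// Hypothetical discovery announcement shape.
interface Announcement {
  nodeId: string;
  timestamp: number; // ms since epoch, set by the announcer
  nonce: string;
  proof?: string; // hex HMAC keyed by meshSecret
}

// Domain tag distinct from the one used for message signatures.
const ANNOUNCE_DOMAIN = "mesh-announce-v1";
const FRESHNESS_MS = 30_000; // assumed window

function announceProof(a: Announcement, secret: string): string {
  return createHmac("sha256", secret)
    .update(JSON.stringify([ANNOUNCE_DOMAIN, a.nodeId, a.timestamp, a.nonce]))
    .digest("hex");
}

class PeerAdmission extends EventEmitter {
  private readonly meshSecret: string | undefined;
  private readonly now: () => number;
  // nonce -> time after which it may be forgotten (the announcement is stale by then)
  private readonly seenNonces = new Map<string, number>();

  constructor(meshSecret?: string, now: () => number = Date.now) {
    super();
    this.meshSecret = meshSecret;
    this.now = now;
    if (!meshSecret) console.warn("WARNING: meshSecret unset, mesh discovery is UNAUTHENTICATED (dev only)");
  }

  admit(a: Announcement): boolean {
    if (!this.meshSecret) return true; // open dev mode, warned about at construction
    const reason = this.check(a, this.meshSecret);
    if (reason === null) return true;
    this.emit("mesh:peer:rejected", { nodeId: a.nodeId, reason });
    return false;
  }

  private check(a: Announcement, secret: string): string | null {
    const t = this.now();
    for (const [nonce, forgetAfter] of this.seenNonces) {
      if (forgetAfter < t) this.seenNonces.delete(nonce);
    }
    if (typeof a.proof !== "string") return "unauthenticated";
    if (Math.abs(t - a.timestamp) > FRESHNESS_MS) return "stale";
    const expected = Buffer.from(announceProof(a, secret), "hex");
    const given = Buffer.from(a.proof, "hex");
    if (given.length !== expected.length || !timingSafeEqual(given, expected)) return "forged";
    // Checked only after the proof, so forgeries cannot fill the nonce cache.
    if (this.seenNonces.has(a.nonce)) return "replayed";
    this.seenNonces.set(a.nonce, a.timestamp + FRESHNESS_MS);
    return null;
  }
}
```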

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
…m memory

Long-term memory ranked entries purely by keyword overlap, with no notion
of trust, so crafted content could score top relevance on legitimate
queries and be injected into prompts (only the output was sanitized, and
only after retrieval).

- Store-time injection detection: content with injection patterns is
  written with low trust and flagged (not refused), emitting
  memory:longterm:poisoning_suspected.
- Trust-weighted retrieval: relevance is multiplied by per-entry trust;
  flagged or sub-floor entries are excluded. Centralized in
  LongTermMemory.search so it covers keyword and semantic paths; legacy
  entries without trust default above the floor (back-compat).
- Bounded keyword-stuffing breadth heuristic: an entry that near-exactly
  matches more than N distinct queries is flagged and excluded, emitting
  memory:longterm:poisoning_detected.
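The three defenses can share one search path, sketched below. Names, thresholds (trust values, the 0.3 floor, the 0.9 near-exact cutoff, 25 distinct queries), and the injection regexes are all hypothetical. Only the two event names come from the commit. Relevance here is plain keyword overlap, matching the ranking the commit describes, multiplied by per-entry trust.

```typescript
import { EventEmitter } from "node:events";

// Illustrative thresholds and patterns; the real values are not in the commit.
const INJECTION_PATTERNS = [/ignore (all )?(previous|prior) instructions/i, /<\/?system>/i];
const DEFAULT_TRUST = 0.8;
const SUSPECT_TRUST = 0.1; // below the floor, so suspect entries never surface
const LEGACY_TRUST = 0.5; // entries stored before trust existed stay above the floor
const TRUST_FLOOR = 0.3;
const NEAR_EXACT = 0.9; // keyword overlap counted as a near-exact match
const MAX_DISTINCT_QUERIES = 25;

interface MemoryEntry {
  id: string;
  content: string;
  trust?: number; // absent on legacy entries
  flagged?: boolean;
}

const terms = (s: string): Set<string> => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);

class TrustWeightedMemory extends EventEmitter {
  private readonly entries = new Map<string, MemoryEntry>();
  private readonly nearExactQueries = new Map<string, Set<string>>();

  // Suspicious content is stored (not refused) with low trust and a flag.
  store(id: string, content: string): MemoryEntry {
    const suspicious = INJECTION_PATTERNS.some((re) => re.test(content));
    const entry: MemoryEntry = { id, content, trust: suspicious ? SUSPECT_TRUST : DEFAULT_TRUST, flagged: suspicious };
    if (suspicious) this.emit("memory:longterm:poisoning_suspected", { id });
    this.entries.set(id, entry);
    return entry;
  }

  // Load a pre-existing entry as-is (e.g. a legacy entry with no trust field).
  load(entry: MemoryEntry): void {
    this.entries.set(entry.id, entry);
  }

  // Single retrieval choke point: score = overlap * trust, flagged or
  // sub-floor entries excluded.
  search(query: string, limit = 5): { entry: MemoryEntry; score: number }[] {
    const q = terms(query);
    const queryKey = [...q].sort().join(" ");
    const results: { entry: MemoryEntry; score: number }[] = [];
    for (const entry of this.entries.values()) {
      const trust = entry.trust ?? LEGACY_TRUST;
      if (entry.flagged || trust < TRUST_FLOOR) continue;
      const c = terms(entry.content);
      const overlap = q.size === 0 ? 0 : [...q].filter((t) => c.has(t)).length / q.size;
      if (overlap >= NEAR_EXACT && this.tooBroad(entry, queryKey)) continue;
      if (overlap > 0) results.push({ entry, score: overlap * trust });
    }
    return results.sort((a, b) => b.score - a.score).slice(0, limit);
  }

  // Breadth heuristic: an entry that near-exactly matches more than
  // MAX_DISTINCT_QUERIES distinct queries is flagged and excluded from then on.
  private tooBroad(entry: MemoryEntry, queryKey: string): boolean {
    const seen = this.nearExactQueries.get(entry.id) ?? new Set<string>();
    seen.add(queryKey);
    this.nearExactQueries.set(entry.id, seen);
    if (seen.size <= MAX_DISTINCT_QUERIES) return false;
    entry.flagged = true;
    this.emit("memory:longterm:poisoning_detected", { id: entry.id, distinctQueries: seen.size });
    return true;
  }
}
```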

Security trust/flag fields cannot be overridden via caller metadata.
Semantic provenance scoring remains the documented open problem.
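One way to keep the security fields authoritative is to apply caller metadata first and the computed fields last. `buildEntry` and `StoredEntry` below are hypothetical.

```typescript
// Hypothetical stored-entry shape: security fields plus arbitrary metadata.
interface StoredEntry {
  content: string;
  trust: number;
  flagged: boolean;
  [key: string]: unknown;
}

// Caller metadata is spread first and the store-time security fields last,
// so a caller passing { trust: 1, flagged: false } cannot override them.
function buildEntry(
  content: string,
  metadata: Record<string, unknown>,
  trust: number,
  flagged: boolean,
): StoredEntry {
  return { ...metadata, content, trust, flagged };
}
```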

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
API keys lived in process.env, readable by any in-process code, including
plugins. Adds a sealed in-memory secrets provider: at finalizeStartup
(prod-gated via NODE_ENV=production or EOS_SEAL_SECRETS=1), credentials
are captured and deleted from process.env, then served only through the
gated getSecret()/requireSecret(). Non-sealed keys fall through to the
prior provider; dev/test are unaffected (a no-op unless enabled).
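Here is a minimal sketch of the sealing flow. It assumes the list of keys to seal is known at startup and that the prior provider is plain `process.env`. `sealSecrets` and the module-level map are hypothetical, and the sketch does not model the access gating the commit mentions.

```typescript
// Hypothetical sealed provider: captured values live only in this module.
const sealed = new Map<string, string>();
let isSealed = false;

function sealingEnabled(env: Record<string, string | undefined> = process.env): boolean {
  return env.NODE_ENV === "production" || env.EOS_SEAL_SECRETS === "1";
}

// Called once from finalizeStartup. Captures the listed credentials and
// deletes them from process.env so later in-process code (plugins included)
// cannot read them there. A no-op in dev/test unless explicitly enabled.
function sealSecrets(keys: string[]): void {
  if (isSealed || !sealingEnabled()) return;
  for (const key of keys) {
    const value = process.env[key];
    if (value === undefined) continue;
    sealed.set(key, value);
    delete process.env[key];
  }
  isSealed = true;
}

// Sealed keys come from memory; anything else falls through to the prior
// provider, which this sketch assumes is plain process.env.
function getSecret(key: string): string | undefined {
  return sealed.get(key) ?? process.env[key];
}

function requireSecret(key: string): string {
  const value = getSecret(key);
  if (value === undefined || value === "") throw new Error(`required secret ${key} is not set`);
  return value;
}
```

A provider constructor would then call `requireSecret(...)` instead of reading process.env directly, so it keeps working after sealing.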

LLM provider constructors, the Glasswally IOC secret, and the Discord
LLM fallbacks now resolve via getSecret(), so they keep working after the
environment is sealed (HMAC secrets are already captured into module
consts at import time, before finalization). An external secrets manager
or HSM remains the documented ideal.

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap
README Known Limitations now reflects reality: a formal threat model
exists (docs/STRIDE.md, resolved); the single-process design is described
honestly (the HIGH tier is already worker-thread isolated, residual gap
noted); the Glasswally bullet notes that the D-6 line-buffer DoS
protection is now actually enforced. The mesh/credentials/probes/memory
bullets were updated alongside their fixes in prior commits.

Also isolates AUDIT_LOG_PATH / AGENT_REVOCATION_LOG /
DECISION_LEDGER_PATH / MODEL_GUARD_DIR per Jest worker via a setupFiles
script. Parallel workers previously appended to the same on-disk audit
log, intermittently corrupting the hash chain that the e2e suite
verifies. Per-worker temp paths remove the race without serializing
tests. No production code change.
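A setupFiles script along these lines would give each worker its own paths. Jest sets `JEST_WORKER_ID` in every worker. The script's location and the file names under the temp directory are assumptions. Jest runs setupFiles before each test file, so every test file also starts with fresh files.

```typescript
// Hypothetical test/isolate-worker-paths.ts, registered in the Jest config as
//   setupFiles: ["<rootDir>/test/isolate-worker-paths.ts"]
import { mkdirSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Points each path-bearing env var at a fresh per-worker temp directory so
// parallel workers never append to the same audit log or ledger.
function isolateWorkerPaths(env: Record<string, string | undefined> = process.env): string {
  const worker = env.JEST_WORKER_ID ?? "0"; // "1", "2", ... inside Jest
  const dir = mkdtempSync(join(tmpdir(), `jest-w${worker}-`));
  env.AUDIT_LOG_PATH = join(dir, "audit.log");
  env.AGENT_REVOCATION_LOG = join(dir, "revocations.log");
  env.DECISION_LEDGER_PATH = join(dir, "decision-ledger.log");
  const guardDir = join(dir, "model-guard");
  mkdirSync(guardDir, { recursive: true });
  env.MODEL_GUARD_DIR = guardDir;
  return dir;
}

isolateWorkerPaths();
```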

https://claude.ai/code/session_01ArAvRMiZgCwF5oNj3r94Ap