Skip to content

feat(guardrails): prompt-injection hardening for untrusted intake#19

Merged
stxkxs merged 1 commit into
mainfrom
feat/prompt-injection-hardening
May 30, 2026
Merged

feat(guardrails): prompt-injection hardening for untrusted intake#19
stxkxs merged 1 commit into
mainfrom
feat/prompt-injection-hardening

Conversation

@stxkxs

@stxkxs stxkxs commented May 29, 2026

Copy link
Copy Markdown
Member

What

Ports two construction-time prompt-injection defenses from the factory's guardrail vocabulary (stxkxs/claudium) into fab's prompt assembly — defenses an inference-time content filter (e.g. Bedrock Guardrails) doesn't provide.

Changes

  • src/guardrails.ts (new) — normalizeDelimiters() strips Claude reserved tags (<system>, <tool_use>, …) from untrusted text; spotlight() fences it in a per-call random untrusted-<hex> delimiter the text can't forge.
  • src/workflows.ts — the untrusted intake brief seeding the workflow context (executeWorkflow's context, line 669) is now delimiter-normalized + spotlight-fenced, with a treat-as-data instruction that travels with the context so every role fences the brief as data, not instructions. Raw userPrompt is still used for intake-JSON parsing + branch pre-creation — only the seed context is wrapped, so nothing downstream breaks.
  • src/prompts.ts — repo URLs / mount paths / source-dir paths normalized before inlining into the system prompt.

Verification

npm run lint, npm test (262 pass, +5 guardrails tests), npm run build, npm run format:check — all green.

Ports two construction-time defenses from the factory's guardrail vocabulary
(stxkxs/claudium) into fab's prompt assembly — defenses an inference-time
content filter doesn't provide:

- src/guardrails.ts — normalizeDelimiters() strips Claude reserved tags
  (<system>, <tool_use>, ...) from untrusted text; spotlight() fences it in a
  per-call random untrusted-<hex> delimiter the text can't forge.
- src/workflows.ts — the untrusted intake brief that seeds the workflow
  context (executeWorkflow's `context`) is now delimiter-normalized +
  spotlight-fenced, with a treat-as-data instruction that travels with the
  context so every role reading it fences the brief as data, not
  instructions. Raw userPrompt is still used for intake-JSON parsing + branch
  pre-creation — only the seed context is wrapped.
- src/prompts.ts — repo URLs / mount paths / source-dir paths are
  normalized before inlining into the system prompt.

Verified: typecheck, vitest (+5 guardrails tests), build, prettier all green.
@stxkxs stxkxs force-pushed the feat/prompt-injection-hardening branch from 97ba55a to 924530f Compare May 30, 2026 04:04
@stxkxs stxkxs merged commit f4ef98b into main May 30, 2026
2 checks passed
@stxkxs stxkxs deleted the feat/prompt-injection-hardening branch May 30, 2026 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant