feat(guardrails): prompt-injection hardening for untrusted intake by stxkxs · Pull Request #19 · nanohype/fab

stxkxs · 2026-05-29T18:39:23Z

What

Ports two construction-time prompt-injection defenses from the factory's guardrail vocabulary (stxkxs/claudium) into fab's prompt assembly — defenses an inference-time content filter (e.g. Bedrock Guardrails) doesn't provide.

Changes

src/guardrails.ts (new) — normalizeDelimiters() strips Claude reserved tags (<system>, <tool_use>, …) from untrusted text; spotlight() fences it in a per-call random untrusted-<hex> delimiter the text can't forge.
src/workflows.ts — the untrusted intake brief seeding the workflow context (executeWorkflow's context, line 669) is now delimiter-normalized + spotlight-fenced, with a treat-as-data instruction that travels with the context so every role fences the brief as data, not instructions. Raw userPrompt is still used for intake-JSON parsing + branch pre-creation — only the seed context is wrapped, so nothing downstream breaks.
src/prompts.ts — repo URLs / mount paths / source-dir paths normalized before inlining into the system prompt.

Verification

npm run lint, npm test (262 pass, +5 guardrails tests), npm run build, npm run format:check — all green.

Ports two construction-time defenses from the factory's guardrail vocabulary (stxkxs/claudium) into fab's prompt assembly — defenses an inference-time content filter doesn't provide: - src/guardrails.ts — normalizeDelimiters() strips Claude reserved tags (<system>, <tool_use>, ...) from untrusted text; spotlight() fences it in a per-call random untrusted-<hex> delimiter the text can't forge. - src/workflows.ts — the untrusted intake brief that seeds the workflow context (executeWorkflow's `context`) is now delimiter-normalized + spotlight-fenced, with a treat-as-data instruction that travels with the context so every role reading it fences the brief as data, not instructions. Raw userPrompt is still used for intake-JSON parsing + branch pre-creation — only the seed context is wrapped. - src/prompts.ts — repo URLs / mount paths / source-dir paths are normalized before inlining into the system prompt. Verified: typecheck, vitest (+5 guardrails tests), build, prettier all green.

stxkxs force-pushed the feat/prompt-injection-hardening branch from 97ba55a to 924530f Compare May 30, 2026 04:04

stxkxs merged commit f4ef98b into main May 30, 2026
2 checks passed

stxkxs deleted the feat/prompt-injection-hardening branch May 30, 2026 04:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(guardrails): prompt-injection hardening for untrusted intake#19

feat(guardrails): prompt-injection hardening for untrusted intake#19
stxkxs merged 1 commit into
mainfrom
feat/prompt-injection-hardening

stxkxs commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stxkxs commented May 29, 2026

What

Changes

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant