security(agent-sdk): wire PreToolUse hook to runtime-block destructive Bash (force push, gh pr merge, reset --hard) the prompt only forbids

## Finding

The agent subprocess runs with `permissionMode: "bypassPermissions"` + `allowDangerouslySkipPermissions: true` and `Bash` in `allowedTools` (`src/core/executor.ts:259-261`). Destructive-action prohibitions — "NEVER force push", "NEVER call `gh pr merge`", "NEVER push to the base branch" — are encoded **only as system-prompt directives**: `src/core/prompt-builder.ts:193`, `src/core/prompt-builder.ts:392`, `src/core/prompt-builder.ts:594`, `src/core/prompt-builder.ts:755`, plus per-handler reinforcements at `src/workflows/handlers/resolve.ts:392-393` and `src/workflows/handlers/review.ts:326`. There is no runtime gate that physically blocks the agent from executing `git push --force`, `git reset --hard`, `gh pr merge`, `git branch -D`, or the GraphQL `mergePullRequest` / `mergeBranch` mutations once the LLM decides to call Bash.

The repo's `scripts/check-no-destructive-actions.ts` (FR-009 static guard, wired via `bun run check:no-destructive`) carries the canonical forbidden-pattern set (lines 50-65) and CI-fails when *our source code* writes such commands inside `src/workflows/ship/` or `src/daemon/scoped-*-executor.ts`. That guard protects us from shipping new bypass paths — but it cannot stop the runtime LLM from typing one of those exact commands into a `Bash` tool call after a successful prompt injection from cloned-PR content, an attacker-controlled comment body, or an MCP tool result.

The Claude Agent SDK has shipped a first-class fix for this since v0.3.0: the `hooks` option, with a `PreToolUse` matcher, lets the executor intercept every `Bash` tool call, scan `tool_input.command`, and return `permissionDecision: "deny"` with a structured reason that flows back into the model as a tool-result error (see the SDK docs and the example in the project's own context7 query above). `hooks` is the only `queryOptions` field documented for runtime tool-call gating, and a `grep` of `src/` confirms zero current callers (`HookCallback`, `PreToolUseHookInput`, `hookSpecificOutput` are unimported). Wiring a single `PreToolUse: [{ matcher: "Bash", hooks: [...] }]` into the `queryOptions` constructed at `src/core/executor.ts:257-330` — backed by reusing the existing `FORBIDDEN` array from `scripts/check-no-destructive-actions.ts` (lift it into `src/utils/forbidden-bash.ts` so the static guard and the runtime hook share one source of truth) — closes the gap with no architectural change and no new dependency.

## Diagram

```mermaid
flowchart TB
 subgraph Now["Current: prompt-only barrier"]
 direction TB
 P1[Prompt: NEVER force push NEVER gh pr merge]:::warn
 A1[Claude Agent SDK bypassPermissions + Bash allowed]:::neutral
 T1{Attacker-influenced context overrides directive?}:::warn
 B1[Bash: git push --force]:::danger
 G1[GitHub remote ref rewritten]:::danger
 OK1[Safe path]:::ok
 P1 --> A1 --> T1
 T1 -- yes --> B1 --> G1
 T1 -- no --> OK1
 end

 subgraph Hook["Proposed: PreToolUse hook gate"]
 direction TB
 P2[Same prompt]:::warn
 A2[SDK with hooks.PreToolUse matcher: Bash]:::neutral
 B2[Bash: git push --force]:::danger
 H2{Hook scans tool_input.command against FORBIDDEN regex set}:::ok
 D2[permissionDecision: deny reason posted to model]:::ok
 OK2[Tool call blocked agent recovers or fails closed]:::ok
 ALLOW2[allow]:::ok
 P2 --> A2 --> B2 --> H2
 H2 -- match --> D2 --> OK2
 H2 -- no match --> ALLOW2
 end

 classDef neutral fill:#1f4e79,color:#ffffff,stroke:#0d2a44,stroke-width:1px
 classDef warn fill:#9a6700,color:#ffffff,stroke:#5c3d00,stroke-width:1px
 classDef danger fill:#a40e26,color:#ffffff,stroke:#5c0815,stroke-width:1px
 classDef ok fill:#196f3d,color:#ffffff,stroke:#0d3a20,stroke-width:1px
```

## Rationale

The CLAUDE.md "Security invariants" section already enumerates four prompt-injection-hardening contracts (subprocess env allowlist, output secret-strip, input sanitization, structured-output parsing). A fifth contract — *runtime tool-call gating for destructive Bash invocations* — is the natural next layer, and it is the only one of the five that has a vendor-supported SDK API rather than a custom chokepoint we maintain ourselves. Crucially, defense-in-depth is the explicit pattern this codebase already chose for idempotency (Valkey claim + `idx_workflow_runs_inflight` partial-unique index, see CLAUDE.md "Idempotency" section) and for output secret-stripping (regex pass + LLM scanner + deletion-only gate, `src/utils/github-output-guard.ts`). The current destructive-action story is the odd one out: a single prose-level barrier with no runtime backstop.

The blast radius of a single successful prompt-injected `git push --force-with-lease` against a PR head is meaningful — collaborators' in-flight commits get overwritten, the PR review history detaches, and `force-with-lease` itself is one of the `FORBIDDEN` patterns the static guard explicitly forbids us from emitting (`scripts/check-no-destructive-actions.ts:51`). A `gh pr merge` injection violates FR-017 ("merging is a human action") and bypasses any branch-protection check that allows the GitHub App identity. Both are credible after the recent `.mcp.json` auto-load bypass (`#196`) and `.claude/settings.json` SessionStart bypass (`#191`) showed the cloned PR tree is an active prompt-injection surface against this exact subprocess; both were closed with SDK-option fixes (`strictMcpConfig: true`, `settingSources: []`) inside the same `queryOptions` object that this proposal extends.

Adoption cost is low: one new field on the `queryOptions` literal, one new file `src/utils/forbidden-bash.ts` exporting the shared `FORBIDDEN` regex array, a one-line refactor in `scripts/check-no-destructive-actions.ts` to import from it, and a focused unit test exercising representative deny inputs (force push variants, `gh pr merge`, `git reset --hard`, the GraphQL mutation names already in scope). The hook is per-`query()` invocation, so it composes with the existing `useCacheableLayout` pivot and the per-call `model` override without rework.

## References

**Internal**:
- `src/core/executor.ts:257-330` — `queryOptions` literal where `hooks: { PreToolUse: [...] }` would be added (between `mcpServers` at 278 and `systemPrompt` at 291).
- `src/core/executor.ts:259-268` — current `permissionMode: "bypassPermissions"`, `allowDangerouslySkipPermissions: true`, `allowedTools`, `disallowedTools: ["ToolSearch"]` surface that the hook complements.
- `scripts/check-no-destructive-actions.ts:50-65` — canonical `FORBIDDEN` regex array (force-push variants, `git reset --hard`, `git branch -D`, `git filter-branch`, `git filter-repo`, `git replace`, `gh pr merge`, `mergePullRequest`, `mergeBranch`) to share with the runtime hook.
- `src/core/prompt-builder.ts:193`, `src/core/prompt-builder.ts:392`, `src/core/prompt-builder.ts:594`, `src/core/prompt-builder.ts:755` — current prompt-only "NEVER force push" directives that the hook would back with runtime enforcement.
- `src/workflows/handlers/resolve.ts:392-393`, `src/workflows/handlers/review.ts:326` — per-handler prompt-only reinforcements ("NEVER call `gh pr merge`", "NEVER push to the base branch").
- `CLAUDE.md` "Security invariants (prompt-injection hardening)" section — the four-contract list this finding extends with a fifth, runtime tool-call gating.
- `CLAUDE.md` "CI/CD Pipeline" entry for `check:no-destructive` — describes the existing static-guard layer this finding makes runtime.

**External**:
- [Claude Agent SDK — Intercept and control agent behavior with hooks](https://code.claude.com/docs/en/agent-sdk/hooks) — official SDK docs for `PreToolUse` matchers, deny-decision return shape, and the security rationale ("dangerous commands are physically blocked at the execution layer; the agent cannot bypass, forget, or reason its way around it").
- [Claude Agent SDK TypeScript — Query Function Options](https://github.com/nothflare/claude-agent-sdk-docs/blob/main/docs/en/agent-sdk/typescript.md) — `hooks` option signature `Partial<Record<HookEvent, HookCallbackMatcher[]>>`, available since the v0.3.x line pinned in `package.json` (`@anthropic-ai/claude-agent-sdk: ^0.3.0`).
- [Claude Code Hooks: 6 Production Patterns (2026)](https://www.pixelmojo.io/blogs/claude-code-hooks-production-quality-ci-cd-patterns) — production pattern catalog for `PreToolUse` Bash-command pattern matching including `rm -rf /`, `DROP TABLE`, `curl | bash`, the exact class of guard proposed here.

## Suggested Next Steps

1. **Lift `FORBIDDEN` into `src/utils/forbidden-bash.ts`** as an exported `readonly` array of `{ pattern: RegExp; description: string }` so the static guard and the runtime hook consume one source of truth. Update `scripts/check-no-destructive-actions.ts:50` to import it.
2. **Add `forbiddenBashHook: HookCallback` in a new `src/core/hooks/forbidden-bash.ts`** that casts `input` to `PreToolUseHookInput`, extracts `tool_input.command` (string), tests each `FORBIDDEN` pattern, and returns `{ hookSpecificOutput: { hookEventName: "PreToolUse", permissionDecision: "deny", permissionDecisionReason: "Blocked by forbidden-bash hook: ${description}" } }` on first match. Default-allow with `{}`.
3. **Wire it into `queryOptions` at `src/core/executor.ts:278`** as `hooks: { PreToolUse: [{ matcher: "Bash", hooks: [forbiddenBashHook] }] }`. Log a structured `agent.hook.denied` event (delivery-correlated child logger) on every deny so the operator sees runtime gating activity, mirroring the `llm_scanner_substitution_rejected` log convention from `src/utils/github-output-guard.ts`.
4. **Add a focused unit test** at `src/core/hooks/forbidden-bash.test.ts` covering: `git push --force`, `git push -f`, `git push --force-with-lease`, `git reset --hard HEAD~1`, `gh pr merge 123 --squash`, plus three benign commands (`git push origin HEAD`, `git status`, `gh pr checks 5`) that MUST allow. Use the existing colocated-test pattern (issue #201) so CI picks it up.
5. **Extend CLAUDE.md "Security invariants"** with a fifth contract — *Runtime destructive-Bash gate (`src/core/hooks/forbidden-bash.ts`)* — and cross-link from the `check:no-destructive` bullet in "CI/CD Pipeline" so future contributors see both layers as paired.
6. **Update `docs/build/architecture.md` and `docs/operate/configuration.md`** alongside the code change (per the CLAUDE.md "Documentation" doc-sync rule) so the doc-citation CI gate stays green.

## Areas Evaluated

- Read `src/core/executor.ts` in full to confirm current `queryOptions` shape, `permissionMode`, `allowedTools`/`disallowedTools` surface, and absence of any `hooks:` field on the literal.
- Read `src/core/prompt-builder.ts:185-200` to confirm the prompt-only `NEVER force push` directive at line 193 is part of the standard `commitInstructions` block emitted for every PR job.
- Read `src/workflows/handlers/resolve.ts:385-399` to confirm the `gh pr merge` / base-branch push prohibitions at lines 392-393 are also prompt-only.
- Read `scripts/check-no-destructive-actions.ts:1-80` to confirm the static guard is source-code-only (greps our `.ts` files) and lifts the canonical `FORBIDDEN` regex array.
- Grepped `src/` for `HookCallback|PreToolUse|hookSpecificOutput` and `hooks:` — zero matches, confirming the SDK hooks API is unused.
- Queried context7 (`/nothflare/claude-agent-sdk-docs`) for the current TS `query()` options surface, confirmed `hooks` is documented and matches the v0.3.x line pinned in `package.json`.
- Cross-checked existing closed agent-sdk issue (`settingSources []` / `#191`) to ensure no overlap; this finding addresses a distinct, complementary attack surface (Bash tool call) using the same `queryOptions` extension pattern.
- WebSearch'd current Claude Agent SDK hook production patterns (2026) to confirm the `PreToolUse` + matcher: "Bash" + deny-on-pattern approach is the recommended SDK-native idiom for exactly this class of guard.

*Generated by the scheduled research action on 2026-06-08*

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security(agent-sdk): wire PreToolUse hook to runtime-block destructive Bash (force push, gh pr merge, reset --hard) the prompt only forbids #222

Finding

Diagram

Rationale

References

Suggested Next Steps

Areas Evaluated

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

security(agent-sdk): wire PreToolUse hook to runtime-block destructive Bash (force push, gh pr merge, reset --hard) the prompt only forbids #222

Description

Finding

Diagram

Rationale

References

Suggested Next Steps

Areas Evaluated

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions