Skip to content

security(agent-sdk): wire PreToolUse hook to runtime-block destructive Bash (force push, gh pr merge, reset --hard) the prompt only forbids #222

@chrisleekr

Description

@chrisleekr

Finding

The agent subprocess runs with permissionMode: "bypassPermissions" + allowDangerouslySkipPermissions: true and Bash in allowedTools (src/core/executor.ts:259-261). Destructive-action prohibitions — "NEVER force push", "NEVER call gh pr merge", "NEVER push to the base branch" — are encoded only as system-prompt directives: src/core/prompt-builder.ts:193, src/core/prompt-builder.ts:392, src/core/prompt-builder.ts:594, src/core/prompt-builder.ts:755, plus per-handler reinforcements at src/workflows/handlers/resolve.ts:392-393 and src/workflows/handlers/review.ts:326. There is no runtime gate that physically blocks the agent from executing git push --force, git reset --hard, gh pr merge, git branch -D, or the GraphQL mergePullRequest / mergeBranch mutations once the LLM decides to call Bash.

The repo's scripts/check-no-destructive-actions.ts (FR-009 static guard, wired via bun run check:no-destructive) carries the canonical forbidden-pattern set (lines 50-65) and CI-fails when our source code writes such commands inside src/workflows/ship/ or src/daemon/scoped-*-executor.ts. That guard protects us from shipping new bypass paths — but it cannot stop the runtime LLM from typing one of those exact commands into a Bash tool call after a successful prompt injection from cloned-PR content, an attacker-controlled comment body, or an MCP tool result.

The Claude Agent SDK has shipped a first-class fix for this since v0.3.0: the hooks option, with a PreToolUse matcher, lets the executor intercept every Bash tool call, scan tool_input.command, and return permissionDecision: "deny" with a structured reason that flows back into the model as a tool-result error (see the SDK docs and the example in the project's own context7 query above). hooks is the only queryOptions field documented for runtime tool-call gating, and a grep of src/ confirms zero current callers (HookCallback, PreToolUseHookInput, hookSpecificOutput are unimported). Wiring a single PreToolUse: [{ matcher: "Bash", hooks: [...] }] into the queryOptions constructed at src/core/executor.ts:257-330 — backed by reusing the existing FORBIDDEN array from scripts/check-no-destructive-actions.ts (lift it into src/utils/forbidden-bash.ts so the static guard and the runtime hook share one source of truth) — closes the gap with no architectural change and no new dependency.

Diagram

flowchart TB
    subgraph Now["Current: prompt-only barrier"]
        direction TB
        P1[Prompt: NEVER force push<br/>NEVER gh pr merge]:::warn
        A1[Claude Agent SDK<br/>bypassPermissions + Bash allowed]:::neutral
        T1{Attacker-influenced<br/>context overrides directive?}:::warn
        B1[Bash: git push --force]:::danger
        G1[GitHub remote ref rewritten]:::danger
        OK1[Safe path]:::ok
        P1 --> A1 --> T1
        T1 -- yes --> B1 --> G1
        T1 -- no --> OK1
    end

    subgraph Hook["Proposed: PreToolUse hook gate"]
        direction TB
        P2[Same prompt]:::warn
        A2[SDK with hooks.PreToolUse<br/>matcher: Bash]:::neutral
        B2[Bash: git push --force]:::danger
        H2{Hook scans tool_input.command<br/>against FORBIDDEN regex set}:::ok
        D2[permissionDecision: deny<br/>reason posted to model]:::ok
        OK2[Tool call blocked<br/>agent recovers or fails closed]:::ok
        ALLOW2[allow]:::ok
        P2 --> A2 --> B2 --> H2
        H2 -- match --> D2 --> OK2
        H2 -- no match --> ALLOW2
    end

    classDef neutral fill:#1f4e79,color:#ffffff,stroke:#0d2a44,stroke-width:1px
    classDef warn fill:#9a6700,color:#ffffff,stroke:#5c3d00,stroke-width:1px
    classDef danger fill:#a40e26,color:#ffffff,stroke:#5c0815,stroke-width:1px
    classDef ok fill:#196f3d,color:#ffffff,stroke:#0d3a20,stroke-width:1px
Loading

Rationale

The CLAUDE.md "Security invariants" section already enumerates four prompt-injection-hardening contracts (subprocess env allowlist, output secret-strip, input sanitization, structured-output parsing). A fifth contract — runtime tool-call gating for destructive Bash invocations — is the natural next layer, and it is the only one of the five that has a vendor-supported SDK API rather than a custom chokepoint we maintain ourselves. Crucially, defense-in-depth is the explicit pattern this codebase already chose for idempotency (Valkey claim + idx_workflow_runs_inflight partial-unique index, see CLAUDE.md "Idempotency" section) and for output secret-stripping (regex pass + LLM scanner + deletion-only gate, src/utils/github-output-guard.ts). The current destructive-action story is the odd one out: a single prose-level barrier with no runtime backstop.

The blast radius of a single successful prompt-injected git push --force-with-lease against a PR head is meaningful — collaborators' in-flight commits get overwritten, the PR review history detaches, and force-with-lease itself is one of the FORBIDDEN patterns the static guard explicitly forbids us from emitting (scripts/check-no-destructive-actions.ts:51). A gh pr merge injection violates FR-017 ("merging is a human action") and bypasses any branch-protection check that allows the GitHub App identity. Both are credible after the recent .mcp.json auto-load bypass (#196) and .claude/settings.json SessionStart bypass (#191) showed the cloned PR tree is an active prompt-injection surface against this exact subprocess; both were closed with SDK-option fixes (strictMcpConfig: true, settingSources: []) inside the same queryOptions object that this proposal extends.

Adoption cost is low: one new field on the queryOptions literal, one new file src/utils/forbidden-bash.ts exporting the shared FORBIDDEN regex array, a one-line refactor in scripts/check-no-destructive-actions.ts to import from it, and a focused unit test exercising representative deny inputs (force push variants, gh pr merge, git reset --hard, the GraphQL mutation names already in scope). The hook is per-query() invocation, so it composes with the existing useCacheableLayout pivot and the per-call model override without rework.

References

Internal:

  • src/core/executor.ts:257-330queryOptions literal where hooks: { PreToolUse: [...] } would be added (between mcpServers at 278 and systemPrompt at 291).
  • src/core/executor.ts:259-268 — current permissionMode: "bypassPermissions", allowDangerouslySkipPermissions: true, allowedTools, disallowedTools: ["ToolSearch"] surface that the hook complements.
  • scripts/check-no-destructive-actions.ts:50-65 — canonical FORBIDDEN regex array (force-push variants, git reset --hard, git branch -D, git filter-branch, git filter-repo, git replace, gh pr merge, mergePullRequest, mergeBranch) to share with the runtime hook.
  • src/core/prompt-builder.ts:193, src/core/prompt-builder.ts:392, src/core/prompt-builder.ts:594, src/core/prompt-builder.ts:755 — current prompt-only "NEVER force push" directives that the hook would back with runtime enforcement.
  • src/workflows/handlers/resolve.ts:392-393, src/workflows/handlers/review.ts:326 — per-handler prompt-only reinforcements ("NEVER call gh pr merge", "NEVER push to the base branch").
  • CLAUDE.md "Security invariants (prompt-injection hardening)" section — the four-contract list this finding extends with a fifth, runtime tool-call gating.
  • CLAUDE.md "CI/CD Pipeline" entry for check:no-destructive — describes the existing static-guard layer this finding makes runtime.

External:

Suggested Next Steps

  1. Lift FORBIDDEN into src/utils/forbidden-bash.ts as an exported readonly array of { pattern: RegExp; description: string } so the static guard and the runtime hook consume one source of truth. Update scripts/check-no-destructive-actions.ts:50 to import it.
  2. Add forbiddenBashHook: HookCallback in a new src/core/hooks/forbidden-bash.ts that casts input to PreToolUseHookInput, extracts tool_input.command (string), tests each FORBIDDEN pattern, and returns { hookSpecificOutput: { hookEventName: "PreToolUse", permissionDecision: "deny", permissionDecisionReason: "Blocked by forbidden-bash hook: ${description}" } } on first match. Default-allow with {}.
  3. Wire it into queryOptions at src/core/executor.ts:278 as hooks: { PreToolUse: [{ matcher: "Bash", hooks: [forbiddenBashHook] }] }. Log a structured agent.hook.denied event (delivery-correlated child logger) on every deny so the operator sees runtime gating activity, mirroring the llm_scanner_substitution_rejected log convention from src/utils/github-output-guard.ts.
  4. Add a focused unit test at src/core/hooks/forbidden-bash.test.ts covering: git push --force, git push -f, git push --force-with-lease, git reset --hard HEAD~1, gh pr merge 123 --squash, plus three benign commands (git push origin HEAD, git status, gh pr checks 5) that MUST allow. Use the existing colocated-test pattern (issue fix(testing): scripts/test-isolated.sh glob silently excludes 4 src/**/*.test.ts files from CI #201) so CI picks it up.
  5. Extend CLAUDE.md "Security invariants" with a fifth contract — Runtime destructive-Bash gate (src/core/hooks/forbidden-bash.ts) — and cross-link from the check:no-destructive bullet in "CI/CD Pipeline" so future contributors see both layers as paired.
  6. Update docs/build/architecture.md and docs/operate/configuration.md alongside the code change (per the CLAUDE.md "Documentation" doc-sync rule) so the doc-citation CI gate stays green.

Areas Evaluated

  • Read src/core/executor.ts in full to confirm current queryOptions shape, permissionMode, allowedTools/disallowedTools surface, and absence of any hooks: field on the literal.
  • Read src/core/prompt-builder.ts:185-200 to confirm the prompt-only NEVER force push directive at line 193 is part of the standard commitInstructions block emitted for every PR job.
  • Read src/workflows/handlers/resolve.ts:385-399 to confirm the gh pr merge / base-branch push prohibitions at lines 392-393 are also prompt-only.
  • Read scripts/check-no-destructive-actions.ts:1-80 to confirm the static guard is source-code-only (greps our .ts files) and lifts the canonical FORBIDDEN regex array.
  • Grepped src/ for HookCallback|PreToolUse|hookSpecificOutput and hooks: — zero matches, confirming the SDK hooks API is unused.
  • Queried context7 (/nothflare/claude-agent-sdk-docs) for the current TS query() options surface, confirmed hooks is documented and matches the v0.3.x line pinned in package.json.
  • Cross-checked existing closed agent-sdk issue (settingSources [] / #191) to ensure no overlap; this finding addresses a distinct, complementary attack surface (Bash tool call) using the same queryOptions extension pattern.
  • WebSearch'd current Claude Agent SDK hook production patterns (2026) to confirm the PreToolUse + matcher: "Bash" + deny-on-pattern approach is the recommended SDK-native idiom for exactly this class of guard.

Generated by the scheduled research action on 2026-06-08

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions