Pre-agent metadata scanner. Catches prompt injection in branch names, commit messages, PR titles, file names, and other untrusted strings before they reach an AI code reviewer.
Ward is a CLI and a GitHub Action. It screens the metadata an AI agent ingests before any LLM-based reviewer, SAST agent, or IaC scanner sees it. The job: catch prompt injection attempts embedded in the places that traditional security tools ignore.
In March 2026, AI bots compromised five major GitHub projects. The attack class was the same in each: agents being hijacked through inputs that traditional scanners treat as inert metadata. Branch names. File names. Commit messages. PR titles.
The existing security stack does not help here:
- SAST scanners ignore branch names and commit messages. Those have never been an attack surface before.
- Secret scanners look for credentials, not instructions.
- Prompt firewalls (Lakera, LlamaFirewall, BoltClaw) sit at the LLM boundary inside the agent. By the time they see the text, it is already in the context window.
- OWASP ASI Top 10 names the pattern (ASI01, goal hijack via untrusted input) but does not ship tooling.
Ward sits earlier. It runs against the surface area that attackers actually use, before any LLM has a chance to act on it.
| Tool | Layer | Catches |
|---|---|---|
| Ward | Before the agent reads input | Prompt injection in branch names, file names, commit messages, PR titles, PR descriptions, code comments, README files |
| Lakera Guard | LLM boundary | Prompt injection in the prompt itself, jailbreaks, off-topic queries |
| LlamaFirewall | LLM boundary | Prompt injection, alignment violations, output policy enforcement |
| BoltClaw | Agent configuration | Tampering with agent system prompts, tool allowlists, MCP configs |
| SAST / secret scanners | Source code | Vulnerabilities and credentials in the code itself |
Ward is one layer. It is not a replacement for the others. Defence in depth still applies.
Six detector categories, 25+ rules out of the box:
- Instruction overrides ("ignore previous instructions", "your new task is...", fake `[SYSTEM]` blocks).
- Role manipulation (tokenizer tags like `<|im_start|>system`, "developer mode", DAN-style activation).
- Obfuscation (zero-width unicode, RTL override, base64 blobs in unusual fields, hex blobs, HTML comments).
- Tool-call injection (fake `<tool_call>` wrappers, JSON tool-call objects, `mcp://` URIs, shell metacharacters in names).
- Exfiltration prompts (instructions to POST findings to a URL, include secrets, encode data in DNS queries).
- AI tool-specific quirks (Anthropic Human / Assistant tags, Cursor command palette, Antigravity tool schemas, Copilot slash commands).
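
As a rough illustration of how category-based pattern rules can work, the sketch below matches a string against a handful of regular expressions. The rule ids `io.ignore_previous` and `role.tokenizer_tag` appear elsewhere in this README; the other ids, the category labels, and every regex are assumptions invented for this example, not Ward's bundled rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    category: str
    pattern: re.Pattern


# Hypothetical rules for illustration only; Ward's real rule pack will differ.
RULES = [
    Rule("io.ignore_previous", "instruction_override",
         re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I)),
    Rule("role.tokenizer_tag", "role_manipulation",
         re.compile(r"<\|im_start\|>\s*system", re.I)),
    Rule("obf.zero_width", "obfuscation",
         re.compile("[\u200b\u200c\u200d\u2060\ufeff]")),
    Rule("tool.fake_wrapper", "tool_call_injection",
         re.compile(r"<tool_call>", re.I)),
]


def match_rules(text: str) -> list[str]:
    """Return the ids of every rule whose pattern occurs in the text."""
    return [rule.rule_id for rule in RULES if rule.pattern.search(text)]
```

Calling `match_rules` on a PR title or commit message returns a list of rule ids, and an empty list means nothing matched in this toy rule set.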

Install with pipx:

    pipx install ward-scanner

Verify the install:

    ward version

To scan a pull request, export a GitHub token and pass the PR reference:

    export GITHUB_TOKEN=ghp_...
    ward scan-pr sonofg0tham/ward#42

`ward scan-pr` reads the PR title, body, head branch name, commit messages, and changed file paths through the GitHub API. It never reads the file contents.

To scan a local checkout:

    ward scan-local

`ward scan-local` walks the working tree and scans the current branch name, the last 20 commit messages, tag names, every tracked file's path, and the top-of-file content of any .md, .txt, .rst, and source files.

To scan an arbitrary string from stdin:

    echo "feat/ignore-previous-instructions" | ward scan-stdin --surface branch_name

Every other Ward command is built on this one. Pipe whatever string you want through it.
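
Because the other commands build on `scan-stdin`, a script can screen any string by piping it through the CLI. The sketch below shows one way to do that from Python. It assumes `ward` is on `PATH` and that `scan-stdin` uses the exit codes listed in this README (0 PASS, 1 WARN, 2 FAIL); the `scan_string` helper and its `ward_cmd` parameter are hypothetical names, not part of Ward.

```python
import subprocess
from typing import Sequence

# Exit codes as listed in this README: 0 PASS, 1 WARN, 2 FAIL.
EXIT_VERDICTS = {0: "PASS", 1: "WARN", 2: "FAIL"}


def scan_string(text: str, surface: str, ward_cmd: Sequence[str] = ("ward",)) -> str:
    """Pipe text through `ward scan-stdin --surface <surface>` and name the exit code."""
    proc = subprocess.run(
        [*ward_cmd, "scan-stdin", "--surface", surface],
        input=text,
        text=True,
        capture_output=True,
        check=False,  # non-zero exit codes are verdicts here, not errors
    )
    return EXIT_VERDICTS.get(proc.returncode, f"UNKNOWN({proc.returncode})")
```

For example, `scan_string(branch, "branch_name")` could gate a custom CI step on a branch name before any agent reads it.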

Shortcut commands:

    ward scan-branch feat/ignore-previous-instructions
    ward scan-commit HEAD
    ward explain io.ignore_previous

Output formats:

    ward scan-local --format pretty   # default, terminal table
    ward scan-local --format json     # machine-readable
    ward scan-local --format sarif    # GitHub Code Scanning compatible

Severity filtering:

    # Drop anything below MEDIUM, only FAIL on CRITICAL.
    ward scan-local --severity-threshold medium --fail-on critical

Exit codes:

- `0` PASS: no findings above the threshold.
- `1` WARN: findings exist but none reached the fail-on severity.
- `2` FAIL: at least one finding at or above fail-on.
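
The interaction between `--severity-threshold` and `--fail-on` can be sketched as a small function. This is an illustration of the exit-code rules as described above, not Ward's source. The `Severity` enum and its ordering are assumptions: the README names MEDIUM, HIGH, and CRITICAL, and a LOW level is guessed.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering; LOW is a guess, the other names appear in this README.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def exit_code(findings: list[Severity], threshold: Severity, fail_on: Severity) -> int:
    """Map finding severities to 0 (PASS), 1 (WARN) or 2 (FAIL)."""
    kept = [sev for sev in findings if sev >= threshold]  # drop anything below the threshold
    if not kept:
        return 0  # PASS: nothing left to report
    if any(sev >= fail_on for sev in kept):
        return 2  # FAIL: at least one finding at or above fail-on
    return 1      # WARN: findings exist, none reached fail-on
```

With a threshold of MEDIUM and fail-on of CRITICAL, a single HIGH finding yields exit code 1 under these assumptions, so a CI job would report it without failing.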
Ward ships with a built-in lab that runs each scripted attack scenario through two pipelines (unprotected and Ward-protected) and produces a Markdown report you can paste into a blog post or PR comment:

    ward lab attack
    # Wrote lab report: ward-lab-report.md
    # Blocked by Ward: 5/5 scenarios.

The mock reviewer agent does not call an LLM. The lab demonstrates whether the untrusted instruction would have reached the agent's context window, not what the LLM would have done with it. Wiring in a real reviewer is the next step.

Flags: `--output <path>`, `--no-write` (print to stdout), `--fail-on <severity>`.
If you use the pre-commit framework, drop this into your `.pre-commit-config.yaml`:

    - repo: https://github.com/sonofg0tham/ward
      rev: v0.1.0
      hooks:
        - id: ward-scan-local
          args: [--fail-on, high]

Ward then runs on every `git commit` and `git push`, screening your branch name, commit messages, and tracked documentation files for injection patterns. Stops you committing a poisoned PR before it ever reaches GitHub.

Other hook ids: `ward-scan-stdin` (designed for the `commit-msg` stage, screens the message you're typing), `ward-selftest` (manual, useful as a CI gate).
Add it to a workflow in three lines:

    - uses: sonofg0tham/ward/action@v1
      with:
        fail-on: high

A fuller example that uploads SARIF to the GitHub Security tab:

    name: Ward
    on: [pull_request]
    permissions:
      contents: read
      security-events: write
    jobs:
      ward:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: sonofg0tham/ward/action@v1
            with:
              fail-on: high
              format: sarif
              upload-sarif: true

If you are building an agentic system (CrewAI, AutoGen, LangGraph, your own loop) and want to screen text before it reaches the model, import Ward directly:

    from ward import build_input, scan_inputs, load_rule_pack, Verdict

    # Load the bundled rule pack once at startup.
    pack = load_rule_pack()

    def safe_ingest(untrusted_text: str) -> str:
        inputs = [build_input("pr_body", untrusted_text, location="user-input")]
        report = scan_inputs(inputs, pack, target="my-agent")
        if report.verdict is not Verdict.PASS:
            flagged = [f.rule_id for f in report.findings]
            raise ValueError(f"Refusing to ingest untrusted text: {flagged}")
        return untrusted_text

The 13 supported surface types (`branch_name`, `commit_message`, `pr_body`, `file_content`, ...) let you tune which rules apply. A LangGraph tool that ingests web search results would use `pr_body` or `file_content`; a CrewAI agent reading a filename would use `file_name`.

A LangGraph node:

    from ward import build_input, scan_inputs, load_rule_pack, Verdict

    _pack = load_rule_pack()

    def web_search_node(state):
        text = state["search_result"]
        report = scan_inputs(
            [build_input("file_content", text, location="search")],
            _pack,
            target="search_result",
        )
        if report.verdict is not Verdict.PASS:
            # Replace the flagged text so it never reaches the model.
            state["search_result"] = "(blocked by Ward)"
            state["ward_findings"] = [f.rule_id for f in report.findings]
        return state

A CrewAI tool:

    from crewai.tools import BaseTool
    from ward import build_input, scan_inputs, load_rule_pack, Verdict

    # Load the rule pack once and share it across tool instances.
    _pack = load_rule_pack()

    class GuardedFileReader(BaseTool):
        # BaseTool is a Pydantic model, so overridden fields carry type annotations.
        name: str = "read_file"
        description: str = "Read a file, screened by Ward."

        def _run(self, path: str) -> str:
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
            report = scan_inputs(
                [build_input("file_content", text, location=path)],
                _pack,
                target=path,
            )
            if report.verdict is not Verdict.PASS:
                return f"(refused: Ward flagged {[f.rule_id for f in report.findings]})"
            return text

Drop a directory of YAML files alongside your repo and point Ward at it:

    ward scan-local --rule-pack ./security/ward-rules

Each YAML file is a list of rules. The schema is documented in `src/ward/rules/instruction_overrides.yaml`.
Some directories (test fixtures, security research notes, rule packs themselves) are intentionally adversarial and should not be scanned for content. Drop a `.wardignore` at the repo root with fnmatch-style globs:

    # .wardignore
    tests/fixtures/**/*        # adversarial by design
    security/research/*        # writeup of past attacks
    docs/threat-models/*

Filenames in ignored paths are STILL scanned (a malicious filename remains suspicious even inside an ignored directory). Only the content scan is suppressed. Ward's own repo uses this to exclude its own source tree from self-scanning.
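
The asymmetry between file names and file content can be sketched with Python's `fnmatch` module. This is a minimal sketch of the behaviour described above, not Ward's implementation: `parse_wardignore` and `surfaces_to_scan` are hypothetical helpers, and Ward's exact glob semantics may differ (in `fnmatch`, `*` also matches `/`).

```python
from fnmatch import fnmatchcase


def parse_wardignore(text: str) -> list[str]:
    """Return glob patterns, dropping blank lines and trailing '# ...' comments."""
    patterns = []
    for line in text.splitlines():
        pattern = line.split("#", 1)[0].strip()
        if pattern:
            patterns.append(pattern)
    return patterns


def surfaces_to_scan(path: str, patterns: list[str]) -> list[str]:
    """File names are always scanned; content only when no ignore pattern matches."""
    ignored = any(fnmatchcase(path, pattern) for pattern in patterns)
    return ["file_name"] if ignored else ["file_name", "file_content"]
```

Under these assumptions, a file under `security/research/` still has its name screened, while its body is skipped.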
Security-research docs (Ward's own README included) need to talk about the attack strings without firing the scanner. Drop this directive near the top of any documentation file:

    <!-- ward-allow-file: io.*, role.tokenizer_tag -->

The directive accepts rule ids or fnmatch-style globs, comma-separated. It is only honoured on `file_content` and `code_comment` surfaces, never on branch names, commit messages, PR titles, or PR bodies. That's the intentional asymmetry: attackers cannot suppress detection from inside the text Ward is trying to screen.
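
One way to picture the directive handling is the sketch below. It parses the comma-separated list and applies it only on the two surfaces named above. The regex, the helper names, and the `exfil.post_url` rule id used in the usage note are assumptions for illustration. Unlike the "near the top" rule described above, this simplified version searches the whole text.

```python
import re
from fnmatch import fnmatchcase

# Surfaces on which this README says the directive is honoured.
ALLOW_SURFACES = {"file_content", "code_comment"}

# Capture the id list up to a closing "-->", "*/", or the end of the line.
DIRECTIVE = re.compile(r"ward-allow-file:[ \t]*(?P<ids>[^\n]*?)[ \t]*(?:-->|\*/|$)", re.M)


def allowed_patterns(text: str, surface: str) -> list[str]:
    """Return the rule-id globs from the directive, or [] if it does not apply."""
    if surface not in ALLOW_SURFACES:
        return []  # never honoured on branch names, commit messages, PR text
    match = DIRECTIVE.search(text)
    if not match:
        return []
    return [part.strip() for part in match.group("ids").split(",") if part.strip()]


def filter_findings(rule_ids: list[str], text: str, surface: str) -> list[str]:
    """Drop findings whose rule id matches an allowed glob."""
    allow = allowed_patterns(text, surface)
    return [rid for rid in rule_ids if not any(fnmatchcase(rid, glob) for glob in allow)]
```

Given a file that starts with the directive shown above, findings such as `io.ignore_previous` would be dropped on the `file_content` surface, while the same text arriving as a `pr_body` would keep every finding.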
Supported comment styles for the directive:

    <!-- ward-allow-file: io.* -->     <!-- HTML / Markdown -->
    # ward-allow-file: io.*            # Python / Bash / YAML
    // ward-allow-file: io.*           // JS / TS / Go / Rust / Java
    /* ward-allow-file: io.* */        /* C / CSS */

Ward feeds detectors a normalised view of the text plus several alternative forms designed to defeat common evasion tricks:
- Leetspeak: `1gn0r3 4ll pr3v10us` becomes `ignore all previous`.
- Intra-word separators: `i.g.n.o.r.e` and `i-g-n-o-r-e` collapse to `ignore`.
- Repeated letters: `ignoooooore` and `previousssss` collapse to `ignore` and `previous`. Two collapse variants are tried (collapse to 1 letter and collapse to 2) so naturally-doubled English words like `all`, `free`, `see` survive.
- Zero-width unicode: stripped before regex match.
- NFKC: fullwidth and compatibility characters fold to ASCII.
- Base64 / hex blocks: decoded and re-scanned.
- Identifier delimiters: `-`, `_`, `/`, `.` in branch and file names normalise to spaces.
Known limitation: the all-single-space case (`i g n o r e p r e v i o u s`) is not handled, because the original word boundaries cannot be recovered reliably from spaced-out singletons. Multi-space separators between words (`i g n o r e   p r e v i o u s`) are still ambiguous and out of scope for v0.1.
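
Several of these transformations can be sketched with the standard library. This is a minimal approximation under stated assumptions, not Ward's normaliser: the leetspeak table, the zero-width character set, and the `variants` helper are invented for this example, and Base64 / hex decoding is omitted.

```python
import re
import unicodedata

# Assumed tables; Ward's own mappings are not documented here.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
LEET = str.maketrans("013457", "oieast")
# Single characters joined by "." or "-", such as i.g.n.o.r.e
SPELLED = re.compile(r"\b(?:[a-z0-9][.\-]){2,}[a-z0-9]\b")


def variants(text: str, identifier: bool = False) -> set[str]:
    """Return normalised forms of text that a detector could match against."""
    base = unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH).lower()
    base = SPELLED.sub(lambda m: re.sub(r"[.\-]", "", m.group(0)), base)
    if identifier:
        # Branch and file names: treat delimiters as word breaks.
        base = re.sub(r"[-_/.]+", " ", base)
    forms = {base, base.translate(LEET)}
    for form in list(forms):
        forms.add(re.sub(r"([a-z])\1+", r"\1", form))        # collapse runs to 1 letter
        forms.add(re.sub(r"([a-z])\1{2,}", r"\1\1", form))   # collapse runs to 2 letters
    return forms
```

A detector would then run its patterns over every returned form. Trying both collapse variants is what lets a word like `all` survive while `alllll` is still reduced.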
Ward is a pattern-matching tool. It catches the attack class documented in OWASP ASI Top 10 (ASI01) and in the March 2026 GitHub supply-chain incidents.
It does not catch:
- Novel zero-day injection techniques that match no rule.
- Attacks embedded in non-text formats (images, PDFs, audio).
- Attacks on the model itself once context has been built. That is a prompt firewall's job.
- Vulnerabilities in the code being reviewed. That is SAST's job.
See SECURITY.md for the full threat model and the vulnerability disclosure process.
Ward sends no telemetry. No phone home, no anonymous stats, no metrics collection. The only outbound network calls Ward ever makes are the GitHub API requests you explicitly trigger via `ward scan-pr`.

To set up a development environment:

    git clone https://github.com/sonofg0tham/ward
    cd ward
    python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
    pip install -e ".[dev]"
    pytest

Coverage target is 75% and current trunk runs at 83%.
Licensed under MIT. See LICENSE.