aiproof

ESLint for AI prompts. A static analyzer for the prompts you feed to LLMs — finds real bugs before a single token is spent.

Who this is for · What aiproof is not · Install · Quick start · Rules · Configuration · Python API · FAQ · Comparison

Why this exists

Every AI developer runs into the same handful of bugs:

A prompt contradicts itself.
A hardcoded API key leaks into a system message.
A prompt interpolates user input without delimiters and gets jailbroken.
A prompt requests JSON but never shows a schema, so the model returns prose.
A system prompt places variable content before a 1024-token stable prefix and quietly defeats Anthropic prompt caching — doubling your bill.

Runtime tools (Promptfoo, Braintrust, Lakera) catch these after you make LLM calls — which costs money, adds latency, and assumes you already have evals set up. aiproof runs before a single token is spent:

Zero LLM calls. All checks are pure text and AST analysis.
Zero network. Works offline, air-gapped, in restricted environments.
Zero inference cost. Runs in milliseconds, not seconds.
Twenty rules covering clarity, security, efficiency, behavior, portability, and best-practice categories.

Who this is for

aiproof is for developers whose prompts live in git — committed files, version-controlled templates, or string arguments passed to LLM SDK calls. Concretely:

Engineers shipping LLM-backed products. You have client.messages.create(system=...), openai.chat.completions.create, PromptTemplate(...), or ChatPromptTemplate.from_messages([...]) in your repo. Run aiproof as a pre-commit hook so an accidentally hardcoded API key, a missing input boundary, or a contradictory instruction never reaches main.
Prompt-engineering teams maintaining a library. You've got prompts/triage.prompt.md, prompts/summarize.prompt.md, prompts/escalate.prompt.md — dozens of versioned templates. CI catches when a teammate's edit introduces a regression (a contradiction, a removed schema example, a tone clash).
Open-source AI library maintainers. You've got hundreds of example prompts in cookbooks, READMEs, and docs. aiproof finds credentials pasted in by accident and catches model-portability issues (Claude-specific tags in a GPT-targeted example, etc.).
Cost-conscious teams using Anthropic prompt caching. You're spending real money on Claude Opus and just enabled caching. AIP009 catches every system prompt that places variable content in the first ~1024 tokens, defeating the cache and silently doubling your bill.
Security / platform engineers reviewing prompt PRs. aiproof --format sarif plugs into GitHub Code Scanning so credential leaks and prompt-injection vectors show up as PR comments — same workflow as any other security linter.

What aiproof is not

This is the most common misconception, so worth being direct about it:

Not a prompt rewriter. It does not make your prompt "better" or "more effective." It points at specific, well-defined classes of bugs (a hardcoded credential, a contradictory instruction, a missing delimiter). For a few rules it does mechanical fixes (redact the credential, wrap the user input in tags, dedupe a sentence). It will never restructure your prompt for clarity. For prompt optimization, use Anthropic's "Generate prompt" tool or iterate manually.
Not a runtime evaluator. It does not run your prompt against a model, score the output, or check accuracy. For that, use Promptfoo, Braintrust, or your own eval harness.
Not a runtime firewall. It does not detect attacks at request time. For that, use Lakera Guard or Rebuff.
Not for chat-window prompting. When you're typing into Claude Code, ChatGPT, or Cursor, your prompts are ephemeral — they don't live in a file. There's nothing for a static analyzer to check. By the time aiproof could flag a contradiction, you've already gotten an answer.

Rule of thumb: if your prompts live in a git commit, aiproof helps. If they live in a chat window, it doesn't.

Tested against real-world projects

aiproof runs cleanly against a corpus of 20 popular open-source AI projects pinned at exact SHAs (langchain, anthropic-cookbook, openai-cookbook, llama-index, autogen, crewAI, dspy, haystack, marvin, guidance, promptflow, instructor, mirascope, agno, llmware, semantic-kernel, prompty, AutoGPT, babyagi, chatgpt-api). The corpus runs in CI as a regression gate — see fixtures/corpus/CORPUS_REPORT.md for the per-repo diagnostic counts and FP analysis.

Honest scorecard for v0.1.4 against that corpus:

AIP006 (hardcoded credentials): 0 real findings. An earlier release produced 2 hits that turned out to be docstring placeholders (sk-ant-api03-xxxxxx... and "sk-randomAPIkey..."). v0.1.4 added placeholder-suppression to AIP006 and those are now correctly skipped. The fact that 20 popular AI repos ship zero live keys is a credit to those maintainers, not a marketing claim for aiproof.
AIP008 (jailbreak patterns): hits in test fixtures (intentionally embedded for adversarial-simulator testing) — auto-suppressed via fixture-path detection.
AIP010, AIP015 (markdown noise): suppressed by the is_prompt_shaped() gate so README/CHANGELOG markdown isn't linted as prompts.

The 20-repo corpus matters more as a false-positive regression gate than as a "look at all the bugs" demo: every change to a rule re-runs against these baselines, and any FP-rate increase fails CI.

What the output looks like

error[AIP006]: hardcoded anthropic credential in prompt text
  ┌─ docs/setup.md:160:26
  │
160 │ ANTHROPIC_API_KEY=sk-ant-api03-abcdefghijklmnopqrstuvwxyz1234
  │                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  │
  = See: https://github.com/Frostbyte-Devs/aiproof/blob/main/docs/rules/AIP006.md

warning[AIP001]: conflicting instruction: "explain your reasoning" contradicts "only output json" above
  ┌─ prompts/agent.prompt.md:2:19
  │
2 │ Only output JSON. Explain your reasoning before answering.
  │                   ^^^^^^^^^^^^^^^^^^^^^^
  │
  = See: https://github.com/Frostbyte-Devs/aiproof/blob/main/docs/rules/AIP001.md

warning[AIP009]: variable content within first ~1024 tokens defeats prompt caching
  ┌─ prompts/agent.prompt.md:4:1
  │
4 │ Answer this question: {query}
  │ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  │
  = See: https://github.com/Frostbyte-Devs/aiproof/blob/main/docs/rules/AIP009.md

Install

Python (recommended for most users)

pip install aiproof

Wheels are published for CPython 3.9+ on macOS (Intel + Apple Silicon), Linux (x86_64 + aarch64), and Windows x86_64. Single wheel works across all minor versions (abi3).

Rust binary

cargo install aiproof-cli

Prebuilt binaries

Download from GitHub Releases — static binaries for macOS, Linux, Windows.

From source

git clone https://github.com/Frostbyte-Devs/aiproof.git
cd aiproof
cargo install --path crates/aiproof-cli

Quick start

1. Lint a repo:

aiproof .

You'll see a list of findings in the terminal, with line numbers and squiggles. Exit code is 0 (clean), 1 (warnings), or 2 (errors).

2. Fix what's auto-fixable:

aiproof --fix .

Redacts hardcoded credentials, wraps user interpolations in input boundary tags, removes near-duplicate instructions, and deletes unused template variables. --fix is idempotent — run it twice, same result.
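To make "idempotent" concrete, here is a minimal Python sketch of a wrap-style fix that is safe to run twice. It is not aiproof's implementation: the `user_input` tag name and the choice to treat every `{name}` placeholder as user input are assumptions made for illustration.

```python
import re

TAG = "user_input"  # hypothetical tag name; aiproof's real boundary tag may differ

# Match {name} placeholders that are not already wrapped in the boundary tag.
UNWRAPPED = re.compile(rf"(?<!<{TAG}>)\{{(\w+)\}}(?!</{TAG}>)")


def wrap_placeholders(prompt: str) -> str:
    """Wrap each bare {name} placeholder in boundary tags."""
    return UNWRAPPED.sub(lambda m: f"<{TAG}>{{{m.group(1)}}}</{TAG}>", prompt)


once = wrap_placeholders("Answer this question: {query}")
twice = wrap_placeholders(once)
print(once)
print(once == twice)  # the second pass changes nothing
```

The lookbehind and lookahead are what keep the second pass from double-wrapping.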

3. Learn more about a rule:

aiproof --explain AIP006

4. Set up CI:

aiproof --init

Prints a starter .aiproofrc and pre-commit hook snippet ready to paste.

5. Integrate with GitHub Code Scanning:

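# .github/workflows/prompts.yml
on: { push: { branches: [main] }, pull_request: }
permissions:
  contents: read
  actions: read
  security-events: write   # needed by upload-sarif
jobs:
  aiproof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install aiproof
      # aiproof exits non-zero when it finds warnings or errors,
      # so the upload step must run even if this step fails.
      - run: aiproof --format sarif . > aiproof.sarif
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with: { sarif_file: aiproof.sarif }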

Rules

aiproof ships 20 curated rules across 6 categories. Codes are stable; docs/rules/AIPxxx.md contains the full explanation for each.

Clarity — does the prompt say what you think it says?

Code	Name	Severity
AIP001	`conflicting-instructions`	warning
AIP002	`ambiguous-output-format`	warning
AIP003	`undefined-role`	info
AIP004	`contradictory-tone`	warning

Security — is the prompt safe?

Code	Name	Severity	Autofix
AIP005	`unescaped-user-input`	warning	—
AIP006	`hardcoded-credential`	error	✅ redact
AIP007	`missing-input-boundaries`	warning	✅ wrap
AIP008	`known-jailbreak-pattern` (26 signatures)	error	—

Efficiency — is the prompt cheap?

Code	Name	Severity	Autofix
AIP009	`cache-unfriendly-structure` (Anthropic prompt caching)	warning	—
AIP010	`redundant-instruction`	info	✅ remove
AIP011	`excessive-tokens`	warning	—
AIP012	`unused-template-variable`	info	✅ remove

Behavior — will the prompt do what you want?

Code	Name	Severity
AIP013	`missing-format-example`	info
AIP014	`undefined-tool-reference`	warning
AIP015	`unhandled-placeholder` (TODO/FIXME/XXX)	warning

Portability — will the prompt work on other models?

Code	Name	Severity
AIP016	`claude-specific-tags-on-gpt` (`<thinking>` on GPT targets)	warning
AIP017	`system-message-mismatch` (Anthropic shape on Gemini)	info
AIP018	`temperature-determinism-mismatch`	warning

Best practice — is the prompt well-structured?

Code	Name	Severity
AIP019	`missing-few-shot-for-reasoning`	info
AIP020	`system-message-overloaded` (>1500 tokens or >8 imperatives)	info

What `aiproof` reads

aiproof parses each file type with a format-aware parser and doesn't lint arbitrary Markdown — rules that care about prompt semantics gate on an "is this actually a prompt?" signal (explicit frontmatter role, SDK call-site extraction, or a .prompt.md / .prompt extension).

Input	Parser	Notes
`*.md`, `*.prompt.md`	`tree-sitter-md`	YAML frontmatter → `Role`
`*.j2`, `*.jinja`, `*.jinja2`	hand-rolled (`logos`)	variable table for AIP012
`*.mustache`	hand-rolled (`logos`)	variable table for AIP012
`*.yaml`, `*.yml`	Prompty-aware	`---`-fenced frontmatter → prompt body
`*.json`	MCP-aware	extracts `description` fields recursively
`*.py`	`tree-sitter-python`	extracts prompts from `messages.create`, `PromptTemplate`, `ChatPromptTemplate.from_messages`, `Agent(system=...)`
`*.ts`, `*.tsx`	`tree-sitter-typescript`	same call sites, template literals → `{0}`/`{1}` placeholders
`*.prompt`	raw

For Python and TypeScript, aiproof walks the AST and extracts the string arguments passed to known LLM SDK call sites — so it finds the real prompt your code ships, not a stringly guess.
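The idea of call-site extraction can be sketched with Python's standard `ast` module. This is an illustration only: aiproof itself uses tree-sitter, and the helper name `extract_system_prompts` and the rule "any `.create(...)` call with a literal `system=` keyword" are simplifying assumptions.

```python
import ast
from typing import List, Tuple


def extract_system_prompts(source: str) -> List[Tuple[int, str]]:
    """Return (line, text) for literal `system=` arguments of `*.create(...)` calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr == "create"):
            continue
        for kw in node.keywords:
            value = kw.value
            if (kw.arg == "system"
                    and isinstance(value, ast.Constant)
                    and isinstance(value.value, str)):
                found.append((value.lineno, value.value))
    return found


# The sample is only parsed, never executed, so `client` need not exist.
sample = '''
client.messages.create(
    model="claude",
    system="You are a triage bot. Only output JSON.",
    messages=[],
)
'''
print(extract_system_prompts(sample))
```

Because the string comes from the syntax tree rather than a regex over the file, comments and unrelated string literals are never mistaken for prompts.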

Configuration

aiproof looks for configuration in this precedence order (first match wins):

1. CLI flags (--select, --ignore, --target-model)
2. .aiproofrc in the nearest ancestor directory (TOML)
3. [tool.aiproof] table in pyproject.toml
4. Built-in defaults
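A minimal sketch of "first match wins", assuming the lookup is applied per key (the text above does not say whether precedence is per key or per file). The `resolve` helper and the layer dictionaries are hypothetical; only the key names and the 4000 default come from this README.

```python
from typing import Any, Mapping, Optional

# Defaults taken from the configuration table in this README.
BUILTIN_DEFAULTS = {
    "select": [],
    "ignore": [],
    "target_models": [],
    "max_tokens_budget": 4000,
}


def resolve(key: str,
            cli: Optional[Mapping[str, Any]] = None,
            aiproofrc: Optional[Mapping[str, Any]] = None,
            pyproject: Optional[Mapping[str, Any]] = None) -> Any:
    """Return the value from the first layer that defines `key`."""
    for layer in (cli, aiproofrc, pyproject):
        if layer and key in layer:
            return layer[key]
    return BUILTIN_DEFAULTS[key]


rc = {"max_tokens_budget": 2000}
pyproject = {"max_tokens_budget": 8000, "ignore": ["AIP019"]}

print(resolve("max_tokens_budget", cli={}, aiproofrc=rc, pyproject=pyproject))  # 2000
print(resolve("ignore", cli={}, aiproofrc=rc, pyproject=pyproject))             # ['AIP019']
print(resolve("target_models", cli={}, aiproofrc=rc, pyproject=pyproject))      # []
```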

`.aiproofrc` example

# Which files to lint. If omitted, aiproof picks up `.prompt.md`,
# `.j2`/`.jinja`/`.jinja2`, `.mustache`, files under `prompts/`,
# `templates/`, `system_prompts/`, plus SDK extraction for `.py`/`.ts`.
include = ["prompts/**/*.md", "src/**/*.py"]

# Common FP sources — safe to exclude by default.
exclude = [
    "docs/plans/**",
    "releasenotes/**",
    "tests/cassettes/**",
    "tests/recordings/**",
    "tests/fixtures/**",
    "node_modules/**",
    "target/**",
    ".venv/**",
]

# Selectively disable rules (supports `AIP*` wildcard).
# ignore = ["AIP019"]

# Target models enable portability rules (AIP016, AIP017, AIP018).
target_models = ["claude-4.7-opus", "gpt-4"]

# Approximate token budget used by AIP011.
max_tokens_budget = 4000

All configuration keys

Key	Type	Default	What it does
`include`	`Vec<String>`	auto-discover	Glob patterns. If set, files must match one.
`exclude`	`Vec<String>`	`[]`	Glob patterns. Always applied.
`select`	`Vec<String>`	`[]` (all enabled)	Enable-list. Supports `AIP*`.
`ignore`	`Vec<String>`	`[]`	Disable-list. Supports `AIP*`.
`target_models`	`Vec<String>`	`[]`	Enables portability rules when set.
`max_tokens_budget`	`Option<usize>`	`4000`	Token ceiling for AIP011.
`fix`	`bool`	`false`	Apply safe autofixes.
`unsafe_fixes`	`bool`	`false`	Also apply rule-declared unsafe fixes.
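How `select` and `ignore` might combine can be sketched as follows. The README only states that both support `AIP*`; treating patterns as shell-style globs and applying `ignore` after `select` are assumptions, and `enabled_rules` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List

# The 20 bundled rule codes, AIP001 through AIP020.
ALL_RULES = [f"AIP{n:03d}" for n in range(1, 21)]


def enabled_rules(select: List[str], ignore: List[str]) -> List[str]:
    """Empty `select` means all rules; `ignore` is subtracted afterwards."""
    chosen = [
        rule for rule in ALL_RULES
        if not select or any(fnmatchcase(rule, pat) for pat in select)
    ]
    return [
        rule for rule in chosen
        if not any(fnmatchcase(rule, pat) for pat in ignore)
    ]


print(len(enabled_rules([], ["AIP019"])))    # 19
print(enabled_rules(["AIP00*"], ["AIP006"]))
```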

Python API

import aiproof

# Lint a prompt string directly.
diagnostics = aiproof.check(
    source=open("prompts/agent.prompt.md").read(),
    path="prompts/agent.prompt.md",
    target_models=["claude-4.7-opus"],  # optional
    max_tokens_budget=4000,              # optional
)

for d in diagnostics:
    print(f"{d['severity'].upper()} {d['code']} "
          f"at line {d['start_line']}: {d['message']}")

# Exposes __version__ matching the installed wheel.
print(aiproof.__version__)  # e.g. "0.1.4"

Each diagnostic is a dict with these keys:

Key	Type
`code`	`str` — e.g. `"AIP006"`
`message`	`str` — human-readable summary
`severity`	`"info"` / `"warning"` / `"error"`
`category`	`"clarity"` / `"security"` / `"efficiency"` / `"behavior"` / `"portability"` / `"best-practice"`
`start_line`, `start_col`, `end_line`, `end_col`	`int` — 1-based positions
`file`	`str` — the path you passed
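Since diagnostics are plain dicts, post-processing needs no special API. The sketch below works on a hand-written sample shaped like the table above, so it runs without aiproof installed. The positions and messages in the sample are illustrative, not real output, and the two helper functions are hypothetical.

```python
from collections import Counter

# Hand-written sample using the documented keys; values are illustrative.
diagnostics = [
    {"code": "AIP006", "message": "hardcoded anthropic credential in prompt text",
     "severity": "error", "category": "security",
     "start_line": 160, "start_col": 26, "end_line": 160, "end_col": 69,
     "file": "docs/setup.md"},
    {"code": "AIP001", "message": "conflicting instruction",
     "severity": "warning", "category": "clarity",
     "start_line": 2, "start_col": 19, "end_line": 2, "end_col": 41,
     "file": "prompts/agent.prompt.md"},
    {"code": "AIP009", "message": "variable content defeats prompt caching",
     "severity": "warning", "category": "efficiency",
     "start_line": 4, "start_col": 1, "end_line": 4, "end_col": 30,
     "file": "prompts/agent.prompt.md"},
]


def errors_only(diags):
    """Keep only diagnostics with severity 'error'."""
    return [d for d in diags if d["severity"] == "error"]


def count_by_category(diags):
    """Count diagnostics per category."""
    return Counter(d["category"] for d in diags)


for d in errors_only(diagnostics):
    print(f"{d['file']}:{d['start_line']}:{d['start_col']} {d['code']} {d['message']}")
print(dict(count_by_category(diagnostics)))
```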

CLI reference

Static analyzer for AI prompts

Usage: aiproof [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...  Paths to scan. Defaults to current directory [default: .]

Options:
      --format <FORMAT>            Output format [pretty | json | sarif] [default: pretty]
      --select <CODE>              Enable specific rule codes (overrides config). Repeatable.
      --ignore <CODE>              Disable specific rule codes. Repeatable.
      --target-model <NAME>        Target model hint for portability rules. Repeatable.
      --fix                        Apply safe autofixes to files in place.
      --unsafe-fixes               Also apply autofixes the rule author marked unsafe.
      --color <COLOR>              [auto | always | never]
      --explain <CODE>             Print the bundled explanation for a rule code and exit 0.
      --init                       Print a starter config + pre-commit snippet and exit 0.
  -h, --help                       Print help
  -V, --version                    Print version

Exit codes

Code	Meaning
`0`	Clean — no findings, or only `info` severity
`1`	One or more `warning` severity findings
`2`	One or more `error` severity findings, or invalid arguments / config

Exit code is the max severity encountered in the run. Wire it into CI as a hard gate on errors and a non-blocking reporter on warnings.
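One way to get "hard gate on errors, non-blocking on warnings" is a thin wrapper that remaps the exit code. This is a sketch, not a shipped script: `ci_result` and `run_gate` are hypothetical names, and `run_gate` assumes `aiproof` is on PATH.

```python
import subprocess
import sys


def ci_result(aiproof_exit_code: int) -> int:
    """Map aiproof's exit code to a CI status: fail on 2, pass on 0 and 1."""
    if aiproof_exit_code >= 2:
        return 1  # errors, or invalid arguments / config
    if aiproof_exit_code == 1:
        print("aiproof: warnings found (non-blocking)", file=sys.stderr)
    return 0


def run_gate(path: str = ".") -> int:
    """Run aiproof on `path` and return the remapped status."""
    completed = subprocess.run(["aiproof", path])
    return ci_result(completed.returncode)
```

In a CI script, finish with `sys.exit(run_gate())`. Note that exit code 2 also covers invalid arguments or config, so a broken setup fails the build rather than passing silently.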

FAQ

Does aiproof call an LLM? No. Ever. Every check is pure text + AST analysis. There is no network code in the binary; it works offline, in air-gapped environments, and on restricted CI runners.

How do I suppress a rule for a single file? Today suppression is project-level: set ignore = ["AIP019"] in .aiproofrc, or pass --ignore AIP019 on the command line. To skip a file entirely, add it to exclude. Per-line suppressions (# aiproof: ignore AIP019) are on the v0.2 roadmap.

How fast is it? Sub-50 ms per prompt file after warmup. A full scan of a 2800-file langchain checkout runs in under 5 seconds. Criterion benchmarks live under crates/aiproof-cli/benches/.

Does aiproof touch my files? Only when you pass --fix. Writes are atomic (tempfile + rename) so a crash mid-write leaves the original intact.
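The tempfile + rename pattern looks like this in Python. It is a sketch of the general technique, not aiproof's Rust code, and `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, text: str) -> None:
    """Write `text` to `path` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic swap of the finished file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before `os.replace`, the original file is untouched and only a stray temp file is left behind.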

What if a rule has too many false positives for me? File an issue with the repro, or disable locally via ignore = ["AIPxxx"]. Every rule has a per-repo FP budget enforced against the 20-repo corpus in CI — we take regressions seriously.

Why Rust? Because ruff proved that a Rust-core linter with Python bindings wins on speed (10-100× faster than pure Python) and distribution (single static binary, no Python runtime required). Same playbook here.

Will there be a VS Code extension / GitHub Action? Yes, both are on the v0.2 roadmap. Inline diagnostics on file save is the obvious next step.

How is this different from...

Tool	What it does	Requires LLM calls?	Where it runs
aiproof	Static analysis of prompts	❌ No	Editor / pre-commit / CI
Promptfoo	Runtime evaluation of prompt + output	✅ Yes	CI / local eval runs
Braintrust	Runtime logging + evaluation	✅ Yes	Production / CI
Lakera Guard	Runtime prompt-injection firewall	✅ Yes	Production (API)
Rebuff	Runtime prompt-injection detection	✅ Yes	Production
Guardrails.ai	Runtime structured-output validation	✅ Yes	Production
PromptLayer	Runtime observability	✅ Yes	Production
Prompty	Prompt YAML format spec	—	—

aiproof is complementary to runtime tools: it catches issues before they ever reach an LLM. Pair it with a runtime tool such as Lakera Guard to cover both design time and request time.

Design principles

No LLM calls. Ever. If you find one, file a bug.
Low false positives over high recall. A disabled linter is a dead linter. Every rule is validated against 20 real AI projects with a ≤5 % FP budget.
Beautiful output. Line numbers, squiggles, color, context lines, and a --explain URL per finding. ruff proved this matters.
Fast enough to run on save. Sub-50 ms per prompt file.
One-command install. pip install aiproof or cargo install aiproof-cli.
Deterministic output. Same input + same config = byte-for-byte identical output. Required for CI diffing.

Architecture

crates/
├── aiproof-core    — Document, Rule trait, Diagnostic, Severity, Span
├── aiproof-parse   — per-format parsers + SDK call-site extractor
├── aiproof-rules   — the 20 bundled rules
├── aiproof-config  — .aiproofrc + pyproject.toml loader
├── aiproof-report  — pretty (codespan-reporting) / JSON / SARIF renderers
├── aiproof-cli     — clap CLI + file discovery + orchestration
└── aiproof-py      — pyo3 + maturin Python wheel

Each rule is a single file implementing the Rule trait with a pure check(&Document, &Ctx) -> Vec<Diagnostic> function. Autofixes return an Option<Fix> with safe: bool.
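For readers who don't write Rust, the shape described above can be mirrored loosely in Python. Everything here is an analogy: the class and field names are hypothetical, the real trait takes a `Document` and `Ctx` rather than a string, and the toy check only imitates what AIP015 is described as flagging.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fix:
    replacement: str
    safe: bool  # mirrors the `safe: bool` flag mentioned above


@dataclass
class Diagnostic:
    code: str
    message: str
    severity: str
    start_line: int
    start_col: int
    fix: Optional[Fix] = None  # None when the rule offers no autofix


class UnhandledPlaceholder:
    """Toy stand-in for AIP015: flag TODO / FIXME / XXX markers."""

    code = "AIP015"
    _marker = re.compile(r"\b(TODO|FIXME|XXX)\b")

    def check(self, text: str) -> List[Diagnostic]:
        # Pure function of the input text: no I/O, no shared state.
        found = []
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in self._marker.finditer(line):
                found.append(Diagnostic(
                    code=self.code,
                    message=f"unhandled placeholder {match.group(1)!r}",
                    severity="warning",
                    start_line=line_no,
                    start_col=match.start() + 1,  # 1-based column
                ))
        return found


for diag in UnhandledPlaceholder().check("You are a support bot.\nTODO: add schema"):
    print(diag)
```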

Contributing

Contributions are welcome, especially new rules and FP reports from real-world prompts.

Dev setup

git clone https://github.com/Frostbyte-Devs/aiproof.git
cd aiproof

# Build
cargo build --workspace

# Run the full test suite (~170 tests)
cargo test --workspace

# Lint
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --all -- --check

# Run against a target directory
cargo run --release -p aiproof-cli -- --format pretty <path>

Adding a rule

1. Create crates/aiproof-rules/src/rules/aipXXX_your_rule.rs following the shape of an existing rule (e.g., aip006_hardcoded_credential.rs).
2. Register in crates/aiproof-rules/src/rules/mod.rs.
3. Add a test file at crates/aiproof-rules/tests/aipXXX_your_rule.rs with at least one positive and one negative case.
4. Write docs/rules/AIPXXX.md (under 100 words: what, why, example, fix).
5. Open a PR. CI will run the full corpus regression against all 20 repos.

Rules must meet a ≤5 % false-positive budget on the corpus to merge.

Corpus regression

./scripts/sync_corpus.sh        # shallow-clone 20 AI repos at pinned SHAs
./scripts/generate_baselines.sh # run aiproof against each, save JSON baselines

Baselines live in fixtures/corpus/*.baseline.json and are diffed in CI.
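A baseline diff can be as simple as comparing per-rule counts. The sketch below assumes a baseline file is a JSON array of diagnostic objects with a `code` field; the actual baseline schema and the CI's FP-rate calculation are not documented here, so treat both helpers as hypothetical.

```python
import json
from collections import Counter
from typing import Dict


def counts_by_code(baseline_json: str) -> Counter:
    """Count diagnostics per rule code in one baseline document."""
    return Counter(d["code"] for d in json.loads(baseline_json))


def regressions(old: Counter, new: Counter) -> Dict[str, int]:
    """Return rule codes whose diagnostic count went up, with the increase."""
    return {code: new[code] - old[code] for code in new if new[code] > old[code]}


old = counts_by_code('[{"code": "AIP010"}, {"code": "AIP015"}]')
new = counts_by_code('[{"code": "AIP010"}, {"code": "AIP010"}, {"code": "AIP015"}]')
print(regressions(old, new))  # {'AIP010': 1}
```

A count increase is not automatically a false positive; it only marks where a human should look before updating the baseline.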

Roadmap

v0.2 — VS Code extension with inline diagnostics, GitHub Action (pre-made action YAML), per-line suppression comments, custom rule config DSL.
v0.3 — npm bindings (@frostbyte/aiproof), MCP server wrapping aiproof, additional SDK detection patterns (Cohere, Replicate, together.ai, Mistral), more parsers (Chezmoi templates, jsonnet).
v1.0 — embeddings-powered semantic rules, rule authoring SDK, hosted dashboard for team-wide FP dashboards.
