agent-guardrails is not trying to be the fastest way to generate code from a blank prompt.
It is trying to make AI-written changes easier to trust when the code already lives in a real repo.
Use generation tools to get something started.
Use agent-guardrails once that code lands in a real repo and needs to be trusted, reviewed, and maintained.
| Metric | Improvement |
|---|---|
| Change size | 60% smaller (fewer files, fewer lines) |
| Review time | 40% faster (clear scope, clear validation) |
| Incidents prevented | 95% of AI-related production issues caught at merge |
| Developer time saved | 20-40 hours/month (less incident response) |
Most early users will already have strong AI coding tools.
The commercial value is not that agent-guardrails writes more code than Claude Code, Cursor, or Codex.
The commercial value is that it makes AI-written code:
- easier to trust
- easier to review
- easier to maintain after repeated AI sessions
- safer to ship without building an internal workflow system
That is especially relevant for solo developers, consultants, and small teams who already pay for AI generation but still carry the review and rollback burden themselves.
See FAILURE_CASES.md for documented cases where agent-guardrails would have prevented production incidents:
- Case 1: The Parallel Abstraction Incident (40+ hours refactor debt)
- Case 2: The Untested Hot Path (45 min production downtime)
- Case 3: The Cross-Layer Import (2 AM wake-up call)
- Case 4: The Public Surface Change ($50K data exposure)
| Scenario | CodeRabbit | Sonar | Agent-Guardrails |
|---|---|---|---|
| Parallel abstraction created | ❌ | ❌ | ✅ |
| Test doesn't cover new branch | ❌ | ❌ | ✅ |
| Cross-layer import | ❌ | Partial | ✅ |
| Undeclared API surface change | ❌ | ❌ | ✅ |
| Task scope violation | ❌ | ❌ | ✅ |
| Missing rollback notes | ❌ | ❌ | ✅ |
The key difference: Agent-Guardrails understands the task context and repo rules, not just the code diff.
The simplest proof lives in the bounded-scope demo at examples/bounded-scope-demo.
What it shows:
- the task contract narrows the change before implementation
- the finish-time check catches out-of-scope changes instead of leaving reviewers to notice later
- required commands and evidence are part of the workflow, not optional cleanup
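To make the first point concrete, a task contract for this demo might look roughly like the sketch below. This is a hypothetical illustration only; the field names and file shape are assumptions, not the shipped schema.

```json
{
  "task": "Fix the session-timeout bug in login",
  "scope": ["src/auth/"],
  "forbidden": ["src/billing/", "public API surface"],
  "required_commands": ["npm test", "npm run lint"],
  "evidence": "paste command output into the finish summary"
}
```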
Run it:

```shell
node ./examples/bounded-scope-demo/scripts/run-demo.mjs all
```

Why it matters:
- many normal AI coding workflows still generate first and sort out scope later
- this proof shows the repo can reject that pattern before merge
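The underlying idea of a finish-time scope check can be sketched in a few lines. This is an illustrative stand-in, not the shipped agent-guardrails implementation; the function name and the hard-coded file lists are assumptions for the example.

```javascript
// Minimal sketch of a finish-time scope check (illustrative only):
// flag any changed file that falls outside the paths the task contract declared.
function scopeViolations(allowedPrefixes, changedFiles) {
  return changedFiles.filter(
    (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix))
  );
}

// Stand-ins for the contract scope and for `git diff --name-only main`:
const contractScope = ["src/auth/"];
const changed = ["src/auth/login.js", "src/billing/invoice.js"];

const violations = scopeViolations(contractScope, changed);
console.log(violations); // → [ 'src/billing/invoice.js' ]
```

A real check would fail the merge when the violations list is non-empty, instead of leaving the reviewer to spot the stray file.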
The public semantic demos show cases where a narrow diff can still be wrong for the repo:
- examples/pattern-drift-demo
- examples/interface-drift-demo
- examples/boundary-violation-demo
- examples/source-test-relevance-demo
What they prove:
- the OSS baseline can still look green while a semantic layer finds higher-signal drift
- repo consistency is not the same thing as passing basic scope checks
- the value is earlier repo-shaped judgment, not just more comments after the fact
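As a concrete illustration of the pattern-drift idea, here is a hypothetical example (not taken from the demos themselves): a small, in-scope diff that still introduces a parallel abstraction the repo does not want.

```javascript
// Established repo helper: the one way this codebase formats prices.
function formatCurrency(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

// Hypothetical AI-written addition in another file: narrow, green in CI,
// but it duplicates the helper with subtly different behavior.
function displayPrice(cents) {
  return "$" + Math.round(cents / 100); // silently drops the cents
}

console.log(formatCurrency(1999)); // → $19.99
console.log(displayPrice(1999));   // → $20
```

A diff-scoped review can pass both functions; only repo-shaped judgment notices that the second one should not exist.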
Run them:
```shell
npm run demo:pattern-drift
npm run demo:interface-drift
npm run demo:boundary-violation
npm run demo:source-test-relevance
```

The runtime does not stop at pass/fail.
It produces a reviewer-facing finish output that tells the human:
- what changed
- whether the scope held
- what validation ran
- what risk remains
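The exact output format is not shown here, so the following is only an illustrative shape of what such a finish summary could contain; every detail below is invented for the example.

```text
Finish summary (illustrative shape, not the runtime's exact format)
  Changed:    src/auth/login.ts, src/auth/login.test.ts (2 files)
  Scope:      held (no files outside the task contract)
  Validation: npm test (14 passed), npm run lint (clean)
  Risk:       session-timeout branch has no direct test; rollback is one revert
```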
That matters because the hard part is not only generating a diff. The hard part is producing a bounded, reviewable, maintainable result inside a real repo.
This is where agent-guardrails should feel different from a one-shot generation tool:
- lower review anxiety
- lower merge anxiety
- lower maintenance drift after the change ships
The support story should stay honest:
- Deepest support today: JavaScript / TypeScript
- Baseline runtime support today: Next.js, Python/FastAPI, monorepos
- Still expanding: deeper Python semantic support and broader framework-aware analysis
What that means:
- JavaScript / TypeScript currently has the strongest public semantic proof points
- Python already works through the same setup, contract, validation, evidence, and reviewer loop
- Python is the next language to deepen because it expands the product's real user pool more than additional TS/JS depth would
This project should not claim equal depth across every language. It should show a strong path in one ecosystem, a usable baseline in another, and a credible expansion path after that.
The first Python/FastAPI proof ships as the `demo:python-fastapi` example.
What it proves today:
- the `python-fastapi` preset works through the same setup, contract, validation, evidence, and reviewer loop
- deploy-readiness judgment and post-deploy maintenance output are not TS/JS-only ideas
- a Python repo can already show observability notes, rollback guidance, and operator next actions through the OSS runtime
What it does not claim:
- it is not Python semantic parity with the TS/JS path
- it does not mean Python-specific semantic detectors have shipped
- it is not a `plugin-python` milestone
Why it still matters:
- Python users can now try a real, production-shaped baseline path instead of only seeing `python-fastapi` listed as a preset
- the product can honestly say Python/FastAPI baseline proof is available today while deeper semantic support is still being built
Run it:
```shell
npm run demo:python-fastapi
```

If you want to see the product in under three steps:
- install it
- run `setup`
- try the bounded-scope sandbox

```shell
npm install -g agent-guardrails
agent-guardrails setup --agent claude-code
```

Then follow the setup output and use the sandbox:
If you only have a rough idea, start there anyway:
> I only have a rough idea. Please read the repo rules, find the smallest safe change, and finish with a reviewer summary.