The first open benchmark dataset for agentic AI governance tools. v1.0.0 contains 794 labelled examples of malicious and benign agent skills, MCP manifests, agent traces, and rule files.
Tools like skillguard, Medusa, and Cisco's skill-scanner claim to detect malicious AI agent artefacts. But how do you know if they actually work?
There was no open, standardised benchmark to evaluate them against. Until now.
794 labelled examples across 4 categories (v1.0.0 — growing to 10,000 in v2.0.0):
| Category | Malicious | Benign | Total |
|---|---|---|---|
| Agent skills (SKILL.md, CLAUDE.md, AGENTS.md) | 171 | 180 | 351 |
| MCP server manifests | 75 | 78 | 153 |
| Agent decision traces (JSONL) | 60 | 50 | 110 |
| Rule files (.cursorrules, GEMINI.md, etc.) | 78 | 102 | 180 |
| Total | 384 | 410 | 794 |
Diversity across 5 independent dimensions:
- 12 attack types (prompt injection, exfiltration, rug pull, lethal trifecta...)
- 4 obfuscation levels (none → light → moderate → heavy)
- 3 sophistication levels (script kiddie → intermediate → advanced)
- 10 domains (finance, healthcare, legal, development, HR, education, government, e-commerce, research, general)
- Multiple target tools (Claude Code, Cursor, Copilot, Gemini CLI, Windsurf...)
A Python evaluation harness — point any scanner at the dataset, get precision, recall, and F1.
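For reference, the three metrics are conventionally computed as shown below, treating `malicious` as the positive class. This is a minimal standalone sketch, not govbench's own implementation; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(
    y_true: list[str], y_pred: list[str], positive: str = "malicious"
) -> dict[str, float]:
    """Precision, recall, and F1 for a two-class labelling task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


truth = ["malicious", "malicious", "benign", "benign"]
preds = ["malicious", "benign", "malicious", "benign"]
print(binary_metrics(truth, preds))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```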
Malicious examples are grounded in documented attacks:
- ClawHavoc campaign (Feb 2026) — 341 malicious skills, 300K+ users compromised. Attack patterns: prompt injection (34%), exfil env vars (28%), payload scripts (22%), identity hijack (16%)
- Snyk ToxicSkills (Feb 2026) — 3,984 skills audited, 13.4% critical. Most common: hardcoded API keys, prompt injection (36%), exfil hooks
- BlueRock MCP audit (2025) — 9.2% of 10,000+ servers with critical vulnerabilities
- Astrix 5,200-server audit (2025) — 53% of MCP servers use insecure static secrets
pip install govbench

Zero dependencies. Pure Python 3.10+.
from govbench import GovBenchDataset, Evaluator
# Load the dataset
ds = GovBenchDataset.load()
print(ds.stats())
# Total: 794, Malicious: 384, Benign: 410
# Evaluate your scanner
def my_scanner(content: str) -> str:
    # Your scanner logic here
    # Must return "malicious" or "benign"
    ...

evaluator = Evaluator(ds)
report = evaluator.evaluate(my_scanner, name="my-scanner")
print(report.summary())
print(f"F1: {report.f1:.3f}")

govbench eval skillguard

Or in Python:
from govbench import GovBenchDataset, Evaluator
from skillguard import SkillScanner
scanner = SkillScanner()
def skillguard_fn(content: str) -> str:
    result = scanner.scan_text(content)
    return "malicious" if not result.is_safe else "benign"

ds = GovBenchDataset.load()
evaluator = Evaluator(ds)
report = evaluator.evaluate(skillguard_fn, name="skillguard-v1.0.0")
print(report.summary())

from govbench import GovBenchDataset, Evaluator
from mcpscan import MCPScanner
import json
scanner = MCPScanner()
def mcpscan_fn(content: str) -> str:
    try:
        manifest = json.loads(content)
        result = scanner.scan_manifest(manifest)
    except Exception:
        result = scanner.scan_text(content)
    return "malicious" if not result.is_safe else "benign"

ds = GovBenchDataset.load().filter(category="mcp")
evaluator = Evaluator(ds)
report = evaluator.evaluate(mcpscan_fn, name="mcpscan-v1.0.0")
print(report.summary())

from govbench import GovBenchDataset, Evaluator
ds = GovBenchDataset.load()
evaluator = Evaluator(ds)
reports = evaluator.compare({
    "skillguard": skillguard_fn,
    "mcpscan": mcpscan_fn,
    "my-scanner": my_scanner_fn,
})
# Print leaderboard
print(evaluator.leaderboard(reports))

govbench build             # build and save the dataset
govbench stats             # show dataset statistics
govbench eval skillguard   # evaluate skillguard
govbench leaderboard       # show the scanner leaderboard

Submit your scanner results via a GitHub Issue or PR.
| Scanner | Precision | Recall | F1 | Accuracy | Time |
|---|---|---|---|---|---|
| Submit yours | — | — | — | — | — |
To submit: run govbench eval <your-scanner> and open a GitHub Issue with the JSON output.
data/
├── manifest.json    # Dataset manifest with stats
├── skill/
│   ├── malicious/
│   │   └── govbench_skill_malicious.jsonl
│   └── benign/
│       └── govbench_skill_benign.jsonl
├── mcp/
│   ├── malicious/
│   └── benign/
├── trace/
│   ├── malicious/
│   └── benign/
└── rule/
    ├── malicious/
    └── benign/
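The layout above can also be read with the standard library alone, without going through `GovBenchDataset`. The sketch below is illustrative only: the helper names `iter_examples` and `label_counts` are hypothetical, and it assumes every `.jsonl` file holds one JSON object per line with `category` and `label` fields.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_examples(root: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of every .jsonl file under root."""
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # skip blank lines
                    yield json.loads(line)


def label_counts(root: Path) -> Counter:
    """Count examples per (category, label) pair."""
    return Counter((ex["category"], ex["label"]) for ex in iter_examples(root))


if __name__ == "__main__":
    data_dir = Path("data")  # assumed location of the dataset directory
    for (category, label), n in sorted(label_counts(data_dir).items()):
        print(f"{category:<6} {label:<10} {n}")
```

Reading files lazily with a generator keeps memory use flat even if later versions of the dataset grow considerably.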
Each JSONL line is a labelled example with these fields:
{
  "id": "GB-SK-MAL-0001",
  "version": "1.0.0",
  "category": "skill",
  "label": "malicious",
  "attack_type": "prompt_injection",
  "severity": "critical",
  "obfuscation": "none",
  "sophistication": "low",
  "domain": "finance",
  "target_tool": "claude_code",
  "content": "...",
  "filename": "SKILL.md",
  "description": "...",
  "attack_description": "...",
  "real_world_ref": "ClawHavoc campaign (Feb 2026)",
  "tags": ["prompt_injection", "finance", "claude_code", "malicious"]
}

- Programmatically generated from real-world attack taxonomy
- 12 attack types × 4 obfuscation levels × 3 sophistication levels × 10 domains
- Grounded in ClawHavoc, ToxicSkills, BlueRock, and Astrix documented attacks
- LLM-assisted generation with diversity enforcement
- 10% human-validated sample (1,000 examples reviewed by domain experts)
- Additional attack types from emerging threat landscape
- Multi-language support (Python, TypeScript, YAML skill formats)
- Integration test harness for live MCP server scanning
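The 12 × 4 × 3 × 10 grid named in the list above can be enumerated with `itertools.product`. This is a rough sketch rather than govbench's actual generator: the obfuscation, sophistication, and domain names are taken from the lists earlier in this README, while the attack types are numbered placeholders because only a few of the 12 are named here.

```python
from itertools import product

# Placeholders: the README names only some of the 12 attack types.
ATTACK_TYPES = [f"attack_type_{i:02d}" for i in range(1, 13)]
OBFUSCATION = ["none", "light", "moderate", "heavy"]
SOPHISTICATION = ["script kiddie", "intermediate", "advanced"]
DOMAINS = [
    "finance", "healthcare", "legal", "development", "HR",
    "education", "government", "e-commerce", "research", "general",
]

# Every (attack type, obfuscation, sophistication, domain) combination.
grid = list(product(ATTACK_TYPES, OBFUSCATION, SOPHISTICATION, DOMAINS))
print(len(grid))  # 12 * 4 * 3 * 10 = 1440
print(grid[0])    # ('attack_type_01', 'none', 'script kiddie', 'finance')
```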
PRs welcome for:
- New labelled examples (especially non-English, non-development domains)
- New attack patterns with real-world references
- Scanner evaluation results for the leaderboard
- Corrections to existing labels
See CONTRIBUTING.md for guidelines.
govbench was developed as part of PhD research on agentic AI governance at Leeds Beckett University (supervisor: Dr Sandra Obiora). The taxonomy and labelling methodology draw on:
- OWASP Agentic AI Top 10 (2026)
- EU AI Act (Regulation 2024/1689)
- ClawHavoc campaign analysis (Koi Security, Feb 2026)
- Snyk ToxicSkills audit (Feb 2026)
- BlueRock MCP security audit (2025)
If you use govbench in research, please cite:
Oraegbunam, L. (2026). govbench: Open benchmark dataset for agentic AI governance tools.
GitHub. https://github.com/obielin/govbench
Repository name: govbench
Description: Open benchmark dataset for agentic AI governance tools. 794 labelled examples (v1.0.0) of malicious and benign agent skills, MCP manifests, traces, and rule files. Evaluate any scanner's precision, recall, and F1.
Topics: ai-governance benchmark dataset agentic-ai evaluation prompt-injection mcp agent-skills eu-ai-act llm-security red-team clawhavoc toxicskills machine-learning
PyPI Trusted Publisher:
- PyPI project name: govbench
- Repository: govbench
- Workflow: publish.yml
- Environment: pypi
Push commands:
git init && git add . && git commit -m "Initial release: govbench v1.0.0 — 794 labelled examples"
git remote add origin https://github.com/obielin/govbench.git
git branch -M main && git push -u origin main

Linda Oraegbunam | PhD Researcher, Agentic AI Governance, Leeds Beckett University
LinkedIn · Twitter · GitHub