Skip to content

obielin/govbench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

govbench

The first open benchmark dataset for agentic AI governance tools. 2,000 labelled examples of malicious and benign agent skills, MCP manifests, agent traces, and rule files.

Tests PyPI Examples v2 Target License LinkedIn


The problem

Tools like skillguard, Medusa, and Cisco's skill-scanner claim to detect malicious AI agent artefacts. But how do you know if they actually work?

There was no open, standardised benchmark to evaluate them against. Until now.


What govbench provides

794 labelled examples across 4 categories (v1.0.0 — growing to 10,000 in v2.0.0):

Category Malicious Benign Total
Agent skills (SKILL.md, CLAUDE.md, AGENTS.md) 171 180 351
MCP server manifests 75 78 153
Agent decision traces (JSONL) 60 50 110
Rule files (.cursorrules, GEMINI.md, etc.) 78 102 180
Total 384 410 794

Diversity across 5 independent dimensions:

  • 12 attack types (prompt injection, exfiltration, rug pull, lethal trifecta...)
  • 4 obfuscation levels (none → light → moderate → heavy)
  • 3 sophistication levels (script kiddie → intermediate → advanced)
  • 10 domains (finance, healthcare, legal, development, HR, education, government, e-commerce, research, general)
  • Multiple target tools (Claude Code, Cursor, Copilot, Gemini CLI, Windsurf...)

A Python evaluation harness — point any scanner at the dataset, get precision, recall, and F1.


Real-world sourcing

Malicious examples are grounded in documented attacks:

  • ClawHavoc campaign (Feb 2026) — 341 malicious skills, 300K+ users compromised. Attack patterns: prompt injection (34%), exfil env vars (28%), payload scripts (22%), identity hijack (16%)
  • Snyk ToxicSkills (Feb 2026) — 3,984 skills audited, 13.4% critical. Most common: hardcoded API keys, prompt injection (36%), exfil hooks
  • BlueRock MCP audit (2025) — 9.2% of 10,000+ servers with critical vulnerabilities
  • Astrix 5,200-server audit (2025) — 53% of MCP servers use insecure static secrets

Install

pip install govbench

Zero dependencies. Pure Python 3.10+.


Quick start

from govbench import GovBenchDataset, Evaluator

# Load the dataset
ds = GovBenchDataset.load()
print(ds.stats())
# Total: 794, Malicious: 384, Benign: 410

# Evaluate your scanner
def my_scanner(content: str) -> str:
    # Your scanner logic here
    # Must return "malicious" or "benign"
    ...

evaluator = Evaluator(ds)
report = evaluator.evaluate(my_scanner, name="my-scanner")
print(report.summary())
print(f"F1: {report.f1:.3f}")

Evaluate skillguard

govbench eval skillguard

Or in Python:

from govbench import GovBenchDataset, Evaluator
from skillguard import SkillScanner

scanner = SkillScanner()

def skillguard_fn(content: str) -> str:
    result = scanner.scan_text(content)
    return "malicious" if not result.is_safe else "benign"

ds = GovBenchDataset.load()
evaluator = Evaluator(ds)
report = evaluator.evaluate(skillguard_fn, name="skillguard-v1.0.0")
print(report.summary())

Evaluate mcpscan

from govbench import GovBenchDataset, Evaluator
from mcpscan import MCPScanner
import json

scanner = MCPScanner()

def mcpscan_fn(content: str) -> str:
    try:
        manifest = json.loads(content)
        result = scanner.scan_manifest(manifest)
    except Exception:
        result = scanner.scan_text(content)
    return "malicious" if not result.is_safe else "benign"

ds = GovBenchDataset.load().filter(category="mcp")
evaluator = Evaluator(ds)
report = evaluator.evaluate(mcpscan_fn, name="mcpscan-v1.0.0")
print(report.summary())

Compare multiple scanners

from govbench import GovBenchDataset, Evaluator

ds = GovBenchDataset.load()
evaluator = Evaluator(ds)

reports = evaluator.compare({
    "skillguard": skillguard_fn,
    "mcpscan": mcpscan_fn,
    "my-scanner": my_scanner_fn,
})

# Print leaderboard
print(evaluator.leaderboard(reports))

CLI

govbench build        # build and save the dataset
govbench stats        # show dataset statistics
govbench eval skillguard   # evaluate skillguard
govbench leaderboard  # show the scanner leaderboard

Leaderboard

Submit your scanner results via a GitHub Issue or PR.

Scanner Precision Recall F1 Accuracy Time
Submit yours

To submit: run govbench eval <your-scanner> and open a GitHub Issue with the JSON output.


Dataset structure

data/
├── manifest.json          # Dataset manifest with stats
├── skill/
│   ├── malicious/
│   │   └── govbench_skill_malicious.jsonl
│   └── benign/
│       └── govbench_skill_benign.jsonl
├── mcp/
│   ├── malicious/
│   └── benign/
├── trace/
│   ├── malicious/
│   └── benign/
└── rule/
    ├── malicious/
    └── benign/

Each JSONL line is a labelled example with these fields:

{
  "id": "GB-SK-MAL-0001",
  "version": "1.0.0",
  "category": "skill",
  "label": "malicious",
  "attack_type": "prompt_injection",
  "severity": "critical",
  "obfuscation": "none",
  "sophistication": "low",
  "domain": "finance",
  "target_tool": "claude_code",
  "content": "...",
  "filename": "SKILL.md",
  "description": "...",
  "attack_description": "...",
  "real_world_ref": "ClawHavoc campaign (Feb 2026)",
  "tags": ["prompt_injection", "finance", "claude_code", "malicious"]
}

Roadmap

v1.0.0 (current) — 794 examples

  • Programmatically generated from real-world attack taxonomy
  • 12 attack types × 4 obfuscation levels × 3 sophistication levels × 10 domains
  • Grounded in ClawHavoc, ToxicSkills, BlueRock, and Astrix documented attacks

v2.0.0 (target: August 2026) — 10,000 examples

  • LLM-assisted generation with diversity enforcement
  • 10% human-validated sample (1,000 examples reviewed by domain experts)
  • Additional attack types from emerging threat landscape
  • Multi-language support (Python, TypeScript, YAML skill formats)
  • Integration test harness for live MCP server scanning

Contributing

PRs welcome for:

  • New labelled examples (especially non-English, non-development domains)
  • New attack patterns with real-world references
  • Scanner evaluation results for the leaderboard
  • Corrections to existing labels

See CONTRIBUTING.md for guidelines.


Research context

govbench was developed as part of PhD research on agentic AI governance at Leeds Beckett University (supervisor: Dr Sandra Obiora). The taxonomy and labelling methodology draw on:

  • OWASP Agentic AI Top 10 (2026)
  • EU AI Act (Regulation 2024/1689)
  • ClawHavoc campaign analysis (Koi Security, Feb 2026)
  • Snyk ToxicSkills audit (Feb 2026)
  • BlueRock MCP security audit (2025)

If you use govbench in research, please cite:

Oraegbunam, L. (2026). govbench: Open benchmark dataset for agentic AI governance tools.
GitHub. https://github.com/obielin/govbench

GitHub repository setup

Repository name: govbench Description: Open benchmark dataset for agentic AI governance tools. 2,000 labelled examples of malicious and benign agent skills, MCP manifests, traces, and rule files. Evaluate any scanner's precision, recall, and F1. Topics: ai-governance benchmark dataset agentic-ai evaluation prompt-injection mcp agent-skills eu-ai-act llm-security red-team clawhavoc toxicskills machine-learning

PyPI Trusted Publisher:

  • PyPI project name: govbench
  • Repository: govbench
  • Workflow: publish.yml
  • Environment: pypi

Push commands:

git init && git add . && git commit -m "Initial release: govbench v1.0.0 — 794 labelled examples"
git remote add origin https://github.com/obielin/govbench.git
git branch -M main && git push -u origin main

Linda Oraegbunam | PhD Researcher, Agentic AI Governance, Leeds Beckett University LinkedIn · Twitter · GitHub

About

Open benchmark dataset for agentic AI governance tools. 794 labelled examples of malicious and benign agent skills, MCP manifests, traces, and rule files.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages