Problem
Agent Control's evaluator ecosystem has Cisco AI Defense (cloud API) and Galileo Luna (LLM-based), but no local, regex-based evaluator for detecting known AI agent threat patterns without API keys or network calls.
Proposed solution
A contrib evaluator using ATR (Agent Threat Rules) — community-maintained regex rules for AI agent threats.
# Usage
import asyncio
from agent_control_evaluator_atr.threat_rules import ATREvaluator, ATRConfig

evaluator = ATREvaluator(ATRConfig(
    min_severity="medium",
    categories=["prompt-injection", "tool-poisoning"],
))

async def main():
    result = await evaluator.evaluate("Ignore all previous instructions...")
    print(result)
    # EvaluatorResult(matched=True, confidence=0.9, metadata={findings: [...]})

asyncio.run(main())
Key characteristics:
- Evaluator name atr.threat_rules, auto-discovered via entry points
- 20 rules, 306 patterns covering OWASP Agentic Top 10
- Configurable: min_severity, categories filter, block_on_match, on_error (fail-open/closed)
- Pure regex, no API keys, <5ms evaluation
- Returns all matching rules (not just first match) with metadata
- Follows the Cisco evaluator pattern exactly (pyproject.toml, Makefile, entry points)
- Rules maintained at agentthreatrule.org (MIT licensed)
- ATR is already used by Cisco AI Defense
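
To make the filtering and multi-match behavior listed above concrete, here is a minimal sketch of how a local regex evaluator of this kind could work. It is not the ATR implementation: the Rule and Config classes, the two demo rules, the severity levels, and the "allow"/"deny" values for on_error are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed severity levels; the real rule set may define different ones.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class Rule:
    """Hypothetical rule shape; real ATR rules may carry more fields."""
    rule_id: str
    category: str
    severity: str
    patterns: list


@dataclass
class Config:
    min_severity: str = "low"
    categories: Optional[list] = None  # None means "all categories"
    on_error: str = "allow"  # assumed values: "allow" = fail-open, "deny" = fail-closed


# Two illustrative rules, not taken from the ATR rule set.
RULES = [
    Rule("demo-001", "prompt-injection", "high",
         [r"ignore\s+(all\s+)?previous\s+instructions"]),
    Rule("demo-002", "tool-poisoning", "medium",
         [r"<hidden>.*?</hidden>"]),
]


def evaluate(text, config, rules=RULES):
    """Return every rule that matches, after applying the config filters."""
    try:
        threshold = SEVERITY_ORDER[config.min_severity]
        findings = []
        for rule in rules:
            # Skip rules below the configured severity threshold.
            if SEVERITY_ORDER[rule.severity] < threshold:
                continue
            # Skip rules outside the selected categories, if any were given.
            if config.categories is not None and rule.category not in config.categories:
                continue
            # Collect all matching rules rather than stopping at the first one.
            if any(re.search(p, text, re.IGNORECASE) for p in rule.patterns):
                findings.append({
                    "rule_id": rule.rule_id,
                    "category": rule.category,
                    "severity": rule.severity,
                })
        return {"matched": bool(findings), "findings": findings}
    except (KeyError, re.error) as exc:
        # Fail-closed reports a match so the caller can block; fail-open lets the input through.
        return {"matched": config.on_error == "deny", "findings": [], "error": str(exc)}
```

Calling evaluate("Ignore all previous instructions", Config(min_severity="high")) in this sketch would report only the prompt-injection demo rule, since the tool-poisoning rule falls below the threshold. Because everything is plain regex matching, no API key or network call is involved.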
Willingness to contribute
Yes — full implementation ready with 22 tests covering detection, false-positive safety, config options, error handling, and multi-match behavior. Happy to submit a PR.