A diagnostic adversarial game for frontier LLMs — a measurement instrument that happens to be fun.
python eval ai-safety diagnostic llm llm-evaluation agent-evaluation reward-hacking inspect-ai auditing-game
Updated Jun 2, 2026 - Python