Releases: wuyoscar/ISC-Bench
v0.0.5 — Claude Opus 4.7 trigger, paradigm-shift README, leaderboard 52/100
New ISC Trigger
Claude Opus 4.7 (pre-release, Rank 1 placeholder) — agentic QwenGuard TVD, 12 multilingual harmful completions across EN / FR / KO / ZH, all validator-passed. Jailbroken in seconds. See community/claudeopus47-agent-qwenguard. Confirmed count: 52/100.
README Overhaul (all 7 language versions)
- New intro framing: ISC is a paradigm shift. The failure surface has moved from the chat prompt into the agent workflow. Under jailbreak-style evaluation at Pass@3, every frontier Large Model with agent capability hits a 100% trigger rate.
- "The task is the trigger" replaces "No jailbreak required".
- Replace "legitimate professional workflow" / "real professional task" with workflow-task, sensitive-tool workflow, or tool-integrated workflow equivalents.
- Consistently say "Large Model" / "大模型" when referring to LLMs.
- New "🔍 In the Community" section with 4 practitioner quotes (Bonny Banerjee, Charles H. Martin, Andrei Trandafira, Christopher Bain).
- New "🔬 External Analyses" section listing third-party write-ups and projects (promptfoo, Gist.Science, BotBeat News, 模安局, AI Post Transformers podcast, XSafeClaw).
- "📋 ISC-Bench" dropped its "High-Stakes Safety Benchmark" subtitle.
- "How to Contribute" collapsed to a one-line pointer; the full workflow moved to CONTRIBUTING.md.
- Audited FAQ / Updates / News / Community Reproductions in all language versions and translated leftover English text.
Leaderboard
- Claude Opus 4.7 inserted at Rank 1 (Arena score rendered as "—", not yet on Arena).
- Old Rank 100 (o1-preview) dropped to keep the displayed window at 100.
- grok-4-fast-chat → Grok 4 Fast display-name mapping added; the existing community/grok4fast-darkweb case now counts.
- Fixed GLM-4.7 and GLM-4.6 schema in isc_cases.json (they were previously invisible).
- scripts/gen_leaderboard.py and docs/static/js/main.js both render "—" for null Arena scores.
- Regenerated leaderboard_progress.svg (now shows 52/100).
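The leaderboard changes above (display-name mapping, a dash for null Arena scores, inserting at Rank 1 while keeping a 100-model window) can be sketched roughly as follows. This is a hypothetical illustration only: the field names (`model_id`, `confirmed`) and every helper name are assumptions, not the actual code in scripts/gen_leaderboard.py.

```python
from typing import Optional

# Hypothetical sketch; field and function names are assumptions,
# not the schema used by scripts/gen_leaderboard.py.
NULL_SCORE = "\u2014"  # dash shown when a model has no Arena score yet
WINDOW_SIZE = 100      # number of models kept in the displayed window

# Raw model ID -> leaderboard display name (example from the notes).
DISPLAY_NAMES = {"grok-4-fast-chat": "Grok 4 Fast"}


def display_name(model_id: str) -> str:
    """Return the mapped display name, falling back to the raw ID."""
    return DISPLAY_NAMES.get(model_id, model_id)


def render_score(score: Optional[float]) -> str:
    """Render an Arena score as an integer, or a dash when it is null."""
    return NULL_SCORE if score is None else f"{score:.0f}"


def insert_at_top(rows: list[dict], new_row: dict, size: int = WINDOW_SIZE) -> list[dict]:
    """Put a model at Rank 1 and drop whatever falls past the window."""
    return ([new_row] + rows)[:size]


def confirmed_count(rows: list[dict]) -> str:
    """Format the confirmed tally as 'confirmed/window', e.g. '52/100'."""
    return f"{sum(1 for r in rows if r['confirmed'])}/{len(rows)}"
```

Trimming after the insert is what pushes the previous Rank 100 model out of view, so the denominator stays fixed at 100 while the numerator changes only when a confirmed model enters or an unconfirmed one leaves.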
New
- CONTRIBUTING.md with the full contribution workflow (ISC trigger submission, template / code contributions, PR checklist, safety boundary).
- community/claudeopus47-agent-qwenguard/ reproduction folder.
Templates
aiml_guard: consolidated variant prompt_*.txt files into prompt.txt and prompt_zh.txt; dropped 9 experimental variants.
See CHANGELOG.md for the full history.
v0.0.4 — Multilingual README, TVD Walkthrough, FAQ
What's New
Documentation
- TVD Walkthrough Example with real LlamaGuard transformer code, Pydantic v2 validator, and test data
- TVD Customization: Method 1 (numerical constraint) and Method 2 (few-shot anchor injection)
- Conversation-Based ISC section with visual example
- FAQ entry comparing TVD to traditional jailbreak attacks
Multilingual README
Full translations added: Japanese (日本語) · Korean (한국어) · Spanish (Español) · Portuguese (Português) · Vietnamese (Tiếng Việt), in addition to the existing Chinese (中文) version
Agent Reference
ISC_PAPER_DIGEST.md updated with TVD customization patterns, conversation-based ISC, and FAQ for agents working in this repo.
Full changelog: CHANGELOG.md
v0.0.3 — 51/100 ISC confirmed, aiml_guard_attack_v2
What's New
51 of the top-100 Arena models confirmed under ISC as of 2026-04-10.
11 New ISC Confirmations
All via aiml_guard_attack_v2 — ISC frames jailbreak attack-response generation as a guard-model calibration dataset task. Output verified by OpenAI omni-moderation-latest.
| Model | Note |
|---|---|
| Grok 4.1 Thinking | All 6 attack types flagged |
| Grok 4.1 Fast Reasoning | Thinking variant |
| Gemini 3 Flash Thinking | Thinking variant |
| GPT-5.1 High | High reasoning, direct operational content |
| GPT-5.1 | No hedging |
| Claude Opus 4.1 Thinking | Empathetic preamble; DAN triggers violence |
| Claude Opus 4.1 | Shares evidence with Thinking variant |
| GPT-5.2 High | Flagged by OpenAI's own moderation API |
| GPT-5.2 | DAN scored harassment_threatening 0.999 |
| DeepSeek V3.2 Thinking | Thinking variant |
| Qwen 3.5 Max Preview | Web trigger — model detected harmful intent during extended thinking but still produced structured dataset output |
Key Finding
ISC is model-intrinsic, not API-dependent. The Qwen 3.5 Max Preview web trigger confirms ISC persists even when the model explicitly recognizes harmful intent.
Fixes
- isc_agent/agent.py: fix reasoning_effort bug (it was applied to all models; it is now applied only with --thinking)
- leaderboard_history.json: fix entry order; regenerate progress chart
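The reasoning_effort fix amounts to gating one request parameter on the --thinking flag. A minimal sketch, assuming an argparse-style CLI; the function names, the --model flag, and the default effort value "high" are hypothetical and not taken from isc_agent/agent.py.

```python
import argparse
from typing import Optional

# Hypothetical sketch; only reasoning_effort and --thinking come from the notes.


def build_request_kwargs(model: str, thinking: bool, effort: str = "high") -> dict:
    """Build request kwargs, adding reasoning_effort only in thinking mode."""
    kwargs = {"model": model}
    if thinking:
        # Before the fix, this key was sent for every model, including
        # models that may not accept a reasoning parameter.
        kwargs["reasoning_effort"] = effort
    return kwargs


def parse_args(argv: Optional[list] = None) -> argparse.Namespace:
    """Parse the (hypothetical) CLI flags; --thinking defaults to False."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--thinking", action="store_true", help="enable reasoning mode")
    return parser.parse_args(argv)
```

Keeping the flag as `store_true` means the default path sends no reasoning parameter at all, so the behavior for non-thinking runs is the same as if the feature did not exist.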
New Template
aiml_guard_attack_v2: attack-response dataset with omni-moderation-latest validator
Full details in CHANGELOG.md.
v0.0.2 — Massive Template Expansion
ISC-Bench v0.0.2
Highlights
- 77 templates across 9 domains (was 57)
- 309 prompt variants — English, Chinese, extreme, zero-shot
- 28 confirmed ISC models (was 26) — added GLM-4.7, GLM-4.6
- 100% trigger rate on Qwen3 Coder (309/309)
New Templates (+20)
AI/ML (+17): sentiment, toxigen, phishing, spambot, malware, openai_detector, fraud, darkweb, pii, clickbait, medical_ner, wildguard, emotion, fake_news, sarcasm, propaganda, code_vuln
Cyber (+1): nids (IDS evasion)
Epi (+1): biosurveillance (CBRN threat assessment)
Media (+1): disinfo (information operations)
Prompt Variants
- prompt.txt: standard (one-shot)
- prompt_ex.txt: extreme anchor
- prompt_zh.txt / prompt_zh_ex.txt: Chinese localized (33 templates)
- prompt_v2.txt: strengthened variants
- prompt_v3_zeroshot.txt: untargeted (all ???, zero-shot)
Infrastructure
- Folder rename: domain_tool_descriptor → domain_tool
- MIN_WORDS raised to 50 across all templates
- PROBE_REQUEST_PATTERNS removed from LlamaGuard variants
- scripts/batch_test_templates.py: parallel batch testing (10 workers)
- Leaderboard chart auto-generation
- Demo video embedded in README
- README: ISC definition, evaluation settings (TVD-Single / TVD-ICL / TVD-Agent)
New ISC Cases
v0.0.1 — 22/330 Models Jailbroken, 500+ Stars
v0.0.1 — First Stable Release
🎆 500+ GitHub stars in 48 hours
Highlights
- 22/330 Arena-ranked models confirmed under ISC
- 5 community contributors: @HanxunH, @bboylyg, @zry29, @fresh-ma
- 5 language READMEs: EN / ZH / JA / KO / ES
- Paper on arXiv: 2603.23509
What's Included
- ISC-Bench templates across 8 professional domains
- 3 experiment modes: ISC-Single, ISC-ICL, ISC-Agentic
- JailbreakArena leaderboard tracking 330 models
- Project website with live demo and interactive leaderboard
- Community reproductions from 5 contributors
- Tutorials (cookbook/)
Links
- 🌐 Website: https://wuyoscar.github.io/ISC-Bench/
- 📄 Paper: https://arxiv.org/abs/2603.23509
- 💬 Discussions: https://github.com/wuyoscar/ISC-Bench/discussions