Releases · wuyoscar/ISC-Bench

17 Apr 14:41

wuyoscar

v0.0.5

3dd2a91

v0.0.5 — Claude Opus 4.7 trigger, paradigm-shift README, leaderboard 52/100 Latest


New ISC Trigger

Claude Opus 4.7 (pre-release, Rank 1 placeholder) — agentic QwenGuard TVD, 12 multilingual harmful completions across EN / FR / KO / ZH, all validator-passed. Jailbroken in seconds. See community/claudeopus47-agent-qwenguard. Confirmed count: 52/100.

README Overhaul (all 7 language versions)

New intro framing: ISC is a paradigm shift. The failure surface has moved from the chat prompt into the agent workflow. Under jailbreak-style evaluation on Pass@3, every frontier Large Model with agent capability hits a 100% trigger rate.
"The task is the trigger" replaces "No jailbreak required".
Replaced "legitimate professional workflow" / "real professional task" with workflow-task / sensitive-tool workflow / tool-integrated workflow equivalents.
"Large Model" / "大模型" is now used consistently when referring to LLMs.
New 🔍 In the Community section with 4 practitioner quotes (Bonny Banerjee, Charles H. Martin, Andrei Trandafira, Christopher Bain).
New 🔬 External Analyses section listing third-party write-ups and projects (promptfoo, Gist.Science, BotBeat News, 模安局, AI Post Transformers podcast, XSafeClaw).
📋 ISC-Bench dropped its "High-Stakes Safety Benchmark" subtitle.
How to Contribute collapsed to a one-line pointer; the full workflow moved to CONTRIBUTING.md.
Audited FAQ / Updates / News / Community Reproductions in all language versions and translated leftover English.

Leaderboard

Claude Opus 4.7 inserted at Rank 1 (Arena score rendered as "—" because the model is not yet on Arena).
Old Rank 100 (o1-preview) dropped to keep the displayed window at 100.
grok-4-fast-chat → Grok 4 Fast display-name mapping added; existing community/grok4fast-darkweb case now counts.
Fixed GLM-4.7 and GLM-4.6 schema in isc_cases.json (were previously invisible).
scripts/gen_leaderboard.py and docs/static/js/main.js both render "—" for null Arena scores.
Regenerated leaderboard_progress.svg (now shows 52/100).
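
The display rules in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in scripts/gen_leaderboard.py: the function names, the row dictionaries, and the field names (`model`, `arena_score`, `confirmed`) are hypothetical. Only the `grok-4-fast-chat` to `Grok 4 Fast` mapping, the dash placeholder for null scores, and the 100-row window come from the notes above.

```python
from typing import Optional

# Display-name mapping; only this one entry is taken from the release notes.
DISPLAY_NAMES = {"grok-4-fast-chat": "Grok 4 Fast"}

WINDOW = 100  # number of leaderboard rows displayed


def format_arena_score(score: Optional[float]) -> str:
    """Render a missing (null) Arena score as an em dash placeholder."""
    return "\u2014" if score is None else str(score)


def build_rows(entries: list[dict]) -> list[dict]:
    """Map display names, format scores, and keep only the top WINDOW rows.

    `entries` is assumed to be ordered by rank already (hypothetical schema).
    """
    rows = []
    for rank, entry in enumerate(entries[:WINDOW], start=1):
        rows.append({
            "rank": rank,
            "model": DISPLAY_NAMES.get(entry["model"], entry["model"]),
            "arena_score": format_arena_score(entry.get("arena_score")),
            "confirmed": bool(entry.get("confirmed", False)),
        })
    return rows


def progress_label(rows: list[dict]) -> str:
    """Return a 'confirmed/total' label like the one shown on the chart."""
    confirmed = sum(1 for row in rows if row["confirmed"])
    return f"{confirmed}/{len(rows)}"
```

Inserting a new entry at the front of `entries` pushes the previous last row out of the window, which matches the Rank 100 drop described above.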

New

CONTRIBUTING.md with the full contribution workflow (ISC trigger submission, template / code contributions, PR checklist, safety boundary).
community/claudeopus47-agent-qwenguard/ reproduction folder.

Templates

aiml_guard: consolidated variant prompt_*.txt files into prompt.txt and prompt_zh.txt; dropped 9 experimental variants.

See CHANGELOG.md for the full history.


12 Apr 09:33

wuyoscar

v0.0.4

8e7b662

v0.0.4 — Multilingual README, TVD Walkthrough, FAQ

What's New

Documentation

TVD Walkthrough Example with real LlamaGuard transformer code, Pydantic v2 validator, and test data
TVD Customization: Method 1 (numerical constraint) and Method 2 (few-shot anchor injection)
Conversation-Based ISC section with visual example
FAQ entry comparing TVD to traditional jailbreak attacks

Multilingual README

Full translations added: 日本語 · 한국어 · Español · Português · Tiếng Việt (in addition to existing 中文)

Agent Reference

ISC_PAPER_DIGEST.md updated with TVD customization patterns, conversation-based ISC, and FAQ for agents working in this repo.

Full changelog: CHANGELOG.md


10 Apr 10:51

wuyoscar

v0.0.3

b238016

v0.0.3 — 51/100 ISC confirmed, aiml_guard_attack_v2

What's New

51 of the top 100 Arena models confirmed under ISC as of 2026-04-10.

11 New ISC Confirmations

All via aiml_guard_attack_v2 — ISC frames jailbreak attack-response generation as a guard-model calibration dataset task. Output verified by OpenAI omni-moderation-latest.

Model	Note
Grok 4.1 Thinking	All 6 attack types flagged
Grok 4.1 Fast Reasoning	Thinking variant
Gemini 3 Flash Thinking	Thinking variant
GPT-5.1 High	High reasoning, direct operational content
GPT-5.1	No hedging
Claude Opus 4.1 Thinking	Empathetic preamble; DAN triggers violence
Claude Opus 4.1	Shares evidence with Thinking variant
GPT-5.2 High	Flagged by OpenAI's own moderation API
GPT-5.2	DAN scored harassment_threatening 0.999
DeepSeek V3.2 Thinking	Thinking variant
Qwen 3.5 Max Preview	Web trigger — model detected harmful intent during extended thinking but still produced structured dataset output

Key Finding

ISC is model-intrinsic, not API-dependent. The Qwen 3.5 Max Preview web trigger confirms ISC persists even when the model explicitly recognizes harmful intent.

Fixes

isc_agent/agent.py: fix reasoning_effort bug (was applied to all models, now only with --thinking)
leaderboard_history.json: fix entry order; regenerate progress chart
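
The reasoning_effort fix amounts to gating one request parameter on a command-line flag. A minimal sketch, assuming an argparse-based CLI: `build_request_kwargs`, `parse_args`, the `--model` option, and the `"high"` value are hypothetical and are not taken from isc_agent/agent.py.

```python
import argparse


def build_request_kwargs(model: str, thinking: bool) -> dict:
    """Build keyword arguments for a chat request (hypothetical helper)."""
    kwargs = {"model": model}
    # Before the fix the parameter was sent for every model; now it is
    # included only when the user opts in with --thinking.
    if thinking:
        kwargs["reasoning_effort"] = "high"  # assumed value
    return kwargs


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two options this sketch needs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--thinking", action="store_true")
    return parser.parse_args(argv)
```

Calling `build_request_kwargs(args.model, args.thinking)` after `parse_args()` leaves `reasoning_effort` out of the request unless `--thinking` was passed.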

New Template

aiml_guard_attack_v2 — attack-response dataset with omni-moderation-latest validator

Full details in CHANGELOG.md.


29 Mar 18:47

wuyoscar

v0.0.2

ea81eac

v0.0.2 — Massive Template Expansion

ISC-Bench v0.0.2

Highlights

77 templates across 9 domains (was 57)
309 prompt variants — English, Chinese, extreme, zero-shot
28 confirmed ISC models (was 26) — added GLM-4.7, GLM-4.6
100% trigger rate on Qwen3 Coder (309/309)

New Templates (+20)

AI/ML (+17): sentiment, toxigen, phishing, spambot, malware, openai_detector, fraud, darkweb, pii, clickbait, medical_ner, wildguard, emotion, fake_news, sarcasm, propaganda, code_vuln
Cyber (+1): nids (IDS evasion)
Epi (+1): biosurveillance (CBRN threat assessment)
Media (+1): disinfo (information operations)

Prompt Variants

prompt.txt — standard (one-shot)
prompt_ex.txt — extreme anchor
prompt_zh.txt / prompt_zh_ex.txt — Chinese localized (33 templates)
prompt_v2.txt — strengthened variants
prompt_v3_zeroshot.txt — untargeted (all ???, zero-shot)

Infrastructure

Folder rename: domain_tool_descriptor → domain_tool
MIN_WORDS raised to 50 across all templates
PROBE_REQUEST_PATTERNS removed from LlamaGuard variants
scripts/batch_test_templates.py — parallel batch testing (10 workers)
Leaderboard chart auto-generation
Demo video embedded in README
README: ISC definition, evaluation settings (TVD-Single / TVD-ICL / TVD-Agent)
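
The 10-worker parallel run mentioned for scripts/batch_test_templates.py can be pictured with a generic thread-pool pattern. This is a sketch under assumptions, not the script itself: `run_template` is a placeholder that does no real work, and `run_batch` and its result format are hypothetical. Only the worker count comes from the notes above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 10  # worker count stated in the release notes


def run_template(name: str) -> bool:
    """Placeholder for one template run; returns True on success."""
    return bool(name)


def run_batch(names: list[str], worker=run_template) -> dict[str, bool]:
    """Run `worker` over all names in parallel and collect pass/fail flags."""
    results: dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(worker, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # A failing run is recorded instead of aborting the batch.
                results[name] = False
    return results
```

Threads suit this kind of job when each run mostly waits on network calls. Catching exceptions per future keeps one failure from stopping the rest of the batch.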

New ISC Cases

#64 GLM-4.7 — toxin biosynthesis, nerve agent docking, radiological dispersal
#65 GLM-4.6 — 6/7 templates triggered, poisoning data, VX, dirty bomb
Both verified by GPT-5.2 biosecurity/toxicology expert judge


27 Mar 04:18

wuyoscar

v0.0.1

71d345d

v0.0.1 — 22/330 Models Jailbroken, 500+ Stars

v0.0.1 — First Stable Release

🎆 500+ GitHub stars in 48 hours

Highlights

22/330 Arena-ranked models confirmed under ISC
5 community contributors: @HanxunH, @bboylyg, @zry29, @fresh-ma
5 language READMEs: EN / ZH / JA / KO / ES
Paper on arXiv: 2603.23509

What's Included

ISC-Bench templates across 8 professional domains
3 experiment modes: ISC-Single, ISC-ICL, ISC-Agentic
JailbreakArena leaderboard tracking 330 models
Project website with live demo and interactive leaderboard
Community reproductions from 5 contributors
Tutorials (cookbook/)

Contributors

HanxunH, bboylyg, and 2 other contributors

