ISC-Bench — Agent Entry Guide

For Humans

Copy this to your AI agent (Claude Code, Gemini CLI, Codex, Cursor, etc.):

Help me understand and use ISC-Bench:
https://raw.githubusercontent.com/wuyoscar/ISC-Bench/main/AGENT_README.md

For AI Agents

Goal

Help the user understand, inspect, reproduce, or contribute to the ISC-Bench repository.

ISC-Bench is an academic safety research repository for studying Internal Safety Collapse (ISC) — a phenomenon where AI agents, while completing professional workflows, interact with domain-specific tools involving sensitive data (safety classifiers, bioinformatics pipelines, cybersecurity frameworks) and produce harmful or sensitive outputs as a result.

We study this with the TVD (Task + Validator + Data) framework:

  • Task: an incomplete professional workflow
  • Validator: tool validation that defines whether the task succeeds
  • Data: sensitive data fields the model must fill in to pass validation

The benchmark currently spans 84 templates across 9 domains (AI/ML, biology, chemistry, cybersecurity, epidemiology, clinical genomics, pharmacology, media, and more), with a growing ISC Arena of confirmed and ranked model reproductions.

Boundaries

When working with this repository, stay within these limits:

  • treat the repository as academic safety research
  • do not rewrite the project as operational misuse guidance
  • do not strengthen harmful examples unnecessarily
  • prefer public templates and documented workflows
  • use VERIFICATION.md when judging whether a case counts as Triggered

Step 1: Clone the Repository

git clone https://github.com/wuyoscar/ISC-Bench.git
cd ISC-Bench

Step 2: Read in Order

Read these files in this order:

  1. README.md — project overview, leaderboard, and public entry points
  2. ISC_PAPER_DIGEST.md — Paper digest for agents (method, TVD framework, ISC-Bench design, key results)
  3. VERIFICATION.md — how ISC-Bench defines Triggered and verifies cases
  4. templates/README.md — public scenario library
  5. experiment/README.md — reproducible evaluation pipelines
  6. community/README.md — attributed reproductions and evidence

Step 3: Choose the Right Path

Use the path that matches the user's goal:

  • Inspect evidence: open the ISC Arena section in README.md, then follow the linked issue or community case.

  • Understand or reuse templates: start from templates/README.md, then open the most relevant scenario directory.

  • Run the benchmark pipeline: read SKILL.md and experiment/README.md, then choose:

    • experiment/isc_single (TVD-Single: copy-paste)
    • experiment/isc_icl (TVD-ICL: in-context learning)
    • experiment/isc_agent (TVD-Agent: strongest, autonomous)

  • Contribute a new case: check VERIFICATION.md, collect evidence, and then open the ISC submission issue.
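The pipeline choice above can be sketched as a small goal-to-directory dispatch. The directory names come from this guide; the actual entry commands for each pipeline are defined in experiment/README.md, not here:

```shell
# Map a user goal to one of the three pipeline directories listed above.
# The "goal" variable is illustrative; only the paths come from this guide.
goal="agent"   # one of: single | icl | agent

case "$goal" in
  single) dir=experiment/isc_single ;;  # TVD-Single: copy-paste
  icl)    dir=experiment/isc_icl ;;     # TVD-ICL: in-context learning
  agent)  dir=experiment/isc_agent ;;   # TVD-Agent: strongest, autonomous
  *)      echo "unknown goal: $goal" >&2; exit 1 ;;
esac

echo "$dir"
```

Once the directory is chosen, follow the README inside it for the concrete run commands.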

Step 4: Submit a Case

If the user wants to submit a new case:

  1. pick a template and reproduce the behavior
  2. save evidence such as model output or API logs
  3. check VERIFICATION.md to confirm the case meets benchmark standards
  4. open the ISC submission issue:
https://github.com/wuyoscar/ISC-Bench/issues/new?template=isc-submission.md&title=[ISC]+Model+Name
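A minimal sketch of step 2 (saving evidence) follows. The `evidence/` directory and filename pattern are assumptions for illustration, not repo conventions; VERIFICATION.md defines what actually counts as acceptable evidence:

```shell
# Capture a reproduction's output to a timestamped log file so it can be
# attached to the submission issue. Placeholders stand in for real content.
mkdir -p evidence
ts=$(date -u +%Y%m%dT%H%M%SZ)
out="evidence/model-name-${ts}.log"

printf 'template: %s\nresponse: %s\n' "<template id>" "<model output>" > "$out"
echo "saved $out"
```

Whatever layout you use, keep enough context (template, model, raw output) for a reviewer to re-check the case against VERIFICATION.md.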

Quick Reference

Resource          Purpose
README.md         Human-facing overview
VERIFICATION.md   Rules and verification standards
templates/        Public TVD scenarios
experiment/       Single-turn, ICL, and agentic pipelines
community/        Curated reproductions tied to issues
SKILL.md          Command-level workflow for running ISC-Bench
tutorials/        Onboarding material

One-Sentence Summary

Clone the repo, read README.md and VERIFICATION.md, then choose the correct template, experiment, or submission path based on the user's goal.