GitHub - sleeepeer/PIArena: PIArena: A Platform for Prompt Injection Evaluation

A Platform for Prompt Injection Evaluation

PIArena is an easy-to-use toolbox and also a comprehensive benchmark for researching prompt injection attacks and defenses. It provides:

Plug-and-play Attacks & Defenses – Easily integrate state-of-the-art defenses into your workflow to protect your LLM system against prompt injection attacks. You can also play with existing attack strategies to perform a better research.
Systematic Evaluation Benchmark – End-to-end evaluation pipeline enables you to easily evaluate attacks / defenses on various datasets.
Add Your Own – You can also easily integrate your own attack or defense into our benchmark to systematically assess how well it perform.

📝 Quick Start

⚙️ Installation

Clone the project and setup python environment:

git clone git@github.com:sleeepeer/PIArena.git
cd PIArena
conda create -n piarena python=3.10 -y
conda activate piarena
pip install -r requirements.txt

Login to HuggingFace 🤗 with your HuggingFace Access Token, you can find it at this link:

huggingface-cli login

📌 Ready-to-use Tools

You can simply import attacks and defenses and integrate them into your own code. Please see details in Document.

from piarena.attacks import get_attack
from piarena.defenses import get_defense
from piarena.llm import Model

llm = Model("Qwen/Qwen3-4B-Instruct-2507")
defense = get_defense("pisanitizer")
attack = get_attack("combined")

📈 Run Evaluation

Use main.py to run the benchmark:

# Using CLI arguments
python main.py --dataset squad_v2 --attack direct --defense none

# Using a YAML config file
python main.py --config configs/experiments/my_experiment.yaml

# Run many experiments in parallel across GPUs (edit scripts/run.py to configure)
python scripts/run.py

Available Datasets: Please see HuggingFace/PIArena.

Available Attacks:

none - No attack (baseline)
direct - Directly attack using injected prompt (default)
combined - Formalizing and Benchmarking Prompt Injection Attacks and Defenses
ignore - Ignore Previous Prompt: Attack Techniques For Language Models
completion - Prompt injection attacks against GPT-3
character - Delimiters won’t save you from prompt injection
nanogcg - GCG and nanoGCG
tap - TAP: A Query-Efficient Method for Jailbreaking Black-Box LLMs
pair - PAIR: Jailbreaking black box large language models in twenty queries
strategy_search - Strategy search attack based on defense feedback introduced in PIArena.

Available Defenses:

none - No defense (baseline, default)
datasentinel - DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
attentiontracker - Attention Tracker: Detecting Prompt Injection Attacks in LLMs
piguard - PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free
promptguard - Meta Prompt Guard
secalign - SecAlign: Defending Against Prompt Injection with Preference Optimization (uses Meta-SecAlign model)
promptlocate - PromptLocate: Localizing Prompt Injection Attacks
promptarmor - PromptArmor: Simple yet Effective Prompt Injection Defenses
pisanitizer - PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
datafilter - Defending Against Prompt Injection with DataFilter

🔍 Search-based Attacks

PIArena supports search-based attacks (PAIR, TAP, Strategy Search) that iteratively refine injected prompts using an attack LLM. Use main_search.py for these attacks:

# --attack can be tap, pair, strategy_search
python main_search.py --dataset squad_v2 --attack strategy_search --defense pisanitizer \
  --backend_llm Qwen/Qwen3-4B-Instruct-2507 --attacker_llm Qwen/Qwen3-4B-Instruct-2507

# Run many search experiments in parallel (edit scripts/run_search.py to configure)
python scripts/run_search.py

See Search-based Attacks for details.

🔍 Reinforcement Learning-based Attacks

Building upon PIArena (including defenses and benchmarks), this repository provides the code for PISmith, a reinforcement learning-based framework for red teaming prompt injection defenses.

🤖 Agent Benchmarks

PIArena also supports agentic benchmarks: InjecAgent and AgentDojo.

Setup Agent Benchmarks

# AgentDojo
cd agents/agentdojo && pip install -e . && cd ../..

InjecAgent Evaluation

python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none

AgentDojo Evaluation

# With OpenAI API
export OPENAI_API_KEY="Your API Key Here"
python main_agentdojo.py --model gpt-5-mini --attack none

# With HuggingFace model (vLLM server started automatically)
python main_agentdojo.py --model meta-llama/Llama-3.1-8B-Instruct --attack tool_knowledge --defense datafilter

🙋🏻‍♀️ Add your own attacks / defenses

Please see Document for full details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A Platform for Prompt Injection Evaluation

Table of Contents

📝 Quick Start

⚙️ Installation

📌 Ready-to-use Tools

📈 Run Evaluation

🔍 Search-based Attacks

🔍 Reinforcement Learning-based Attacks

🤖 Agent Benchmarks

Setup Agent Benchmarks

InjecAgent Evaluation

AgentDojo Evaluation

🙋🏻‍♀️ Add your own attacks / defenses

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
agents		agents
assets		assets
configs		configs
datasets		datasets
piarena		piarena
scripts		scripts
website		website
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
main.py		main.py
main_agentdojo.py		main_agentdojo.py
main_injecagent.py		main_injecagent.py
main_search.py		main_search.py
print_results.ipynb		print_results.ipynb
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

A Platform for Prompt Injection Evaluation

Table of Contents

📝 Quick Start

⚙️ Installation

📌 Ready-to-use Tools

📈 Run Evaluation

🔍 Search-based Attacks

🔍 Reinforcement Learning-based Attacks

🤖 Agent Benchmarks

Setup Agent Benchmarks

InjecAgent Evaluation

AgentDojo Evaluation

🙋🏻‍♀️ Add your own attacks / defenses

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages