Skip to content

sleeepeer/PIArena

Repository files navigation

PIArena

A Platform for Prompt Injection Evaluation

ProjectPage HuggingFace LeaderBoard Paper Star

PIArena is an easy-to-use toolbox and also a comprehensive benchmark for researching prompt injection attacks and defenses. It provides:

  • Plug-and-play Attacks & Defenses – Easily integrate state-of-the-art defenses into your workflow to protect your LLM system against prompt injection attacks. You can also play with existing attack strategies to perform a better research.
  • Systematic Evaluation Benchmark – End-to-end evaluation pipeline enables you to easily evaluate attacks / defenses on various datasets.
  • Add Your Own – You can also easily integrate your own attack or defense into our benchmark to systematically assess how well it perform.

Table of Contents

📝 Quick Start

⚙️ Installation

Clone the project and setup python environment:

git clone git@github.com:sleeepeer/PIArena.git
cd PIArena
conda create -n piarena python=3.10 -y
conda activate piarena
pip install -r requirements.txt

Login to HuggingFace 🤗 with your HuggingFace Access Token, you can find it at this link:

huggingface-cli login

📌 Ready-to-use Tools

You can simply import attacks and defenses and integrate them into your own code. Please see details in Document.

from piarena.attacks import get_attack
from piarena.defenses import get_defense
from piarena.llm import Model

llm = Model("Qwen/Qwen3-4B-Instruct-2507")
defense = get_defense("pisanitizer")
attack = get_attack("combined")

📈 Run Evaluation

Use main.py to run the benchmark:

# Using CLI arguments
python main.py --dataset squad_v2 --attack direct --defense none

# Using a YAML config file
python main.py --config configs/experiments/my_experiment.yaml

# Run many experiments in parallel across GPUs (edit scripts/run.py to configure)
python scripts/run.py

Available Datasets: Please see HuggingFace/PIArena.

Available Attacks:

Available Defenses:

🔍 Search-based Attacks

PIArena supports search-based attacks (PAIR, TAP, Strategy Search) that iteratively refine injected prompts using an attack LLM. Use main_search.py for these attacks:

# --attack can be tap, pair, strategy_search
python main_search.py --dataset squad_v2 --attack strategy_search --defense pisanitizer \
  --backend_llm Qwen/Qwen3-4B-Instruct-2507 --attacker_llm Qwen/Qwen3-4B-Instruct-2507

# Run many search experiments in parallel (edit scripts/run_search.py to configure)
python scripts/run_search.py

See Search-based Attacks for details.

🔍 Reinforcement Learning-based Attacks

Building upon PIArena (including defenses and benchmarks), this repository provides the code for PISmith, a reinforcement learning-based framework for red teaming prompt injection defenses.

🤖 Agent Benchmarks

PIArena also supports agentic benchmarks: InjecAgent and AgentDojo.

Setup Agent Benchmarks

# AgentDojo
cd agents/agentdojo && pip install -e . && cd ../..

InjecAgent Evaluation

python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none

AgentDojo Evaluation

# With OpenAI API
export OPENAI_API_KEY="Your API Key Here"
python main_agentdojo.py --model gpt-5-mini --attack none

# With HuggingFace model (vLLM server started automatically)
python main_agentdojo.py --model meta-llama/Llama-3.1-8B-Instruct --attack tool_knowledge --defense datafilter

🙋🏻‍♀️ Add your own attacks / defenses

Please see Document for full details.

About

PIArena: A Platform for Prompt Injection Evaluation

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors