PIArena is an easy-to-use toolbox and also a comprehensive benchmark for researching prompt injection attacks and defenses. It provides:
- Plug-and-play Attacks & Defenses – Easily integrate state-of-the-art defenses into your workflow to protect your LLM system against prompt injection attacks. You can also experiment with existing attack strategies to support your research.
- Systematic Evaluation Benchmark – An end-to-end evaluation pipeline lets you easily evaluate attacks and defenses on various datasets.
- Add Your Own – You can also easily integrate your own attack or defense into our benchmark to systematically assess how well it performs.
Clone the project and set up the Python environment:
git clone git@github.com:sleeepeer/PIArena.git
cd PIArena
conda create -n piarena python=3.10 -y
conda activate piarena
pip install -r requirements.txt
Log in to HuggingFace 🤗 with your HuggingFace Access Token, which you can find at this link:
huggingface-cli login
You can simply import attacks and defenses and integrate them into your own code. Please see details in Document.
from piarena.attacks import get_attack
from piarena.defenses import get_defense
from piarena.llm import Model
llm = Model("Qwen/Qwen3-4B-Instruct-2507")
defense = get_defense("pisanitizer")
attack = get_attack("combined")
Use main.py to run the benchmark:
# Using CLI arguments
python main.py --dataset squad_v2 --attack direct --defense none
# Using a YAML config file
python main.py --config configs/experiments/my_experiment.yaml
# Run many experiments in parallel across GPUs (edit scripts/run.py to configure)
python scripts/run.py
Available Datasets: Please see HuggingFace/PIArena.
Available Attacks:
- none - No attack (baseline)
- direct - Directly attack using injected prompt (default)
- combined - Formalizing and Benchmarking Prompt Injection Attacks and Defenses
- ignore - Ignore Previous Prompt: Attack Techniques For Language Models
- completion - Prompt injection attacks against GPT-3
- character - Delimiters won’t save you from prompt injection
- nanogcg - GCG and nanoGCG
- tap - TAP: A Query-Efficient Method for Jailbreaking Black-Box LLMs
- pair - PAIR: Jailbreaking black box large language models in twenty queries
- strategy_search - Strategy search attack based on defense feedback, introduced in PIArena
Available Defenses:
- none - No defense (baseline, default)
- datasentinel - DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
- attentiontracker - Attention Tracker: Detecting Prompt Injection Attacks in LLMs
- piguard - PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free
- promptguard - Meta Prompt Guard
- secalign - SecAlign: Defending Against Prompt Injection with Preference Optimization (uses Meta-SecAlign model)
- promptlocate - PromptLocate: Localizing Prompt Injection Attacks
- promptarmor - PromptArmor: Simple yet Effective Prompt Injection Defenses
- pisanitizer - PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
- datafilter - Defending Against Prompt Injection with DataFilter
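The attack and defense names listed above can be combined into a grid of main.py invocations. The following is a minimal sketch, not part of PIArena: `build_commands` is a hypothetical helper that only assembles argument lists from the `--dataset`, `--attack`, and `--defense` flags shown earlier, and the chosen subsets are arbitrary. The search-based attacks (tap, pair, strategy_search) are left out because they are run through main_search.py.

```python
from itertools import product

# Names copied from the lists above; pick whichever subset you need.
DATASETS = ["squad_v2"]
ATTACKS = ["none", "direct", "combined"]
DEFENSES = ["none", "pisanitizer", "datafilter"]


def build_commands(datasets, attacks, defenses):
    """Return one `python main.py ...` argument list per combination.

    Hypothetical helper for illustration; it does not launch anything.
    """
    commands = []
    for dataset, attack, defense in product(datasets, attacks, defenses):
        commands.append([
            "python", "main.py",
            "--dataset", dataset,
            "--attack", attack,
            "--defense", defense,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a shell.
    for cmd in build_commands(DATASETS, ATTACKS, DEFENSES):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` if you prefer to launch the runs from Python. For parallel execution across GPUs, scripts/run.py remains the mechanism the project provides.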
PIArena supports search-based attacks (PAIR, TAP, Strategy Search) that iteratively refine injected prompts using an attack LLM. Use main_search.py for these attacks:
# --attack can be tap, pair, strategy_search
python main_search.py --dataset squad_v2 --attack strategy_search --defense pisanitizer \
--backend_llm Qwen/Qwen3-4B-Instruct-2507 --attacker_llm Qwen/Qwen3-4B-Instruct-2507
# Run many search experiments in parallel (edit scripts/run_search.py to configure)
python scripts/run_search.py
See Search-based Attacks for details.
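The idea of iteratively refining a candidate from feedback can be illustrated with a generic greedy propose-evaluate-keep loop. This is only a toy sketch under stated assumptions and is not the PAIR, TAP, or Strategy Search implementation in PIArena: `propose` stands in for the attack LLM, `evaluate` stands in for the score obtained from the defended backend LLM, and both are hypothetical stubs operating on placeholder strings.

```python
from typing import Callable, List, Tuple


def greedy_search(
    propose: Callable[[str, float], str],
    evaluate: Callable[[str], float],
    seed: str,
    max_iters: int = 20,
    target: float = 1.0,
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Keep the best-scoring candidate seen so far and refine it.

    Stops early once the best score reaches `target`.
    """
    best, best_score = seed, evaluate(seed)
    history = [(best, best_score)]
    for _ in range(max_iters):
        if best_score >= target:
            break
        # Ask the (stub) proposer for a refinement of the current best.
        candidate = propose(best, best_score)
        score = evaluate(candidate)
        history.append((candidate, score))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, history


# Toy stand-ins (assumptions): the score grows with string length and the
# proposer appends one character. A real setup would query LLMs instead.
def toy_evaluate(candidate: str) -> float:
    return min(len(candidate) / 5, 1.0)


def toy_propose(current: str, score: float) -> str:
    return current + "x"


if __name__ == "__main__":
    best, score, history = greedy_search(toy_propose, toy_evaluate, seed="x")
    print(best, score, len(history))
```

A greedy loop keeps only one candidate at a time. Tree-based or strategy-based searches track more state, so treat this as a starting point for understanding the loop structure rather than as a description of any specific attack.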
Building upon PIArena (including defenses and benchmarks), this repository provides the code for PISmith, a reinforcement learning-based framework for red teaming prompt injection defenses.
PIArena also supports agentic benchmarks: InjecAgent and AgentDojo.
# AgentDojo
cd agents/agentdojo && pip install -e . && cd ../..
# InjecAgent
python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none
# With OpenAI API
export OPENAI_API_KEY="Your API Key Here"
python main_agentdojo.py --model gpt-5-mini --attack none
# With HuggingFace model (vLLM server started automatically)
python main_agentdojo.py --model meta-llama/Llama-3.1-8B-Instruct --attack tool_knowledge --defense datafilter
Please see Document for full details.