PPTArena: A Benchmark for Agentic PowerPoint Editing

Abstract

We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-truth deck, a fully specified target outcome, and a dual VLM-as-judge pipeline that separately scores instruction following and visual quality using both structural diffs and slide images.

Building on this setting, we propose PPTPilot, a structure-aware slide-editing agent that plans semantic edit sequences, routes between high-level programmatic tools and deterministic XML operations for precise control, and verifies outputs through an iterative plan-edit-check loop against task-specific constraints. In our experiments, PPTPilot outperforms strong proprietary agents and frontier VLM systems by over 10 percentage points on compound, layout-sensitive, and cross-slide edits, with particularly large gains in visual fidelity and deck-wide consistency.

Features

Agentic Editing: Edit PowerPoint presentations using natural language instructions.
Dual-View Evaluation: Compare "Original" vs "Ground Truth" slides side-by-side.
Iterative Refinement: The agent plans, executes, and verifies edits in a loop.
Multi-Modal Judge: Automated evaluation using VLM-as-a-judge for both instruction following and visual quality.
Comprehensive Benchmark: Covers diverse tasks including text editing, chart manipulation, layout adjustments, and image handling.
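
The iterative refinement loop listed above can be pictured with a small sketch. This is not PPTPilot's implementation: the callables `plan`, `apply_step`, and `check`, the `max_rounds` limit, and the feedback format are all hypothetical placeholders chosen to show the control flow only.

```python
from typing import Any, Callable, List, Tuple


def plan_edit_check(
    instruction: str,
    deck: Any,
    plan: Callable[[str, Any, List[str]], List[Any]],   # hypothetical planner
    apply_step: Callable[[Any, Any], Any],              # hypothetical editor
    check: Callable[[str, Any], List[str]],             # hypothetical verifier
    max_rounds: int = 3,                                # assumed retry budget
) -> Tuple[Any, List[str]]:
    """Run plan -> edit -> check until the checker reports no violations.

    Returns the final deck and the remaining violations (empty on success).
    """
    feedback: List[str] = []
    for _ in range(max_rounds):
        # Plan a sequence of edits, taking earlier violations into account.
        steps = plan(instruction, deck, feedback)
        # Apply each planned edit in order.
        for step in steps:
            deck = apply_step(deck, step)
        # Verify the result against the task constraints.
        feedback = check(instruction, deck)
        if not feedback:
            break
    return deck, feedback
```

In this sketch, a non-empty feedback list after the last round signals that the edit could not be verified within the assumed budget.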

WebApp: https://ppt-arena.onrender.com/evaluation

Directory Structure

src/: Core application code (Flask app, LLM handlers, PPT processing) and configuration files (requirements.txt, evaluation_pairs_refined.json).
Original/: Benchmark dataset - Original PowerPoint files.
GroundTruth/: Benchmark dataset - Ground Truth PowerPoint files.
paper/: Contains the paper LaTeX source and figures.
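
To work with the dataset folders outside the web app, a minimal sketch like the following can pair files from Original/ and GroundTruth/. It assumes that an original deck and its ground-truth deck share the same filename, which this README does not confirm; the actual mapping may instead be defined in evaluation_pairs_refined.json. The function name `pair_decks` is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def pair_decks(
    original_dir="Original", ground_truth_dir="GroundTruth"
) -> Tuple[List[Tuple[Path, Path]], List[str]]:
    """Pair .pptx files by identical filename (an assumption, see above).

    Returns (pairs, unmatched) where `unmatched` lists filenames that appear
    in only one of the two folders.
    """
    originals = {p.name: p for p in Path(original_dir).glob("*.pptx")}
    truths = {p.name: p for p in Path(ground_truth_dir).glob("*.pptx")}

    shared = sorted(originals.keys() & truths.keys())
    pairs = [(originals[name], truths[name]) for name in shared]

    # Symmetric difference: files present in exactly one folder.
    unmatched = sorted(originals.keys() ^ truths.keys())
    return pairs, unmatched
```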

Installation

Clone the repository:

git clone https://github.com/michaelofengenden/PPTArena.git
cd PPTArena

Install dependencies:
```
pip install -r src/requirements.txt
```
Configure API Keys: Create a credentials.env file in the root directory:
```
OPENAI_API_KEY=your_openai_key
GEMINI_API_KEY=your_gemini_key
```
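
Before starting the app, it can help to confirm that credentials.env is readable and that both keys are set. The sketch below is a standalone check using only the standard library; it is not how the application itself loads the file (that mechanism is not described here), and treating both keys as required is an assumption. The helper names `load_env_file` and `missing_keys` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Key names taken from the credentials.env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def load_env_file(path="credentials.env") -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(load_env_file())` from the repository root returns an empty list when both keys have values.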

Usage

Start the Application:
```
python src/app.py
```
Access the Web Interface: Open http://localhost:5000 in your browser.
Evaluate:
- Go to the Evaluation tab.
- Select a test case to see the Original and Ground Truth.
- Click "Generate prediction" to run the agent on the task.
- Use "Call LLM Judge" to score the result.
Chat & Edit:
- Go to the Chat tab.
- Upload any .pptx file.
- Type instructions like "Change the title font to Arial" or "Add a bar chart with this data...".
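
To give a sense of what an instruction such as "Change the title font to Arial" can mean at the XML level, here is a rough sketch that operates on a single slide's XML string. It is an illustration only, not the agent's actual tooling: the function name `set_title_font` is hypothetical, it handles only plain text runs (`a:r`) in title placeholders, and it ignores fonts inherited from layouts, masters, or themes.

```python
import xml.etree.ElementTree as ET

# Standard Office Open XML namespaces.
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
P = "http://schemas.openxmlformats.org/presentationml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
for prefix, uri in (("a", A), ("p", P), ("r", R)):
    ET.register_namespace(prefix, uri)  # keep familiar prefixes on output

# Children of a:rPr that the DrawingML schema places after a:latin.
_AFTER_LATIN = {
    f"{{{A}}}{name}"
    for name in ("ea", "cs", "sym", "hlinkClick", "hlinkMouseOver", "rtl", "extLst")
}


def set_title_font(slide_xml: str, typeface: str) -> str:
    """Set the Latin typeface on every text run of title placeholders."""
    root = ET.fromstring(slide_xml)
    for sp in root.iter(f"{{{P}}}sp"):
        ph = sp.find(f"{{{P}}}nvSpPr/{{{P}}}nvPr/{{{P}}}ph")
        if ph is None or ph.get("type") not in ("title", "ctrTitle"):
            continue  # not a title shape
        for run in list(sp.iter(f"{{{A}}}r")):
            rpr = run.find(f"{{{A}}}rPr")
            if rpr is None:
                # Run properties must be the first child of a run.
                rpr = ET.Element(f"{{{A}}}rPr")
                run.insert(0, rpr)
            for old in rpr.findall(f"{{{A}}}latin"):
                rpr.remove(old)
            latin = ET.Element(f"{{{A}}}latin", {"typeface": typeface})
            # Insert before the first element that must follow a:latin.
            idx = next(
                (i for i, child in enumerate(rpr) if child.tag in _AFTER_LATIN),
                len(rpr),
            )
            rpr.insert(idx, latin)
    return ET.tostring(root, encoding="unicode")
```

A .pptx file is a ZIP archive, so a full edit would also need to read the slide part from the archive and write the modified XML back; that step is left out here.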

Citation

If you find this work useful, please cite our paper:

@article{ofengenden2025pptarena,
  title={PPTArena: A Benchmark for Agentic PowerPoint Editing},
  author={Ofengenden, Michael and Man, Yunze and Pang, Ziqi and Wang, Yu-Xiong},
  journal={arXiv preprint arXiv:2512.03042},
  year={2025}
}
