A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning
Max Zimmer · Nico Pelleriti · Christophe Roux · Sebastian Pokutta
IOL Lab · Zuse Institute Berlin & TU Berlin
The Agentic Researcher launches AI coding agents inside sandboxed containers with filesystem isolation, GPU support, and structured research instructions.
Supports Claude Code, OpenCode, Gemini CLI, and Codex CLI.
- Docker (default) or Apptainer (Linux only)
- An API key or OAuth login for your chosen CLI tool (see supported tools)
- GPU drivers installed on the host if you want GPU passthrough
- Project dependencies managed with uv (recommended); the agent runs `uv sync` inside the sandbox
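
Before installing, it can help to confirm that the prerequisites above are on your `PATH`. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the list of commands checked is an assumption based on the requirements listed here.

```bash
# Hypothetical helper (not shipped with the framework): print the names of
# the given commands that cannot be found on PATH, separated by spaces.
missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing:+$missing }$tool"
        fi
    done
    printf '%s\n' "$missing"
}

# Docker is the default runtime; check apptainer instead on Linux/HPC hosts.
result=$(missing_tools docker uv)
if [ -n "$result" ]; then
    echo "Missing prerequisites: $result"
else
    echo "All prerequisites found"
fi
```

The check only covers executables; it does not verify GPU drivers or that the Docker daemon is actually running.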

```bash
# 1. Clone the repository
git clone https://github.com/ZIB-IOL/The-Agentic-Researcher.git
cd The-Agentic-Researcher
# 2. Install
./scripts/install.sh
# 3a. Build container for Docker (default)
agentic-researcher --build
# 3b. Build container for Apptainer (Linux only)
agentic-researcher --apptainer --build
```

Docker is the default runtime. By default the launcher stores state under `~/.cache/agentic-researcher` and launches Claude Code. Claude Code uses OAuth by default; other CLIs handle auth inside the tool, with standard API key env vars passed through if set.
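
To illustrate what "passed through if set" can mean in practice, here is a minimal sketch that builds `docker run` flags only for variables that are actually set on the host. It is not the launcher's code: `passthrough_flags` is a hypothetical helper, and which variable names the launcher forwards is an assumption. The sketch relies on the documented Docker behavior that `-e NAME` without a value copies the host's value into the container.

```bash
# Hypothetical sketch (not the launcher's implementation): emit "-e NAME"
# for every listed variable that is set and non-empty in the current shell.
passthrough_flags() {
    flags=""
    for name in "$@"; do
        # Indirect lookup of the variable called $name, POSIX-compatible
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            flags="${flags:+$flags }-e $name"
        fi
    done
    printf '%s\n' "$flags"
}

# These are the providers' conventional variable names; whether the launcher
# forwards exactly this set is an assumption.
passthrough_flags ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY
```

Passing only the variable name keeps the secret value out of the command line and therefore out of process listings.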
Run agentic-researcher --setup to create a configuration file at ~/.config/agentic-researcher/config.sh. The setup wizard lets you configure:
- Container runtime: Docker or Apptainer
- CLI tool: Claude Code, OpenCode, Gemini CLI, or Codex CLI
- Authentication: OAuth login or API key (with configurable env var name)
- Custom API endpoint: point Claude at an Anthropic-compatible proxy or gateway
- State/cache directory (`AR_STATE_ROOT`): where caches, container/tmp, and tool state are stored. Defaults to `~/.cache/agentic-researcher`. On HPC systems with Apptainer, set this to a path with sufficient space (e.g. on a scratch filesystem) to avoid hitting the default 64 MB overlay limit
- Extra environment variables (`AR_EXTRA_ENV`): colon-separated `KEY=VALUE` pairs forwarded into the container (e.g. `HF_TOKEN=hf_...:WANDB_API_KEY=...`)
- Network proxy: HTTP/HTTPS proxy settings for use inside the container
- Extra bind directories: additional host paths to mount into the sandbox
You can re-run --setup at any time to update your configuration.
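
The colon-separated format of `AR_EXTRA_ENV` can be illustrated with a small sketch that splits such a string into individual `KEY=VALUE` pairs. This is an assumption about how the format might be parsed, not the launcher's actual code; `split_extra_env` is a hypothetical helper and the values are placeholders.

```bash
# Hypothetical sketch: print each KEY=VALUE pair of a colon-separated list
# on its own line, skipping empty entries and entries without a key.
split_extra_env() {
    old_ifs=$IFS
    IFS=:
    set -f                      # disable globbing while the string is split
    for pair in $1; do
        case $pair in
            ?*=*) printf '%s\n' "$pair" ;;   # non-empty key followed by '='
        esac
    done
    set +f
    IFS=$old_ifs
}

# Placeholder values, for illustration only
AR_EXTRA_ENV="HF_TOKEN=hf_example:WANDB_API_KEY=example"
split_extra_env "$AR_EXTRA_ENV"
```

One consequence of a colon separator is that a naive split like this cannot represent values that themselves contain a colon, such as URLs. How the launcher treats that case is not specified here.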

```bash
# Sandbox current directory with Claude Code (default)
agentic-researcher
# Sandbox a specific project directory
agentic-researcher ~/my-project
# Use a different CLI tool
agentic-researcher --tool gemini
# Auto-approve all tool calls
agentic-researcher --yolo
```

| Tool | Instruction file | Provider | Flag |
|---|---|---|---|
| Claude Code | `CLAUDE.md` | Anthropic | `--tool claude` (default) |
| OpenCode | `AGENTS.md` | Any (LiteLLM) | `--tool opencode` |
| Gemini CLI | `GEMINI.md` | Google | `--tool gemini` |
| Codex CLI | `AGENTS.md` | OpenAI | `--tool codex` |

- Launch the sandbox from your project directory, e.g., `agentic-researcher --yolo`
- Run `/setup_research_plan` inside the CLI agent. This starts an interactive dialogue that asks about your research goal, evaluation metrics, constraints, and compute budget.
- The agent fills in the Project Instructions section of the instruction file (`CLAUDE.md`, `GEMINI.md`, or `AGENTS.md`) and creates the initial tracking files (`report.tex`, `TODO.md`).
When you relaunch the sandbox on a project that already has filled-in instructions, running /setup_research_plan will automatically detect the existing state, read report.tex and TODO.md, and summarize where the project left off before continuing.
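
If you want to check from the host whether a project would be treated as resumable, a rough approximation is to look for the two tracking files. The sketch below is an assumption: the agent's actual detection also considers whether the instruction file has been filled in, and `project_state` is a hypothetical helper, not a command the framework provides.

```bash
# Hypothetical sketch: report "resume" when both tracking files exist in the
# given directory (default: current directory), otherwise "fresh".
project_state() {
    dir=${1:-.}
    if [ -f "$dir/report.tex" ] && [ -f "$dir/TODO.md" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

project_state .
```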
| Layer | Details |
|---|---|
| Filesystem isolation | The agent can only access /workspace; extra directories from AR_EXTRA_BIND_DIRS are mounted under /workspace/.mount/<basename> |
| Namespace isolation | Apptainer --compat enables user/mount namespaces |
| Path traversal protection | Symlinks resolved; system directories blocked |
--yolo auto-approves tool calls but does not weaken filesystem isolation.
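
The mount layout for extra directories can be made concrete with a short sketch that maps a host path to its location inside the sandbox, following the `/workspace/.mount/<basename>` convention from the table. `sandbox_mount_path` is a hypothetical helper for illustration and is not part of the launcher.

```bash
# Hypothetical sketch: compute where a host directory would appear inside
# the sandbox under the /workspace/.mount/<basename> layout.
sandbox_mount_path() {
    # basename ignores trailing slashes, so /data/x and /data/x/ agree
    printf '/workspace/.mount/%s\n' "$(basename "$1")"
}

sandbox_mount_path /data/datasets/imagenet   # prints /workspace/.mount/imagenet
```

Because only the last path component is used, two host directories with the same basename would map to the same target. How the launcher resolves such a collision is not documented here, so distinct directory names are the safer choice.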
The framework ships INSTRUCTIONS.md as a canonical template containing universal research commandments (e.g., never manipulate evaluation, one variable per experiment, record everything) and domain-specific modules for mathematical and compute-intensive research. At launch it is copied into the workspace under the filename required by the selected tool. The /setup_research_plan command then fills in the project-specific section through an interactive dialogue.
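
The copy step described above can be sketched as a lookup from tool name to instruction filename, followed by a copy into the workspace. This is a minimal illustration rather than the launcher's implementation: both function names are hypothetical, and leaving an existing instruction file untouched is an assumption made so that filled-in project instructions survive a relaunch.

```bash
# Hypothetical sketch: map a --tool value to the instruction filename that
# the corresponding CLI expects.
instruction_file_for() {
    case $1 in
        claude)          echo "CLAUDE.md" ;;
        gemini)          echo "GEMINI.md" ;;
        opencode|codex)  echo "AGENTS.md" ;;
        *)               echo "unknown tool: $1" >&2; return 1 ;;
    esac
}

# Copy the template into the workspace under the tool-specific name.
# Assumption: an existing file is kept rather than overwritten.
install_instructions() {
    tool=$1
    template=$2
    workspace=$3
    target=$(instruction_file_for "$tool") || return 1
    [ -e "$workspace/$target" ] || cp "$template" "$workspace/$target"
}

instruction_file_for gemini   # prints GEMINI.md
```

A call such as `install_instructions codex INSTRUCTIONS.md /path/to/project` would then create `AGENTS.md` in that project if it does not exist yet.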
If you use this framework, please cite our paper:

```bibtex
@misc{zimmer2026agenticresearcherpracticalguide,
  title         = {The Agentic Researcher: A Practical Guide to AI-Assisted Research
                   in Mathematics and Machine Learning},
  author        = {Max Zimmer and Nico Pelleriti and Christophe Roux and Sebastian Pokutta},
  year          = {2026},
  eprint        = {2603.15914},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2603.15914}
}
```

This project is licensed under the MIT License.
The sandboxing provided by this framework is designed to limit the agent's filesystem access, but it comes with no guarantee of security. The authors assume no responsibility for any damage, data loss, or unintended behavior resulting from the use of this software. Use at your own risk.
