pdb2reaction: End-to-end Reaction-Path Modeling from PDB Structures Using Machine-Learning Interatomic Potentials
pdb2reaction is a Python CLI toolkit for turning PDB structures into enzymatic reaction pathways with machine-learning interatomic potentials (MLIPs). Each workflow step is also available as an individual subcommand (opt, scan, scan2d, path-search, tsopt, freq, irc, dft, energy-diagram, etc.) for fine-grained control.
A single command can generate a first-pass enzymatic reaction path:
# Multi-PDB mode (R + P → MEP)
pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3'

# Scan mode (single structure → staged bond scans → MEP)
pdb2reaction -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
--scan-lists '[("CS1 SAM 320","GPP 321 C7",1.60)]' \
'[("GPP 321 H11","GLU 186 OE2",0.90)]'

The full workflow (MEP search → TS optimization → IRC → thermochemistry → single-point DFT) can be run in one command:
pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
--tsopt --thermo --dft

Working examples are provided in the examples/ directory: a run.sh with complete all workflow commands for both the multi-structure MEP and the scan-based pipeline.
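The -l and --scan-lists values in the commands above are compact strings that are easy to mistype. The following is a minimal sketch for generating and checking them from a script. It assumes that -l maps residue names to integer charges and that each --scan-lists argument is a Python-literal list of (atom, atom, number) tuples, where the number appears to be a target bond length; the parsing inside pdb2reaction may differ. All helper names are hypothetical and not part of the pdb2reaction API.

```python
import ast


def parse_ligand_charges(spec: str) -> dict[str, int]:
    """Parse a string such as 'SAM:1,GPP:-3' into {'SAM': 1, 'GPP': -3}."""
    charges: dict[str, int] = {}
    for item in spec.split(","):
        name, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {item!r}")
        charges[name.strip()] = int(value)
    return charges


def format_scan_list(entries: list[tuple[str, str, float]]) -> str:
    """Render one scan stage as a literal like [("A","B",1.60)]."""
    body = ",".join(f'("{a}","{b}",{d:.2f})' for a, b, d in entries)
    return f"[{body}]"


def parse_scan_list(text: str) -> list[tuple[str, str, float]]:
    """Safely evaluate a scan-list literal and check its shape."""
    parsed = ast.literal_eval(text)  # evaluates literals only, never code
    result = []
    for entry in parsed:
        atom_a, atom_b, target = entry  # raises if not a 3-tuple
        result.append((str(atom_a), str(atom_b), float(target)))
    return result


if __name__ == "__main__":
    stage = format_scan_list([("CS1 SAM 320", "GPP 321 C7", 1.60)])
    print(stage)
    print(parse_scan_list(stage))
    print(parse_ligand_charges("SAM:1,GPP:-3"))
```

Round-tripping a stage through format_scan_list and parse_scan_list before launching a long run catches quoting mistakes early.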
Given (i) two or more PDB files (R → ... → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt, pdb2reaction automatically:
- extracts an active-site model around user-defined substrates to build a cluster model,
- explores minimum-energy paths (MEPs) with GSM or DMF,
- optionally optimizes transition states, runs vibrational analysis, IRC, and single-point DFT,
using machine-learning interatomic potentials (MLIPs).
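The three input modes listed above can be summarized as a small decision rule. This is only an illustrative sketch: the function name and the order in which the conditions are checked are assumptions, not the dispatch logic pdb2reaction actually uses.

```python
def infer_mode(n_inputs: int, has_scan_lists: bool, tsopt: bool) -> str:
    """Map the inputs described in the text to a workflow mode (hypothetical helper)."""
    if n_inputs >= 2:
        # (i) two or more structures: R -> ... -> P
        return "multi-structure MEP"
    if n_inputs == 1 and has_scan_lists:
        # (ii) one structure plus --scan-lists
        return "scan"
    if n_inputs == 1 and tsopt:
        # (iii) one TS candidate plus --tsopt
        return "TS-only"
    raise ValueError("need >=2 inputs, or 1 input with --scan-lists or --tsopt")


if __name__ == "__main__":
    print(infer_mode(2, False, True))
    print(infer_mode(1, True, True))
    print(infer_mode(1, False, True))
```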
| Tool | Use case | Repository |
|---|---|---|
| mlmm-toolkit | ML/MM (ONIOM) with full protein environment — automates MM parameter generation and ML region assignment from a single PDB input | https://github.com/t-0hmura/mlmm_toolkit |
| UMA–Pysisyphus Interface | YAML-input-based reaction mechanism analysis for small molecules | https://github.com/t-0hmura/uma_pysis |
Both pdb2reaction and mlmm-toolkit include a custom GPU-optimized pysisyphus fork for geometry optimization, TS search, and IRC. This bundled fork is not compatible with the upstream pysisyphus package; do not install them side by side.
Important (prerequisites):
- Input PDB files must already contain hydrogen atoms.
- When providing multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ).
- Boolean CLI options accept both --flag/--no-flag and value style --flag True/False (yes/no and 1/0 are also accepted). Prefer toggle style in new scripts.
- The workflow also works for small-molecule systems. If you omit --center/-c and --ligand-charge, you can use .xyz or .gjf inputs as well.
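The first two prerequisites can be checked before starting a run. The sketch below relies only on the fixed-column layout of PDB ATOM/HETATM records (identity fields in columns 13 to 27, element symbol in columns 77 to 78). The function names are hypothetical, and the check is a pre-flight aid under those assumptions, not a replacement for pdb2reaction's own validation.

```python
def atom_keys(pdb_text: str) -> list[str]:
    """Identity of each atom: name, altLoc, residue name, chain, residue number, iCode."""
    return [
        line[12:27]
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


def same_atoms_same_order(pdb_a: str, pdb_b: str) -> bool:
    """True if both files list the same atoms in the same order (coordinates ignored)."""
    return atom_keys(pdb_a) == atom_keys(pdb_b)


def has_hydrogens(pdb_text: str) -> bool:
    """True if any atom carries H (or D) in the element columns 77-78."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            if line[76:78].strip().upper() in ("H", "D"):
                return True
    return False


if __name__ == "__main__":
    import sys

    # Usage: python check_pdbs.py 1.R.pdb 3.P.pdb
    texts = [open(path).read() for path in sys.argv[1:]]
    for path, text in zip(sys.argv[1:], texts):
        print(path, "hydrogens present:", has_hydrogens(text))
    for path, text in zip(sys.argv[2:], texts[1:]):
        print(path, "matches first file:", same_atoms_same_order(texts[0], text))
```

If the element columns are empty, has_hydrogens reports False even when hydrogens exist; in that case repairing the columns first (see add-elem-info) is the safer route.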
- Getting Started — Quick start and workflow overview
- Installation — Setup and dependency installation
- Examples — Working all workflow commands (MEP and scan pipelines) for BezA, in examples/run.sh
- YAML Reference — Configuration options
- JSON Output Reference — Machine-readable result.json schema
- Troubleshooting — Common errors, backend selection guide, VRAM requirements
- Full documentation: t-0hmura.github.io/pdb2reaction/
Linux with a CUDA-capable NVIDIA GPU is the validated production environment for the MLIP reaction-path workflows. The core Python package and CPU-only smoke tests also run on macOS and on Windows under WSL2.
- Python >= 3.11
- CUDA 12.x
pip install torch --index-url https://download.pytorch.org/whl/cu129
pip install pdb2reaction
plotly_get_chrome -y
huggingface-cli login

Install cyipopt (recommended via conda):
conda install -c conda-forge cyipopt -y

For the full step-by-step guide (HPC module load, alternative backends, DFT extras, troubleshooting), see docs/installation.md.
DFT dependencies are not installed by default. To use pdb2reaction dft, install the [dft] extra:
pip install "pdb2reaction[dft]"

This installs PySCF, GPU4PySCF (x86_64 only), and related CUDA libraries. Note that DFT single-point calculations are practical only for systems up to ~300 atoms; larger systems become prohibitively expensive.
For detailed installation instructions, see Installation.
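Because the DFT step is only practical up to roughly 300 atoms, it can help to count atoms in the structure you intend to pass to pdb2reaction dft. A minimal sketch follows; the helper names are hypothetical, and the threshold is simply the approximate figure quoted above, not a hard limit enforced by the tool.

```python
DFT_PRACTICAL_ATOM_LIMIT = 300  # approximate figure from the text, not a hard cutoff


def count_pdb_atoms(pdb_text: str) -> int:
    """Count ATOM and HETATM records in PDB-formatted text."""
    return sum(
        1
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    )


def dft_looks_practical(pdb_text: str, limit: int = DFT_PRACTICAL_ATOM_LIMIT) -> bool:
    """True if the atom count is at or below the assumed practical limit."""
    return count_pdb_atoms(pdb_text) <= limit


if __name__ == "__main__":
    import sys

    text = open(sys.argv[1]).read()
    n_atoms = count_pdb_atoms(text)
    verdict = "ok" if dft_looks_practical(text) else "likely too large for single-point DFT"
    print(f"{n_atoms} atoms: {verdict}")
```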
| Potential | Repository | Install extra |
|---|---|---|
| UMA (default) | https://github.com/facebookresearch/fairchem | (included) |
| ORB | https://github.com/orbital-materials/orb-models | pip install "pdb2reaction[orb]" |
| MACE | https://github.com/ACEsuit/mace | See below |
| AIMNet2 | https://github.com/isayevlab/aimnetcentral | pip install "pdb2reaction[aimnet]" |
MACE installation: Because mace-torch and fairchem-core (UMA) can pin incompatible versions of e3nn, we recommend installing MACE in a dedicated environment. To use MACE, uninstall fairchem-core first, then install MACE:
pip uninstall fairchem-core
pip install mace-torch
The examples below use GPP C6-methyltransferase BezA (Tsutsumi et al., Angew. Chem. Int. Ed. 2022, 61, e202111217) — a two-step mechanism: electrophilic methyl transfer from SAM to GPP C6 (via C7 carbocation), then proton abstraction by glutamate (GLU 186). The complete commands are in examples/run.sh.
# Multi-PDB mode (R + P → MEP)
pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
--tsopt --thermo --out-dir result_mep

# Scan mode (single structure → staged bond scans → MEP)
pdb2reaction -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
--scan-lists '[("CS1 SAM 320","GPP 321 C7",1.60)]' \
'[("GPP 321 H11","GLU 186 OE2",0.90)]' \
--tsopt --thermo --out-dir result_scan

# TS-only mode (single TS candidate with --tsopt)
pdb2reaction -i TS_candidate.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
--tsopt

1. Extract active-site model (cluster model): extract
pdb2reaction extract -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3'

2. Optimize geometry: opt
pdb2reaction opt -i model.pdb -l 'SAM:1,GPP:-3'

3. MEP search: path-opt
pdb2reaction path-opt -i R_model.pdb IM_model.pdb -l 'SAM:1,GPP:-3'

Recursive MEP search for multi-step reactions: path-search
pdb2reaction path-search -i R_model.pdb P_model.pdb -l 'SAM:1,GPP:-3'

4. TS optimization: tsopt
pdb2reaction tsopt -i hei.pdb -l 'SAM:1,GPP:-3'

5. Frequency analysis: freq
pdb2reaction freq -i ts_optimized.pdb -l 'SAM:1,GPP:-3'

6. IRC: irc
pdb2reaction irc -i ts_optimized.pdb -l 'SAM:1,GPP:-3'

7. DFT single-point: dft
pdb2reaction dft -i optimized.pdb -l 'SAM:1,GPP:-3'

| Subcommand | Role | Documentation |
|---|---|---|
| all | End-to-end: extraction → MEP → TS → IRC → freq → DFT | docs/all.md |

| Subcommand | Role | Documentation |
|---|---|---|
| extract | Extract active-site model (cluster model) | docs/extract.md |
| fix-altloc | Resolve alternate conformations in PDB files | docs/fix-altloc.md |
| add-elem-info | Add/repair PDB element columns (77–78) | docs/add-elem-info.md |

| Subcommand | Role | Documentation |
|---|---|---|
| opt | Geometry optimization (L-BFGS or RFO) | docs/opt.md |
| tsopt | TS optimization (Dimer or RS-I-RFO) | docs/tsopt.md |
| path-opt | MEP optimization via GSM or DMF | docs/path-opt.md |
| path-search | Recursive MEP search with refinement | docs/path-search.md |
| scan | 1D bond-length driven scan | docs/scan.md |
| scan2d | 2D distance grid scan | docs/scan2d.md |
| scan3d | 3D distance grid scan | docs/scan3d.md |

| Subcommand | Role | Documentation |
|---|---|---|
| freq | Vibrational frequency analysis + thermochemistry | docs/freq.md |
| irc | IRC calculation (EulerPC) | docs/irc.md |
| dft | Single-point DFT (GPU4PySCF / PySCF) | docs/dft.md |
| bond-summary | Compare structures and report bond changes | docs/bond-summary.md |

| Subcommand | Role | Documentation |
|---|---|---|
| trj2fig | Energy plot from XYZ trajectory | docs/trj2fig.md |
| energy-diagram | Energy diagram from numeric values | docs/energy-diagram.md |
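The individual subcommands can also be chained from a script. The sketch below only assembles and prints the step-by-step command lines using the -i, -c, and -l options shown in this README. The input file names (model.pdb, hei.pdb, ts_optimized.pdb, and so on) are the placeholders used in the examples; the files each step actually writes are not stated here, so adjust them to your output directory. The function name is hypothetical.

```python
import shlex


def build_steps(
    ligand_charges: str = "SAM:1,GPP:-3",
    center: str = "SAM,GPP,MG",
) -> list[list[str]]:
    """Return one argv list per workflow step, in the order shown in the README."""
    steps = [
        ("extract", ["-i", "1.R.pdb", "-c", center, "-l", ligand_charges]),
        ("opt", ["-i", "model.pdb", "-l", ligand_charges]),
        ("path-search", ["-i", "R_model.pdb", "P_model.pdb", "-l", ligand_charges]),
        ("tsopt", ["-i", "hei.pdb", "-l", ligand_charges]),
        ("freq", ["-i", "ts_optimized.pdb", "-l", ligand_charges]),
        ("irc", ["-i", "ts_optimized.pdb", "-l", ligand_charges]),
        ("dft", ["-i", "optimized.pdb", "-l", ligand_charges]),
    ]
    return [["pdb2reaction", subcommand, *args] for subcommand, args in steps]


if __name__ == "__main__":
    for argv in build_steps():
        # Print a shell-safe command line; nothing is executed here.
        print(shlex.join(argv))
```

To execute a step rather than print it, pass the list to subprocess.run(argv, check=True) so that a failing step stops the chain.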
Tip: In tsopt, freq, and irc, setting --hessian-calc-mode Analytical is strongly recommended when you have enough VRAM.
On HPC clusters or multi-GPU workstations, pdb2reaction can parallelize UMA inference across nodes. Set workers and workers_per_node to enable parallel inference; see docs/hpc-example.md for details.
pdb2reaction --help
pdb2reaction <subcommand> --help
pdb2reaction <subcommand> --help-advanced
pdb2reaction all --help-advanced
# Shorthand alias (equivalent to pdb2reaction)
p2r --help
# Equivalent module invocation
python -m pdb2reaction --help

pdb2reaction all --help shows core options. Use pdb2reaction all --help-advanced for the full option list.
scan, scan2d, scan3d, the calculation commands (opt, path-opt, path-search, tsopt, freq, irc, dft), and the utility commands add-elem-info, trj2fig, energy-diagram, extract, and fix-altloc all follow the same progressive-help pattern: --help shows the core options and --help-advanced shows the full option list.
If you encounter any issues, please open an issue at https://github.com/t-0hmura/pdb2reaction/issues.
A preprint describing pdb2reaction is in preparation. Currently, if you find this work helpful for your research, please cite the software itself:
@software{ohmura2026pdb2reaction,
author = {Ohmura, Takuto},
title = {pdb2reaction},
year = {2026},
month = {4},
version = {0.3.6},
url = {https://github.com/t-0hmura/pdb2reaction},
license = {GPL-3.0},
doi = {10.5281/zenodo.19197865}
}

pdb2reaction ships AI-agent instructions under .claude/skills/
covering the CLI subcommands, structure I/O (PDB / XYZ / GJF), backend
installation (UMA / Orb / MACE / AIMNet2 / DFT / xtb), common
workflows, output parsing, and HPC operation.
To use them, copy the .claude/skills/ directory into your project
repository or home directory (Claude Code, Cursor, etc.).
- MACE and UMA cannot coexist in the same environment due to an e3nn version conflict. Use separate conda environments.
- DFT single-point (pdb2reaction dft) is practical up to ~300 atoms; larger systems may require fragmentation.
- The ORB backend tends to converge to transition states with extra small imaginary modes even when the reaction coordinate is correctly identified (that is, mechanism recovery is usually fine, but a clean single-saddle TS spectrum is not guaranteed). For quantitative studies that need a single-imaginary-mode TS, prefer UMA or MACE, or re-score ORB-converged geometries with DFT.
- CPU-only execution is supported but 10-100x slower than GPU.
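The ORB caveat above comes down to counting imaginary modes: a clean first-order saddle point has exactly one. The sketch below assumes you already have the vibrational wavenumbers as a list of numbers in cm^-1, with imaginary modes written as negative values (a common convention). How to read them from pdb2reaction's freq output is not covered here, and the function names are hypothetical.

```python
def count_imaginary_modes(wavenumbers_cm1: list[float], tol_cm1: float = 0.0) -> int:
    """Count modes more negative than -tol_cm1 (negative values denote imaginary modes)."""
    threshold = -abs(tol_cm1)
    return sum(1 for w in wavenumbers_cm1 if w < threshold)


def classify_stationary_point(wavenumbers_cm1: list[float], tol_cm1: float = 0.0) -> str:
    """Label a stationary point by its number of imaginary modes."""
    n_imag = count_imaginary_modes(wavenumbers_cm1, tol_cm1)
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "first-order saddle (TS)"
    return f"higher-order saddle ({n_imag} imaginary modes)"


if __name__ == "__main__":
    clean_ts = [-412.3, 35.1, 88.0, 1650.2]
    noisy_ts = [-405.9, -21.4, 40.7, 1648.8]
    print(classify_stationary_point(clean_ts))
    print(classify_stationary_point(noisy_ts))
    # A tolerance hides small modes from the count; it does not remove them.
    print(classify_stationary_point(noisy_ts, tol_cm1=50.0))
```

The example wavenumbers are made-up illustration values, not results from any calculation.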
pdb2reaction is distributed under the GNU General Public License version 3 (GPL-3.0) and is available for academic and commercial use subject to the GPL-3.0 license terms.
