pdb2reaction: End-to-end Reaction-Path Modeling from PDB Structures Using Machine-Learning Interatomic Potentials

Overview

[Figure: pdb2reaction workflow overview]

pdb2reaction is a Python CLI toolkit for turning PDB structures into enzymatic reaction pathways with machine-learning interatomic potentials (MLIPs). Each workflow step is also available as an individual subcommand (opt, scan, scan2d, path-search, tsopt, freq, irc, dft, energy-diagram, etc.) for fine-grained control.

A single command can generate a first-pass enzymatic reaction path:

# Multi-PDB mode (R + P → MEP)
pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3'
# Scan mode (single structure → staged bond scans → MEP)
pdb2reaction -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --scan-lists '[("CS1 SAM 320","GPP 321 C7",1.60)]' \
                 '[("GPP 321 H11","GLU 186 OE2",0.90)]'

The full workflow — MEP search → TS optimization → IRC → thermochemistry → single-point DFT — can be run in one command:

pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --tsopt --thermo --dft
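When the one-command workflow is launched from a Python script rather than a shell, it can help to assemble the arguments as a list so that no shell quoting is involved. The following is a minimal sketch that uses only the options shown above; `build_all_command` is a hypothetical helper for illustration and is not part of pdb2reaction.

```python
import shlex


def build_all_command(inputs, center, ligand_charge,
                      tsopt=False, thermo=False, dft=False):
    """Assemble an argv list for a one-command pdb2reaction run.

    Hypothetical helper: it only concatenates the options shown in this README.
    """
    argv = ["pdb2reaction", "-i", *inputs, "-c", center, "-l", ligand_charge]
    # Optional post-processing stages, appended only when requested.
    if tsopt:
        argv.append("--tsopt")
    if thermo:
        argv.append("--thermo")
    if dft:
        argv.append("--dft")
    return argv


argv = build_all_command(["1.R.pdb", "3.P.pdb"], "SAM,GPP,MG", "SAM:1,GPP:-3",
                         tsopt=True, thermo=True, dft=True)
command_line = shlex.join(argv)  # shell-safe string, useful for logging
print(command_line)
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where pdb2reaction is installed.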

Working examples are provided in the examples/ directory: run.sh contains the complete workflow commands for both the multi-structure MEP pipeline and the scan-based pipeline.


Given (i) two or more PDB files (R → ... → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt, pdb2reaction automatically:

  • extracts an active-site model around user-defined substrates to build a cluster model,
  • explores minimum-energy paths (MEPs) with GSM or DMF,
  • optionally optimizes transition states, runs vibrational analysis, IRC, and single-point DFT,

using machine-learning interatomic potentials (MLIPs).

Related tools

| Tool | Use case | Repository |
| --- | --- | --- |
| mlmm-toolkit | ML/MM (ONIOM) with full protein environment; automates MM parameter generation and ML region assignment from a single PDB input | https://github.com/t-0hmura/mlmm_toolkit |
| UMA–Pysisyphus Interface | YAML-input-based reaction mechanism analysis for small molecules | https://github.com/t-0hmura/uma_pysis |

Both pdb2reaction and mlmm-toolkit include a custom GPU-optimized pysisyphus fork for geometry optimization, TS search, and IRC. This bundled fork is not compatible with the upstream pysisyphus package; do not install them side by side.

Important (prerequisites):

  • Input PDB files must already contain hydrogen atoms.
  • When providing multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ).
  • Boolean CLI options accept both --flag / --no-flag and value style --flag True/False (yes/no, 1/0 are also accepted). Prefer toggle style in new scripts.
  • The workflow also works for small-molecule systems. If you omit --center/-c and --ligand-charge, you can use .xyz or .gjf inputs as well.
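The first two prerequisites can be checked before launching a run. The sketch below is an independent pre-flight check, not pdb2reaction code: it reads the fixed-column ATOM/HETATM fields defined by the PDB format and assumes the element columns (77–78) are already filled in. `make_record` is a hypothetical helper that only builds demo input.

```python
PDB_FMT = ("{rec:<6}{serial:>5} {name:<4}{alt:1}{res:>3} {chain:1}{seq:>4}    "
           "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}          {elem:>2}")


def make_record(serial, name, res, seq, xyz, elem):
    """Build one fixed-column HETATM line (demo input only)."""
    return PDB_FMT.format(rec="HETATM", serial=serial, name=name, alt=" ",
                          res=res, chain="A", seq=seq,
                          x=xyz[0], y=xyz[1], z=xyz[2],
                          occ=1.0, b=0.0, elem=elem)


def atom_records(pdb_text):
    """Return the ATOM/HETATM lines of a PDB file."""
    return [ln for ln in pdb_text.splitlines()
            if ln.startswith(("ATOM", "HETATM"))]


def atom_keys(pdb_text):
    """Identity of each atom: (atom name, residue name, chain, residue number)."""
    return [(ln[12:16].strip(), ln[17:20].strip(), ln[21:22], ln[22:26].strip())
            for ln in atom_records(pdb_text)]


def has_hydrogens(pdb_text):
    """True if any record carries element H in columns 77-78."""
    return any(ln[76:78].strip().upper() == "H" for ln in atom_records(pdb_text))


def same_atoms_same_order(pdb_a, pdb_b):
    """True if both files list identical atoms in identical order."""
    return atom_keys(pdb_a) == atom_keys(pdb_b)


reactant = "\n".join([
    make_record(1, "C7", "GPP", 321, (0.0, 0.0, 0.0), "C"),
    make_record(2, "H11", "GPP", 321, (1.09, 0.0, 0.0), "H"),
])
product = "\n".join([
    make_record(1, "C7", "GPP", 321, (0.2, 0.1, 0.0), "C"),
    make_record(2, "H11", "GPP", 321, (2.5, 0.3, 0.0), "H"),
])
shuffled = "\n".join(reversed(product.splitlines()))

print(has_hydrogens(reactant))                    # hydrogens present?
print(same_atoms_same_order(reactant, product))   # only coordinates differ
print(same_atoms_same_order(reactant, shuffled))  # atom order differs
```

Comparing atom identities rather than whole lines lets coordinates differ freely while still catching reordered or missing atoms.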

Documentation


Installation

Linux with a CUDA-capable NVIDIA GPU is the validated production environment for the MLIP reaction-path workflows. The core Python package and CPU-only smoke tests also run on macOS and on Windows under WSL2.

Prerequisites

  • Python >= 3.11
  • CUDA 12.x

Minimal setup (CUDA 12.9)

pip install torch --index-url https://download.pytorch.org/whl/cu129
pip install pdb2reaction
plotly_get_chrome -y
huggingface-cli login

For the DMF method (an additional MEP search method)

Install cyipopt (recommended via conda):

conda install -c conda-forge cyipopt -y

For the full step-by-step guide (HPC module load, alternative backends, DFT extras, troubleshooting), see docs/installation.md.

DFT single-point (pdb2reaction dft)

DFT dependencies are not installed by default. To use pdb2reaction dft, install the [dft] extra:

pip install "pdb2reaction[dft]"

This installs PySCF, GPU4PySCF (x86_64 only), and related CUDA libraries. Note that DFT single-point calculations are practical only for systems of up to ~300 atoms; beyond that, the computational cost becomes prohibitive.
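A quick atom count tells you whether a model falls within that range before you queue a DFT job. This is a minimal sketch: the 300-atom figure is the approximate ceiling quoted above rather than a hard limit enforced by the tool, and both function names are hypothetical.

```python
DFT_PRACTICAL_MAX_ATOMS = 300  # approximate ceiling quoted above, not a hard limit


def count_atoms(pdb_text):
    """Count ATOM and HETATM records in the text of a PDB file."""
    return sum(1 for line in pdb_text.splitlines()
               if line.startswith(("ATOM", "HETATM")))


def dft_is_practical(pdb_text, max_atoms=DFT_PRACTICAL_MAX_ATOMS):
    """True if the model is small enough for a single-point DFT run."""
    return count_atoms(pdb_text) <= max_atoms


# Synthetic demo input: only the record names matter for counting.
small_model = "\n".join(f"ATOM  {i:>5}" for i in range(1, 251))
large_model = "\n".join(f"ATOM  {i:>5}" for i in range(1, 302))

print(count_atoms(small_model), dft_is_practical(small_model))
print(count_atoms(large_model), dft_is_practical(large_model))
```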

For detailed installation instructions, see Installation.

Supported ML potentials

| Potential | Repository | Install extra |
| --- | --- | --- |
| UMA (default) | https://github.com/facebookresearch/fairchem | (included) |
| ORB | https://github.com/orbital-materials/orb-models | pip install "pdb2reaction[orb]" |
| MACE | https://github.com/ACEsuit/mace | See below |
| AIMNet2 | https://github.com/isayevlab/aimnetcentral | pip install "pdb2reaction[aimnet]" |

MACE installation: Because mace-torch and fairchem-core (UMA) can pin incompatible versions of e3nn, we recommend installing MACE in a dedicated environment. In that environment, uninstall fairchem-core first, then install MACE:

pip uninstall fairchem-core
pip install mace-torch

Quick Examples

The examples below use GPP C6-methyltransferase BezA (Tsutsumi et al., Angew. Chem. Int. Ed. 2022, 61, e202111217) — a two-step mechanism: electrophilic methyl transfer from SAM to GPP C6 (via C7 carbocation), then proton abstraction by glutamate (GLU 186). The complete commands are in examples/run.sh.

Full workflow (multi-structure MEP)

pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --tsopt --thermo --out-dir result_mep

Scan mode (single structure → staged bond scans → MEP)

pdb2reaction -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --scan-lists '[("CS1 SAM 320","GPP 321 C7",1.60)]' \
                 '[("GPP 321 H11","GLU 186 OE2",0.90)]' \
    --tsopt --thermo --out-dir result_scan
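Each `--scan-lists` argument in the command above is written as a Python-style list of 3-tuples, so a typo can be caught before a run is launched. The sketch below is an independent validation helper, not pdb2reaction's own parser. It assumes each tuple holds two atom selectors and a positive target distance, which is inferred from the examples rather than stated in this README.

```python
import ast


def parse_scan_list(literal):
    """Parse one scan-list literal into (atom_a, atom_b, target) tuples.

    Assumption: the third element is a positive target distance.
    """
    stage = ast.literal_eval(literal)  # safe evaluation of literals only
    if not isinstance(stage, list):
        raise ValueError("a scan list must be a list of tuples")
    parsed = []
    for entry in stage:
        if not (isinstance(entry, tuple) and len(entry) == 3):
            raise ValueError(f"expected a 3-tuple, got {entry!r}")
        atom_a, atom_b, target = entry
        if not (isinstance(atom_a, str) and isinstance(atom_b, str)):
            raise ValueError(f"atom selectors must be strings: {entry!r}")
        if not isinstance(target, (int, float)) or target <= 0:
            raise ValueError(f"target must be a positive number: {entry!r}")
        parsed.append((atom_a, atom_b, float(target)))
    return parsed


# The two stages from the scan-mode example.
stages = [
    '[("CS1 SAM 320","GPP 321 C7",1.60)]',
    '[("GPP 321 H11","GLU 186 OE2",0.90)]',
]
parsed_stages = [parse_scan_list(s) for s in stages]
for number, stage in enumerate(parsed_stages, start=1):
    print(number, stage)
```

`ast.literal_eval` is used instead of `eval` so that only literal values are ever evaluated.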

TS optimization only

pdb2reaction -i TS_candidate.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --tsopt

Step-by-step workflow

1. Extract active-site model (cluster model): extract

pdb2reaction extract -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3'

2. Optimize geometry: opt

pdb2reaction opt -i model.pdb -l 'SAM:1,GPP:-3'

3. MEP search: path-opt

pdb2reaction path-opt -i R_model.pdb IM_model.pdb -l 'SAM:1,GPP:-3'

Recursive MEP search for multi-step reactions: path-search

pdb2reaction path-search -i R_model.pdb P_model.pdb -l 'SAM:1,GPP:-3'

4. TS optimization: tsopt

pdb2reaction tsopt -i hei.pdb -l 'SAM:1,GPP:-3'

5. Frequency analysis: freq

pdb2reaction freq -i ts_optimized.pdb -l 'SAM:1,GPP:-3'

6. IRC: irc

pdb2reaction irc -i ts_optimized.pdb -l 'SAM:1,GPP:-3'

7. DFT single-point: dft

pdb2reaction dft -i optimized.pdb -l 'SAM:1,GPP:-3'

CLI Subcommands

Workflow

| Subcommand | Role | Documentation |
| --- | --- | --- |
| all | End-to-end: extraction → MEP → TS → IRC → freq → DFT | docs/all.md |

Structure Preparation

| Subcommand | Role | Documentation |
| --- | --- | --- |
| extract | Extract active-site model (cluster model) | docs/extract.md |
| fix-altloc | Resolve alternate conformations in PDB files | docs/fix-altloc.md |
| add-elem-info | Add/repair PDB element columns (77–78) | docs/add-elem-info.md |
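To see whether a file needs fix-altloc or add-elem-info, you can inspect the relevant fixed columns directly. The sketch below relies only on the PDB format's column layout (alternate-location indicator in column 17, element symbol in columns 77–78). It is not how the subcommands are implemented, and `make_line` is a hypothetical helper that builds demo records.

```python
def make_line(altloc=" ", element=""):
    """Build a minimal 80-column ATOM record (demo input only)."""
    cols = list(" " * 80)
    cols[0:4] = "ATOM"
    cols[16] = altloc                 # column 17: alternate-location indicator
    cols[76:78] = element.rjust(2)    # columns 77-78: right-justified element
    return "".join(cols)


def element_field(line):
    """Element symbol from columns 77-78 ('' when blank or line too short)."""
    return line[76:78].strip()


def altloc_flag(line):
    """Alternate-location indicator from column 17 ('' when blank)."""
    return line[16:17].strip()


def summarize(pdb_text):
    """Count records with a missing element and records carrying an altloc flag."""
    records = [ln for ln in pdb_text.splitlines()
               if ln.startswith(("ATOM", "HETATM"))]
    missing_element = sum(1 for ln in records if not element_field(ln))
    with_altloc = sum(1 for ln in records if altloc_flag(ln))
    return {"records": len(records),
            "missing_element": missing_element,
            "with_altloc": with_altloc}


demo = "\n".join([
    make_line(element="C"),               # clean record
    make_line(altloc="A", element="N"),   # alternate conformation A
    make_line(),                          # element columns left blank
])
summary = summarize(demo)
print(summary)
```

A non-zero `missing_element` count suggests running add-elem-info, and a non-zero `with_altloc` count suggests running fix-altloc.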

Optimization & Path Search

| Subcommand | Role | Documentation |
| --- | --- | --- |
| opt | Geometry optimization (L-BFGS or RFO) | docs/opt.md |
| tsopt | TS optimization (Dimer or RS-I-RFO) | docs/tsopt.md |
| path-opt | MEP optimization via GSM or DMF | docs/path-opt.md |
| path-search | Recursive MEP search with refinement | docs/path-search.md |
| scan | 1D bond-length driven scan | docs/scan.md |
| scan2d | 2D distance grid scan | docs/scan2d.md |
| scan3d | 3D distance grid scan | docs/scan3d.md |

Analysis

| Subcommand | Role | Documentation |
| --- | --- | --- |
| freq | Vibrational frequency analysis + thermochemistry | docs/freq.md |
| irc | IRC calculation (EulerPC) | docs/irc.md |
| dft | Single-point DFT (GPU4PySCF / PySCF) | docs/dft.md |
| bond-summary | Compare structures and report bond changes | docs/bond-summary.md |

Visualization

| Subcommand | Role | Documentation |
| --- | --- | --- |
| trj2fig | Energy plot from XYZ trajectory | docs/trj2fig.md |
| energy-diagram | Energy diagram from numeric values | docs/energy-diagram.md |
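Numeric values for an energy diagram are usually relative energies. As a worked example of that arithmetic, the sketch below converts absolute energies in Hartree to kcal/mol relative to the first state. The input numbers are invented for illustration, and the units that energy-diagram itself expects are not specified in this README, so check docs/energy-diagram.md before passing values on.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard conversion factor


def relative_energies_kcal(energies_hartree, reference_index=0):
    """Energies relative to one state, converted from Hartree to kcal/mol."""
    reference = energies_hartree[reference_index]
    return [(e - reference) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


# Invented absolute energies for R, TS, and P (Hartree), for illustration only.
labels = ["R", "TS", "P"]
absolute = [-100.000, -99.980, -100.010]

relative = relative_energies_kcal(absolute)
for label, value in zip(labels, relative):
    print(f"{label}: {value:.2f} kcal/mol")

barrier = max(relative)                # forward barrier relative to R
reaction_energy = relative[-1]         # P relative to R
```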

Tip: In tsopt, freq, and irc, setting --hessian-calc-mode Analytical is strongly recommended when you have enough VRAM.


HPC / Multi-GPU

On HPC clusters or multi-GPU workstations, pdb2reaction can parallelize UMA inference across nodes. Set workers and workers_per_node to enable parallel inference; see docs/hpc-example.md for details.


Getting Help

pdb2reaction --help
pdb2reaction <subcommand> --help
pdb2reaction <subcommand> --help-advanced
pdb2reaction all --help-advanced
# Shorthand alias (equivalent to pdb2reaction)
p2r --help
# Equivalent module invocation
python -m pdb2reaction --help

pdb2reaction all --help shows the core options; pdb2reaction all --help-advanced shows the full option list. The same progressive-help pattern (--help for core options, --help-advanced for the full list) applies to scan, scan2d, scan3d, the calculation commands (opt, path-opt, path-search, tsopt, freq, irc, dft), add-elem-info, trj2fig, energy-diagram, extract, and fix-altloc.

If you encounter any issues, please open an issue at https://github.com/t-0hmura/pdb2reaction/issues.


Citation

A preprint describing pdb2reaction is in preparation. Until it is available, please cite the software itself if you find this work helpful for your research:

@software{ohmura2026pdb2reaction,
  author       = {Ohmura, Takuto},
  title        = {pdb2reaction},
  year         = {2026},
  month        = {4},
  version      = {0.3.6},
  url          = {https://github.com/t-0hmura/pdb2reaction},
  license      = {GPL-3.0},
  doi          = {10.5281/zenodo.19197865}
}

Agent Skills

pdb2reaction ships AI-agent instructions under .claude/skills/ covering the CLI subcommands, structure I/O (PDB / XYZ / GJF), backend installation (UMA / Orb / MACE / AIMNet2 / DFT / xtb), common workflows, output parsing, and HPC operation.

To use them, copy the .claude/skills/ directory into your project repository or home directory (Claude Code, Cursor, etc.).


Known limitations

  • MACE and UMA cannot coexist in the same environment due to an e3nn version conflict. Use separate conda environments.
  • DFT single-point (pdb2reaction dft) is practical up to ~300 atoms; larger systems may require fragmentation.
  • ORB backend tends to converge transition states with extra small imaginary modes even when the reaction coordinate is correctly identified (i.e. mechanism recovery is usually fine but a clean single-saddle TS spectrum is not guaranteed). For quantitative studies that need a single-imaginary-mode TS, prefer UMA or MACE, or re-score ORB-converged geometries with DFT.
  • CPU-only execution is supported but 10-100x slower than GPU.
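The single-imaginary-mode criterion mentioned in the ORB limitation can be checked mechanically once you have a list of vibrational frequencies. The sketch below assumes imaginary modes are reported as negative wavenumbers (a common convention; the output format of pdb2reaction freq is not described here) and uses invented frequencies.

```python
def imaginary_modes(frequencies_cm1):
    """Return the imaginary modes, assumed to be reported as negative values."""
    return sorted(f for f in frequencies_cm1 if f < 0.0)


def is_first_order_saddle(frequencies_cm1):
    """True if exactly one imaginary mode is present (a clean TS spectrum)."""
    return len(imaginary_modes(frequencies_cm1)) == 1


# Invented spectra (cm^-1), for illustration only.
clean_ts = [-512.3, 24.1, 55.0, 130.8]
extra_small_mode = [-512.3, -18.7, 24.1, 55.0]

print(imaginary_modes(clean_ts), is_first_order_saddle(clean_ts))
print(imaginary_modes(extra_small_mode), is_first_order_saddle(extra_small_mode))
```

No tolerance is applied to small imaginary modes here: for quantitative work, a spectrum like the second one would need a tighter optimization or a different backend rather than being silently accepted.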

License

pdb2reaction is distributed under the GNU General Public License version 3 (GPL-3.0) and is available for academic and commercial use subject to the GPL-3.0 license terms.