
βš”οΈ MLIP Arena βš”οΈ

Fair and transparent benchmark of foundation machine learning interatomic potentials (MLIPs)


🚀 The Future of Atomistic Modeling and Simulation Benchmarks for MLIPs

Foundation machine learning interatomic potentials (MLIPs), trained on extensive databases containing millions of density functional theory (DFT) calculations, have revolutionized molecular and materials modeling. However, existing benchmarks often suffer from data leakage, limited transferability, and an over-reliance on error-based metrics tied to specific DFT references.

MLIP Arena introduces a unified, cutting-edge benchmark platform for evaluating foundation MLIP performance far beyond conventional error metrics. It focuses on revealing the physical soundness learned by MLIPs and assessing their practical utility, remaining completely agnostic to the underlying model architectures and training datasets.

By moving beyond static DFT references and revealing the critical failure modes of current foundation MLIPs in real-world settings, MLIP Arena provides a reproducible framework to guide next-generation MLIP development. We aim to drive improvements in predictive accuracy and runtime efficiency while maintaining robust physical consistency!

⚡ MLIP Arena leverages modern pythonic workflow orchestration with 💙 Prefect 💙 to enable advanced task/flow chaining, scaling, and caching.


Note

Contributions of new tasks via PRs are highly welcome! See our Project Page for outstanding tasks, or propose new feature requests in Discussions.


📖 Official Documentation

For comprehensive guides, API references, and advanced usage, please visit our Official Documentation Site!


📢 Announcements


πŸ› οΈ Installation

Option 1: From PyPI (Prefect workflow only, without pretrained models)

pip install mlip-arena

Option 2: From Source (with Integrated Pretrained Models)

Caution

We strongly recommend a clean build in a new virtual environment due to compatibility issues between multiple popular MLIPs. We provide a single installation script using uv for minimal package conflicts and blazing fast installation!

Important

To automatically download Fairchem model checkpoints, please ensure you have been granted download access to their Hugging Face model repo (e.g., OMAT24), not the dataset repo. You must also log in locally on your machine via hf auth login (see HF Hub authentication).

🐧 Linux

# (Optional) Install uv (it's much faster than pip!)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

git clone https://github.com/atomind-ai/mlip-arena.git
cd mlip-arena

# One-script uv pip installation
bash scripts/install.sh

Tip

Installing all compiled models can consume significant local storage. Pass the pip flag --no-cache during installation, and run uv cache clean afterward to free up disk space.

🍎 Mac OS

# (Optional) Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# One-script uv pip installation
bash scripts/install-macosx.sh

⏩ Quickstart

Instructions for individual benchmarks are provided in the README within each corresponding folder under /benchmarks.

For a complete benchmark sweep using HPC resources, see the benchmarks/submit.py script. Refer to the Run Benchmarks and Submit Model section for usage instructions.


βš™οΈ Workflow Overview

✅ The first Prefect task: Molecular Dynamics

MLIP Arena provides a unified interface for running all compiled MLIPs: simply iterate over MLIPEnum:

from mlip_arena.models import MLIPEnum
from mlip_arena.tasks import MD
from mlip_arena.tasks.utils import get_calculator

from ase import units
from ase.build import bulk

atoms = bulk("Cu", "fcc", a=3.6) * (5, 5, 5)

results = []

for model in MLIPEnum:
    result = MD(
        atoms=atoms,
        calculator=get_calculator(
            model,
            calculator_kwargs=dict(), # directly passing to the calculator
            dispersion=True,
            dispersion_kwargs=dict(
                damping='bj', xc='pbe', cutoff=40.0 * units.Bohr
            ), # passing to TorchDFTD3Calculator
        ), # compatible with custom ASE Calculators
        ensemble="nve", # nvt and npt are also available
        dynamics="velocityverlet", # compatible with any ASE Dynamics objects and their class names
        total_time=1e3, # 1 ps = 1e3 fs
        time_step=2, # fs
    )
    results.append(result)
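The dynamics="velocityverlet" option above refers to the standard velocity Verlet integrator. As a self-contained illustration of why it is the usual choice for NVE runs (plain Python on a 1D harmonic oscillator, not mlip-arena code), it conserves the total energy to second order in the time step:

```python
# Velocity Verlet on a 1D harmonic oscillator (k = m = 1), illustrating the
# integrator behind dynamics="velocityverlet" above; not mlip-arena code.
def velocity_verlet(x, v, force, mass, dt, n_steps):
    a = force(x) / mass
    traj = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged force
        a = a_new
        traj.append((x, v))
    return traj

k = m = 1.0
traj = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x, mass=m,
                       dt=0.01, n_steps=1000)

def total_energy(x, v):
    return 0.5 * m * v * v + 0.5 * k * x * x  # kinetic + potential

drift = abs(total_energy(*traj[-1]) - total_energy(*traj[0]))
```

The NVE (microcanonical) ensemble selected above relies on exactly this kind of bounded energy error.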

🚀 Parallelize Benchmarks at Scale

To run multiple benchmarks in parallel, append .submit to the task function and wrap your tasks in a flow. This dispatches them to a local or remote worker for concurrent execution. See the Prefect documentation on tasks and flows for more details.

from prefect import flow

@flow
def run_all_tasks():
    futures = []
    for model in MLIPEnum:
        future = MD.submit(
            atoms=atoms,
            ...
        )
        futures.append(future)

    return [f.result(raise_on_failure=False) for f in futures]

For a more practical example using HPC resources, please refer to the submission script or our MD stability benchmark.
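The submit-then-gather pattern above is ordinary future-based fan-out. Here is a dependency-free sketch of the same shape using Python's standard concurrent.futures (fake_md is a hypothetical stand-in for an MD task, not part of mlip-arena):

```python
# Fan out one task per "model", then gather results without failing fast,
# mirroring futures + result(raise_on_failure=False) in the Prefect flow above.
from concurrent.futures import ThreadPoolExecutor

def fake_md(model):
    # Hypothetical stand-in for an MD task; one model fails on purpose.
    if model == "bad-model":
        raise RuntimeError(f"{model} diverged")
    return {"model": model, "final_energy": -1.0}

models = ["model-a", "model-b", "bad-model"]
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fake_md, m) for m in models]
    # f.exception() is None on success, so this keeps either the
    # successful result or the exception object instead of raising.
    results = [f.exception() or f.result() for f in futures]
```

Prefect adds caching, retries, and remote workers on top of this basic shape.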

🧰 List of Modular Tasks

The implemented tasks are available under mlip_arena.tasks.<module>.run or via from mlip_arena.tasks import * for convenient imports (note: this currently requires phonopy to be installed).

  • OPT: Structure optimization
  • EOS: Equation of state (energy-volume scan)
  • MD: Molecular dynamics with flexible dynamics (NVE, NVT, NPT) and temperature/pressure scheduling (annealing, shearing, etc.)
  • PHONON: Phonon calculation driven by phonopy
  • NEB: Nudged elastic band
  • NEB_FROM_ENDPOINTS: Nudged elastic band with convenient image interpolation (linear or IDPP)
  • ELASTICITY: Elastic tensor calculation

🤝 Contributing and Development

PRs are welcome! Please clone the repo and submit PRs with your changes.

To make changes to the Hugging Face Space, fetch large files from git LFS first, and then run Streamlit:

git lfs fetch --all
git lfs pull
streamlit run serve/app.py

➕ Add New MLIP Models

If you have pretrained MLIP models that you would like to contribute to MLIP Arena and evaluate in real-time benchmarks, you have two options:

External ASE Calculator (Easy / Fast)

  1. Implement a new ASE Calculator class in mlip_arena/models/externals.
  2. Name your class after your model and add the exact same name, along with your metadata, to the registry.

Caution

Remove unnecessary outputs from the results class attributes to avoid errors during MD simulations. Please refer to CHGNet as an example.

Hugging Face Model (Recommended / High Impact)

  1. Inherit the Hugging Face ModelHubMixin class in your model class definition. We recommend PytorchModelHubMixin.
  2. Create a new Hugging Face Model repository and upload the model file using the push_to_hub function.
  3. Follow the template to code the I/O interface for your model here.
  4. Update the model registry with the necessary metadata.

πŸƒβ€β™‚οΈ Run Benchmarks and Submit Model

Once your model is ready (either registered or initialized as a custom ASE Calculator), you can run the core benchmark suite on a SLURM cluster:

  1. Move into the benchmarks/ directory:
    cd benchmarks
  2. Open and modify the submit.py template script. Under the USER CONFIGURATION section:
    • Provide your MODEL (as a registered string or custom ASE Calculator instance).
    • Adjust the SLURM_CONFIG parameters for your specific HPC allocation (including any conda environments or module loads in the job_script_prologue).
  3. Submit the pipeline:
    python submit.py
    This will dynamically distribute and run the core benchmarks (diatomics, EOS bulk, and E-V scans) via Dask-Jobqueue on your SLURM cluster.
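The SLURM_CONFIG mentioned in step 2 is what Dask-Jobqueue uses to request workers. A minimal, hypothetical sketch of such a configuration is shown below; every value is a placeholder to adapt to your allocation, and the actual template lives in benchmarks/submit.py:

```python
# Hypothetical Dask-Jobqueue SLURM configuration sketch; all values below are
# placeholders, not the settings shipped in benchmarks/submit.py.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=8,                          # CPU cores per SLURM job
    memory="32GB",                    # memory per SLURM job
    walltime="02:00:00",
    account="my_hpc_allocation",      # placeholder allocation name
    job_script_prologue=[
        "module load cuda",           # site-specific module loads
        "source activate mlip-arena", # or your conda/venv activation
    ],
)
cluster.scale(jobs=4)                 # request four SLURM jobs
client = Client(cluster)              # benchmarks then run on these workers
```

Anything your jobs need at startup (modules, environments, credentials) goes into job_script_prologue, as noted in step 2.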

➕ Add New Benchmark

Note

Please reuse, extend, or chain the general tasks defined above and add your new folder and scripts under /benchmarks.


📜 Citation

If you find this work and platform useful, please consider citing the following:

@inproceedings{
    chiang2025mlip,
    title={{MLIP} Arena: Advancing Fairness and Transparency in Machine Learning Interatomic Potentials via an Open, Accessible Benchmark Platform},
    author={Yuan Chiang and Tobias Kreiman and Christine Zhang and Matthew C. Kuner and Elizabeth Jin Weaver and Ishan Amin and Hyunsoo Park and Yunsung Lim and Jihan Kim and Daryl Chrzan and Aron Walsh and Samuel M Blau and Mark Asta and Aditi S. Krishnapriyan},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2025},
    url={https://openreview.net/forum?id=SAT0KPA5UO}
}