# The Future of Atomistic Modeling and Simulation Benchmarks for MLIPs
Foundation machine learning interatomic potentials (MLIPs), trained on extensive databases containing millions of density functional theory (DFT) calculations, have revolutionized molecular and materials modeling. However, existing benchmarks often suffer from data leakage, limited transferability, and an over-reliance on error-based metrics tied to specific DFT references.
MLIP Arena introduces a unified, cutting-edge benchmark platform for evaluating foundation MLIP performance far beyond conventional error metrics. It focuses on revealing the physical soundness learned by MLIPs and assessing their practical utility, remaining completely agnostic to the underlying model architectures and training datasets.
By moving beyond static DFT references and revealing the critical failure modes of current foundation MLIPs in real-world settings, MLIP Arena provides a reproducible framework to guide next-generation MLIP development. We aim to drive improvements in predictive accuracy and runtime efficiency while maintaining robust physical consistency!
MLIP Arena leverages modern Pythonic workflow orchestration with Prefect to enable advanced task/flow chaining, scaling, and caching.
> [!NOTE]
> Contributions of new tasks via PRs are highly welcome! See our Project Page for outstanding tasks, or propose new feature requests in Discussions.
For comprehensive guides, API references, and advanced usage, please visit our Official Documentation Site!
- [Sep 18, 2025] MLIP Arena is accepted as a Spotlight (top 3.5%) at NeurIPS!
- [Apr 8, 2025] MLIP Arena is accepted as an ICLR AI4Mat Spotlight! Huge thanks to all co-authors for their contributions!
```bash
pip install mlip-arena
```

> [!CAUTION]
> We strongly recommend a clean build in a new virtual environment due to compatibility issues between multiple popular MLIPs. We provide a single installation script using uv for minimal package conflicts and blazing-fast installation!
> [!IMPORTANT]
> To automatically download Fairchem model checkpoints, please ensure you have download access to their Hugging Face model repo (e.g., OMAT24) (not the dataset repo). You must also log in locally on your machine via `hf auth login` (see HF Hub authentication).
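Assuming you use the Hugging Face Hub client (recent `huggingface_hub` releases ship the `hf` entry point), the one-time setup looks like the following sketch:

```shell
# Install or upgrade the Hugging Face Hub client (provides the `hf` CLI)
pip install -U huggingface_hub

# Log in with a token that has read access to the gated model repos
hf auth login
```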
### Linux
```bash
# (Optional) Install uv (it's much faster than pip!)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

git clone https://github.com/atomind-ai/mlip-arena.git
cd mlip-arena

# One-script uv pip installation
bash scripts/install.sh
```

> [!TIP]
> Installing all compiled models can consume significant local storage. Passing the `--no-cache` flag to pip and running `uv cache clean` afterwards are extremely helpful for freeing up space.
### macOS
```bash
# (Optional) Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# One-script uv pip installation
bash scripts/install-macosx.sh
```

Instructions for individual benchmarks are provided in the README within each corresponding folder under `/benchmarks`.
For a complete benchmark sweep using HPC resources, see the benchmarks/submit.py script. Refer to the Run Benchmarks and Submit Model section for usage instructions.
MLIP Arena provides a unified interface to run all compiled MLIPs; simply iterate over `MLIPEnum`:
```python
from mlip_arena.models import MLIPEnum
from mlip_arena.tasks import MD
from mlip_arena.tasks.utils import get_calculator

from ase import units
from ase.build import bulk

atoms = bulk("Cu", "fcc", a=3.6) * (5, 5, 5)

results = []
for model in MLIPEnum:
    result = MD(
        atoms=atoms,
        calculator=get_calculator(
            model,
            calculator_kwargs=dict(),  # directly passed to the calculator
            dispersion=True,
            dispersion_kwargs=dict(
                damping="bj", xc="pbe", cutoff=40.0 * units.Bohr
            ),  # passed to TorchDFTD3Calculator
        ),  # compatible with custom ASE Calculators
        ensemble="nve",  # nvt and npt are also available
        dynamics="velocityverlet",  # compatible with any ASE Dynamics objects and their class names
        total_time=1e3,  # 1 ps = 1e3 fs
        time_step=2,  # fs
    )
    results.append(result)
```

To run multiple benchmarks in parallel, append `.submit` to the task function and wrap your tasks in a flow. This dispatches them to a local or remote worker for concurrent execution. See the Prefect documentation on tasks and flows for more details.
```python
from prefect import flow


@flow
def run_all_tasks():
    futures = []
    for model in MLIPEnum:
        future = MD.submit(
            atoms=atoms,
            ...
        )
        futures.append(future)
    return [f.result(raise_on_failure=False) for f in futures]
```

For a more practical example using HPC resources, please refer to the submission script or our MD stability benchmark.
The implemented tasks are available under `mlip_arena.tasks.<module>.run` or via `from mlip_arena.tasks import *` for convenient imports (note: the latter currently requires `phonopy` to be installed).
- OPT: Structure optimization
- EOS: Equation of state (energy-volume scan)
- MD: Molecular dynamics with flexible dynamics (NVE, NVT, NPT) and temperature/pressure scheduling (annealing, shearing, etc.)
- PHONON: Phonon calculation driven by phonopy
- NEB: Nudged elastic band
- NEB_FROM_ENDPOINTS: Nudged elastic band with convenient image interpolation (linear or IDPP)
- ELASTICITY: Elastic tensor calculation
PRs are welcome! Please clone the repo and submit PRs with your changes.
To make changes to the Hugging Face Space, fetch large files from git LFS first, and then run Streamlit:
```bash
git lfs fetch --all
git lfs pull
streamlit run serve/app.py
```

If you have pretrained MLIP models that you would like to contribute to MLIP Arena and evaluate in real-time benchmarks, you have two options:
**Option 1: External ASE Calculator**

- Implement a new ASE Calculator class in `mlip_arena/models/externals`.
- Name your class with your awesome model name and add the exact same name to the registry with your metadata.
> [!CAUTION]
> Remove unnecessary outputs from the `results` class attributes to avoid errors during MD simulations. Please refer to CHGNet as an example.
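The intent of the caution above can be illustrated with a plain-Python sketch (the key names below are hypothetical; real calculators subclass `ase.calculators.calculator.Calculator` and populate `self.results` inside `calculate`):

```python
# Hypothetical sketch: keep only the outputs ASE dynamics actually need.
# Extra entries (e.g. per-atom embeddings) can break MD drivers that
# iterate over the results dict.
KEEP = {"energy", "free_energy", "forces", "stress"}


def prune_results(raw_results: dict) -> dict:
    """Drop any outputs not in the standard ASE property set."""
    return {k: v for k, v in raw_results.items() if k in KEEP}


raw = {
    "energy": -3.74,
    "forces": [[0.0, 0.0, 0.0]],
    "node_features": [0.1, 0.2],  # unnecessary output -> removed
}
print(prune_results(raw))  # → {'energy': -3.74, 'forces': [[0.0, 0.0, 0.0]]}
```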
**Option 2: Hugging Face model**

- Inherit the Hugging Face `ModelHubMixin` class in your model class definition. We recommend `PytorchModelHubMixin`.
- Create a new Hugging Face Model repository and upload the model file using the `push_to_hub` function.
- Follow the template to code the I/O interface for your model here.
- Update the model registry with the necessary metadata.
Once your model is ready (either registered or initialized as a custom ASE Calculator), you can run the core benchmark suite on a SLURM cluster:
- Move into the `benchmarks/` directory:

  ```bash
  cd benchmarks
  ```

- Open and modify the `submit.py` template script. Under the USER CONFIGURATION section:
  - Provide your `MODEL` (as a registered string or a custom ASE Calculator instance).
  - Adjust the `SLURM_CONFIG` parameters for your specific HPC allocation (including any conda environments or module loads in the `job_script_prologue`).
- Submit the pipeline:

  ```bash
  python submit.py
  ```

  This will dynamically distribute and run the core benchmarks (diatomics, EOS bulk, and E-V scans) via Dask-Jobqueue on your SLURM cluster.
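For orientation, a `SLURM_CONFIG` might look like the following sketch. The field names mirror `dask_jobqueue.SLURMCluster` keyword arguments; the exact keys expected by `submit.py`, and the environment name, are assumptions here:

```python
# Hypothetical SLURM_CONFIG sketch, modeled on dask_jobqueue.SLURMCluster kwargs
SLURM_CONFIG = dict(
    account="my-allocation",   # your HPC allocation (placeholder)
    cores=8,                   # cores per SLURM job
    memory="32GB",             # memory per SLURM job
    walltime="02:00:00",       # HH:MM:SS wall-clock limit
    job_script_prologue=[
        # conda environments / module loads go here (example env name)
        "source activate mlip-arena",
    ],
)
```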
> [!NOTE]
> Please reuse, extend, or chain the general tasks defined above and add your new folder and scripts under `/benchmarks`.
If you find this work and platform useful, please consider citing the following:
```bibtex
@inproceedings{
  chiang2025mlip,
  title={{MLIP} Arena: Advancing Fairness and Transparency in Machine Learning Interatomic Potentials via an Open, Accessible Benchmark Platform},
  author={Yuan Chiang and Tobias Kreiman and Christine Zhang and Matthew C. Kuner and Elizabeth Jin Weaver and Ishan Amin and Hyunsoo Park and Yunsung Lim and Jihan Kim and Daryl Chrzan and Aron Walsh and Samuel M Blau and Mark Asta and Aditi S. Krishnapriyan},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2025},
  url={https://openreview.net/forum?id=SAT0KPA5UO}
}
```
