Adding cleavage benchmarks by vue1999 · Pull Request #404 · ddmms/ml-peg

vue1999 · 2026-03-04T02:48:25Z

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

I've confirmed the contribution guidelines.

Summary

Adding a benchmark to evaluate the accuracy of predicting cleavage energies of crystalline surfaces.

Linked issue

Resolves #403

Progress

Calculations
Analysis
Application
Documentation

Testing

MACE omat, ORB v3

New decorators/callbacks

no

joehart2001 · 2026-03-20T15:53:43Z

docs/source/user_guide/benchmarks/surfaces.rst

+Computational cost
+------------------
+
+Medium: benchmark involves only single-point calculations, but for 36,718 slab-bulk pairs.


It would be good to give a rough indication of time, e.g. hours or minutes on GPU?
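
One way to get such a figure is to time a handful of slab-bulk pairs and scale linearly to the full set. The sketch below is only illustrative: `time_per_pair` and `extrapolate_hours` are hypothetical helper names, not part of ml-peg, and linear scaling ignores model loading time and differences in system size.

```python
import time

N_PAIRS = 36_718  # number of slab-bulk pairs quoted in the docs


def time_per_pair(run_pair, sample):
    """Return the mean wall-clock seconds per pair over a small sample.

    `run_pair` is a hypothetical callable that runs the single-point
    calculations for one slab-bulk pair.
    """
    start = time.perf_counter()
    for pair in sample:
        run_pair(pair)
    return (time.perf_counter() - start) / len(sample)


def extrapolate_hours(seconds_per_pair, n_pairs=N_PAIRS):
    """Scale a per-pair timing linearly to the whole benchmark, in hours."""
    return seconds_per_pair * n_pairs / 3600.0
```

For example, a measured 0.5 s per pair would extrapolate to roughly 5.1 hours for the full set.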

joehart2001 · 2026-03-22T22:38:27Z

ml_peg/calcs/surfaces/cleavage_energy/calc_cleavage_energy.py

+from pathlib import Path
+from typing import Any
+
+from ase.io import read


Suggested change

-from ase.io import read
+from ase.io import read, write
+from tqdm import tqdm

joehart2001 · 2026-03-22T22:38:48Z

ml_peg/calcs/surfaces/cleavage_energy/calc_cleavage_energy.py

+        / "cleavage_energy"
+    )
+
+    results = {}


Suggested change

-    results = {}
+    results = {}
+    write_dir = OUT_PATH / model_name
+    write_dir.mkdir(parents=True, exist_ok=True)

joehart2001 · 2026-03-22T22:39:14Z

ml_peg/calcs/surfaces/cleavage_energy/calc_cleavage_energy.py

+
+    results = {}
+
+    for mpid_dir in sorted(d for d in data_dir.iterdir() if d.is_dir()):


Suggested change

-    for mpid_dir in sorted(d for d in data_dir.iterdir() if d.is_dir()):
+    idx = 0
+    for mpid_dir in tqdm(sorted(d for d in data_dir.iterdir() if d.is_dir())):

joehart2001 · 2026-03-22T22:39:39Z

ml_peg/calcs/surfaces/cleavage_energy/calc_cleavage_energy.py

+            unique_id = slab.info["unique_id"]
+            results[unique_id] = {
+                "slab_energy": slab_energy,
+                "bulk_energy": bulk_energy,
+                "area_slab": float(slab.info["area_slab"]),
+                "thickness_ratio": float(slab.info["thickness_ratio"]),
+                "ref_cleavage_energy": float(slab.info["ref_cleavage_energy"]),
+                "mpid": slab.info["mpid"],
+                "miller": slab.info["miller"],
+                "term": int(slab.info["term"]),
+            }
+
+    OUT_PATH.mkdir(parents=True, exist_ok=True)
+    output_file = OUT_PATH / f"{model_name}.json"
+    with open(output_file, "w", encoding="utf-8") as f:
+        json.dump(results, f)


Suggested change

-            unique_id = slab.info["unique_id"]
-            results[unique_id] = {
-                "slab_energy": slab_energy,
-                "bulk_energy": bulk_energy,
-                "area_slab": float(slab.info["area_slab"]),
-                "thickness_ratio": float(slab.info["thickness_ratio"]),
-                "ref_cleavage_energy": float(slab.info["ref_cleavage_energy"]),
-                "mpid": slab.info["mpid"],
-                "miller": slab.info["miller"],
-                "term": int(slab.info["term"]),
-            }
-
-    OUT_PATH.mkdir(parents=True, exist_ok=True)
-    output_file = OUT_PATH / f"{model_name}.json"
-    with open(output_file, "w", encoding="utf-8") as f:
-        json.dump(results, f)
+            slab.info.update(
+                {
+                    "slab_energy": slab_energy,
+                    "bulk_energy": bulk_energy,
+                    "area_slab": float(slab.info["area_slab"]),
+                    "thickness_ratio": float(slab.info["thickness_ratio"]),
+                    "ref_cleavage_energy": float(slab.info["ref_cleavage_energy"]),
+                    "mpid": slab.info["mpid"],
+                    "miller": slab.info["miller"],
+                    "term": int(slab.info["term"]),
+                }
+            )
+            write(write_dir / f"{idx}.xyz", slab, format="extxyz")
+            idx += 1
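
For context on why these fields are stored per slab: the analysis presumably combines them into a cleavage energy per unit area. A minimal sketch follows, assuming the common definition (E_slab - thickness_ratio * E_bulk) / (2 * area), with energies in eV and areas in Å², converted to the meV/Å² used in metrics.yml. The function name and the formula itself are assumptions; the PR's analysis script may define the quantity differently.

```python
EV_TO_MEV = 1000.0


def cleavage_energy(slab_energy, bulk_energy, thickness_ratio, area_slab):
    """Cleavage energy in meV/A^2 under the assumed definition.

    Assumes energies in eV, area in A^2, and that `thickness_ratio` is the
    number of bulk cells equivalent to the slab. The factor of 2 accounts
    for the two surfaces created by cleaving.
    """
    excess = slab_energy - thickness_ratio * bulk_energy
    return excess / (2.0 * area_slab) * EV_TO_MEV
```

With the keys written above, a call would look like `cleavage_energy(info["slab_energy"], info["bulk_energy"], info["thickness_ratio"], info["area_slab"])`, compared against `info["ref_cleavage_energy"]`.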

joehart2001 · 2026-03-22T22:43:06Z

Hey @vue1999, thanks for the PR, it's looking super good. A few things:

I've made some code suggestions to the calc script so we save files in the generalised format. I know there are 30,000 files... but what do you think @ElliottKasoar?
I've made some knock-on suggestions to the analysis based on these calc changes.
I've also made suggestions which implement the density scatter plot + structure visualisation, as there are a lot of structures.
In the future we are also going to add the ability to switch between types of errors, e.g. MAE and RMSE, so I've suggested we just keep MAE for now, unless you're against this?

Let me know if you've got any ideas or are unsure about any of my suggestions, thanks!
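
For reference, the two error types mentioned above differ only in how the per-structure errors are aggregated. A minimal sketch in plain Python; the function names are illustrative and not ml-peg API.

```python
import math


def mae(pred, ref):
    """Mean absolute error between predicted and reference values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def rmse(pred, ref):
    """Root mean squared error; penalises large outliers more than MAE."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))
```

RMSE is never smaller than MAE on the same data, so on a set this large a few badly predicted surfaces would show up more strongly in RMSE.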

joehart2001 · 2026-03-22T22:50:33Z

ml_peg/analysis/surfaces/cleavage_energy/metrics.yml

+  RMSE:
+    good: 0.0
+    bad: 10.0
+    unit: meV/A^2
+    tooltip: Root Mean Squared Error of cleavage energies
+    level_of_theory: PBE


Suggested change

-  RMSE:
-    good: 0.0
-    bad: 10.0
-    unit: meV/A^2
-    tooltip: Root Mean Squared Error of cleavage energies
-    level_of_theory: PBE

joehart2001 · 2026-03-22T22:51:07Z