Conventional graphs only model the pairwise connectivity in molecules, failing to adequately represent higher-order connections like multi-center bonds and conjugated structures. To tackle this challenge, we introduce molecular hypergraphs and propose Molecular Hypergraph Neural Networks (MHNN) to predict molecular optoelectronic properties, where hyperedges represent conjugated structures.
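
To make the idea concrete, here is a minimal sketch of a molecular hypergraph as a plain data structure. The `MolecularHypergraph` class, its methods, and the choice to keep bonds outside the conjugated system as two-node hyperedges are illustrative assumptions, not the data format used in this repository.

```python
from dataclasses import dataclass, field


@dataclass
class MolecularHypergraph:
    """Hypothetical container: atoms are nodes, each hyperedge is a set of atom indices."""

    atoms: list
    hyperedges: list = field(default_factory=list)

    def add_hyperedge(self, atom_indices):
        """Add one hyperedge connecting any number of atoms."""
        edge = frozenset(atom_indices)
        if not edge or min(edge) < 0 or max(edge) >= len(self.atoms):
            raise ValueError("hyperedge refers to an unknown atom")
        self.hyperedges.append(edge)

    def incidence_pairs(self):
        """Return (node_index, hyperedge_index) pairs, i.e. a sparse incidence matrix."""
        return [
            (node, edge_id)
            for edge_id, edge in enumerate(self.hyperedges)
            for node in sorted(edge)
        ]


# Heavy atoms of 1,3-pentadiene, C0=C1-C2=C3-C4 (hydrogens omitted).
mol = MolecularHypergraph(atoms=["C", "C", "C", "C", "C"])
mol.add_hyperedge([0, 1, 2, 3])  # the conjugated C=C-C=C system as one hyperedge
mol.add_hyperedge([3, 4])        # assumption: an ordinary bond stays a two-node hyperedge
pairs = mol.incidence_pairs()
print(pairs)
```

A pairwise graph would need three separate edges to cover the conjugated carbons, whereas the hypergraph groups all four in a single hyperedge that a message-passing layer can treat as one unit.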
- We'll use `conda` to install dependencies and set up the environment. We recommend using the Python 3.9 Miniconda installer.
- After installing `conda`, install `mamba` to the base environment. `mamba` is a faster, drop-in replacement for `conda`: `conda install mamba -n base -c conda-forge`
- Create a new environment named `mhnn` and install dependencies: `mamba env create -f env.yml`
- Activate the conda environment with `conda activate mhnn`.

| Dataset | Graphs | Task type | Task number | Metric |
|---|---|---|---|---|
| OPV | 90,823 | regression | 8 | MAE |
| OCELOTv1 | 25,251 | regression | 15 | MAE |
| PCQM4Mv2 | 3,746,620 | regression | 1 | MAE |

The OPV (organic photovoltaic) dataset contains 90,823 unique molecules (monomers and soluble small molecules) with their SMILES strings, 3D geometries, and optoelectronic properties from DFT calculations. OPV has four molecular tasks, the energy of highest occupied molecular orbital for the monomer (

The OCELOTv1 dataset contains 25,251 organic

PCQM4Mv2 is a quantum chemistry dataset originally curated under the PubChemQC project. The ML task is to predict the DFT-calculated HOMO-LUMO energy gap of molecules from their 2D molecular graphs. With about 3.7M graphs, PCQM4Mv2 is unprecedentedly large compared with other labeled graph-level prediction datasets.
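
All three datasets are scored with mean absolute error (MAE). As a rough sketch of how MAE can be computed per task for multi-task regression, consider the following. The function names and the toy numbers are illustrative assumptions, and how this repository aggregates MAE across tasks is not specified here.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |target - prediction| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def per_task_mae(targets, predictions):
    """Compute one MAE per task; each row holds one value per task."""
    n_tasks = len(targets[0])
    return [
        mean_absolute_error(
            [row[k] for row in targets],
            [row[k] for row in predictions],
        )
        for k in range(n_tasks)
    ]


# Toy values for two molecules and two tasks (not real dataset labels).
targets = [[1.0, 2.0], [3.0, 4.0]]
predictions = [[1.5, 2.0], [2.5, 5.0]]
maes = per_task_mae(targets, predictions)
print(maes)
```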
- We provide training scripts for `MHNN` and baselines under `scripts/opv`. For example, we can train `MHNN` for one task by running: `bash scripts/opv/mhnn.sh [TASK_ID]`
- Train a model for all tasks by running: `bash scripts/opv/run_all_tasks.sh [MODEL_NAME]`
- The OPV dataset will be downloaded automatically the first time you run training.
- The model names and task IDs for the different tasks can be found here.
- We provide training scripts for `MHNN` under `scripts/ocelot`. For example, we can train `MHNN` for one task by running: `bash scripts/ocelot/train.sh [TASK_ID]`
- Train `MHNN` for all tasks by running: `bash scripts/ocelot/run_all_tasks.sh`
- The OCELOT dataset will be downloaded automatically the first time you run training.
- The task IDs for the different tasks can be found here.
- We provide a training script for `MHNN` under `scripts/pcqm4mv2`. Train `MHNN` by running: `bash scripts/pcqm4mv2/train.sh`
- The PCQM4Mv2 dataset will be downloaded automatically the first time you run training.
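
The per-task scripts for OPV and OCELOT take a task ID as their only argument, so training on a chosen subset of tasks can be scripted. The sketch below only builds and prints the command lines. The helper `build_task_commands` is hypothetical, and the task IDs shown are placeholders, so check the task ID list for the real values.

```python
import shlex


def build_task_commands(script, task_ids):
    """Build one `bash <script> <task_id>` command per requested task."""
    return [["bash", script, str(task_id)] for task_id in task_ids]


# Placeholder task IDs; replace them with IDs from the task list.
commands = build_task_commands("scripts/opv/mhnn.sh", [0, 1])
for command in commands:
    print(shlex.join(command))
```

To actually launch the runs, each command could be passed to `subprocess.run(command, check=True)` from the repository root.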
This work was supported as part of NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation.
If you find our work useful, please consider citing it:
@article{chen2024molecular,
author = {Chen, Junwu and Schwaller, Philippe},
title = "{Molecular hypergraph neural networks}",
journal = {The Journal of Chemical Physics},
volume = {160},
number = {14},
pages = {144307},
year = {2024},
doi = {10.1063/5.0193557},
url = {https://doi.org/10.1063/5.0193557},
}
If you have any questions, feel free to contact me:
Junwu Chen: junwu.chen@epfl.ch
