Description
No CUDA installation could be found that is compatible with the current code. The dependencies in setup.py do not resolve by default and do not match the README or other example files. After fixing the dependencies, several configurations resolved correctly but all produced the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
The error is raised from multiple locations in the util.py module, depending on the method called. These setups were tested on a Google Colab T4 node (CUDA 12.4) and on the UW Hyak Klone supercomputing cluster (CUDA 11.8 and 12.4).
To Reproduce
Method 1: Using pip
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install alignn --no-deps
Method 2: Using pip + github
git clone https://github.com/usnistgov/alignn.git
cd alignn
git checkout develop
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install -e . --no-deps
Method 3: Using Conda
conda env create -f environment.yaml
environment.yaml v1 (CUDA 11.8):
name: alignn
channels:
- dglteam/label/cu118
- nvidia
- pytorch
dependencies:
- python=3.10
- pytorch-cuda=11.8
- pytorch=2.1.2
- torchdata=0.7
- torchvision
- torchaudio
- dgl=2.1.0
- alignn
environment.yaml v2 (CUDA 12.4):
name: alignn
channels:
- dglteam/label/th24_cu124
- nvidia
- pytorch
dependencies:
- python=3.10
- pytorch-cuda=12.4
- pytorch=2.4.0
- torchvision
- torchaudio
- torchdata=0.7
- dgl
- alignn
Expected behavior
The following code runs without error, where the file POSCAR is a VASP input file in the current directory:
from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
from ase.io import read
calc = AlignnAtomwiseCalculator(path=default_path())
atoms = read("POSCAR", format='vasp')
atoms.calc = calc
A runtime error is thrown when running the following line:
energy = atoms.get_potential_energy()
Setups Tested:
- OS: Linux (Ubuntu 20.04, Rocky Linux 8)
- CUDA Versions: 11.8, 12.4
- ALIGNN versions: 2024.12.12, 2025.4.1, current development release
Additional context
- Adding import torch; torch.set_default_device('cuda') led to: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
- Modifying the util.py module to convert all tensors to CPU (via the .cpu() method) solved some errors but led to similar issues further down the line.
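For reference, the util.py modification attempted above amounted to a device-normalization helper along the following lines. This is a minimal sketch, not the actual patch; the name to_device and its recursion over containers are assumptions:

```python
import torch

def to_device(obj, device):
    """Recursively move tensors (and containers of tensors) to one device.

    Hypothetical helper illustrating the kind of patch tried in util.py;
    not code from the ALIGNN repository.
    """
    if torch.is_tensor(obj):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(x, device) for x in obj)
    if isinstance(obj, dict):
        return {k: to_device(v, device) for k, v in obj.items()}
    return obj

# Example: normalize everything to CPU before handing tensors to numpy.
batch = {"coords": torch.zeros(3), "ids": [torch.ones(2)]}
batch = to_device(batch, "cpu")
```

Applying this at one call site only pushes the cuda:0/cpu mismatch to the next operation, which matches the "similar issues down the line" behavior observed.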
- All installation methods work with CPU-only
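Since CPU-only runs succeed, a practical stopgap on GPU nodes is to hide the GPUs from PyTorch before anything is imported, forcing every tensor onto the CPU. A workaround sketch, assuming no other code has initialized CUDA first:

```python
import os

# Hide all GPUs from CUDA before torch is imported, so every tensor
# (including the loaded ALIGNN model weights) defaults to the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
print(torch.cuda.is_available())  # False: no devices are visible
```

This sidesteps the device mismatch entirely at the cost of GPU acceleration; the environment variable must be set before the first torch import.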