
CUDA Installation Dependency Issues #2

@wliverno

Description

A viable CUDA installation could not be found that is compatible with the current code. The dependencies in setup.py do not resolve by default and do not match the README or the other example files. Fixing these dependencies produced multiple configurations that resolved the dependencies but raised the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This error was raised from multiple locations in the util.py module, depending on the method called. These setups were tested on a Google Colab T4 node (CUDA 12.4) and on the UW Hyak Klone supercomputing cluster (CUDA 11.8 and 12.4).
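For context, this error typically means a model's parameters and its input tensors live on different devices. A minimal, generic PyTorch sketch (not ALIGNN code) of the failure mode and the usual fix, which is moving the inputs to the model's device:

```python
# Generic sketch of the device-mismatch pattern (illustrative only, not ALIGNN code).
import torch

model = torch.nn.Linear(3, 1)             # parameters live on some device (CPU or CUDA)
device = next(model.parameters()).device  # query where the model actually is
x = torch.randn(2, 3)                     # input tensors often start on the CPU

# Feeding x to a CUDA-resident model without .to(device) raises
# "Expected all tensors to be on the same device"; moving the input is the fix:
y = model(x.to(device))
print(y.shape)  # torch.Size([2, 1])
```

The ALIGNN bug appears to be exactly this pattern: some tensors created in util.py are not moved to the device the model was loaded on.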

To Reproduce
Method 1: Using pip
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install alignn --no-deps

Method 2: Using pip + github
git clone https://github.com/usnistgov/alignn.git
cd alignn
git checkout develop
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install -e . --no-deps

Method 3: Using Conda
conda env create -f environment.yaml

environment.yaml v1 (CUDA 11.8):
name: alignn
channels:
- dglteam/label/cu118
- nvidia
- pytorch
dependencies:
- python=3.10
- pytorch-cuda=11.8
- pytorch=2.1.2
- torchdata=0.7
- torchvision
- torchaudio
- dgl=2.1.0
- alignn

environment.yaml v2 (CUDA 12.4):
name: alignn
channels:
- dglteam/label/th24_cu124
- nvidia
- pytorch
dependencies:
- python=3.10
- pytorch-cuda=12.4
- pytorch=2.4.0
- torchvision
- torchaudio
- torchdata=0.7
- dgl
- alignn

Expected behavior
The following code runs without error, where POSCAR is a VASP input file in the current directory:
from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
from ase.io import read
calc = AlignnAtomwiseCalculator(path=default_path())
atoms = read("POSCAR", format='vasp')
atoms.calc = calc

A RuntimeError is thrown when running the following line:
energy = atoms.get_potential_energy()
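To localize the mismatch before get_potential_energy() is called, one can inspect which devices a module's tensors actually occupy. A small helper sketch in generic PyTorch (the attribute holding the calculator's underlying torch module is not verified here, so a toy model stands in):

```python
# Hypothetical diagnostic helper: report every device used by a torch module,
# to locate CPU/CUDA splits before a forward pass fails.
import torch

def report_devices(module: torch.nn.Module) -> set:
    """Return the set of devices holding the module's parameters and buffers."""
    devices = {p.device for p in module.parameters()}
    devices |= {b.device for b in module.buffers()}
    return devices

# Demonstrated on a toy model; for ALIGNN, pass the calculator's underlying
# torch module instead (attribute name not confirmed here).
print(report_devices(torch.nn.Linear(4, 4)))
```

A result containing more than one device confirms the mixed-placement bug described above.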

Setups Tested:

  • OS: Linux (Ubuntu 20.04, Rocky Linux 8)
  • CUDA Versions: 11.8, 12.4
  • ALIGNN versions: 2024.12.12, 2025.4.1, current development release

Additional context

  • Adding import torch; torch.set_default_device('cuda') led to: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
  • Modifying the util.py module to convert all tensors to CPU (via the .cpu() method) resolved some errors but led to similar failures further down the call stack.
  • All installation methods work when running CPU-only.
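Since the CPU-only configurations work, one stopgap until the device handling is fixed is to hide the GPUs before torch/DGL are imported, using the standard CUDA_VISIBLE_DEVICES mechanism:

```python
import os

# Hide all GPUs from PyTorch/DGL *before* either library is imported,
# forcing the CPU-only code path (the only configuration reported to work).
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Subsequent imports then run entirely on the CPU, e.g.:
# from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
# calc = AlignnAtomwiseCalculator(path=default_path())
```

This sacrifices GPU acceleration, so it is a workaround rather than a fix for the device placement in util.py.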
