
CUDA Installation Dependency Issues #2

@wliverno

Description

A viable CUDA installation could not be found that is compatible with the current code. The dependencies in setup.py do not resolve by default and do not match the README or the other example files. Fixing these dependencies produced multiple configurations that resolved the dependencies but raised the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This error was raised from multiple locations in the util.py module, depending on the method called. These setups were tested on a Google Colab T4 node (CUDA 12.4) and on the UW Hyak Klone supercomputing cluster (CUDA 11.8 and 12.4).
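For context, this error typically means a model's parameters and its input tensors live on different devices. A minimal, generic PyTorch sketch (not ALIGNN code) of the failure mode and the usual fix, which is moving the inputs to the model's device:

```python
# Generic sketch of the device-mismatch pattern (illustrative only, not ALIGNN code).
import torch

model = torch.nn.Linear(3, 1)             # parameters live on some device (CPU or CUDA)
device = next(model.parameters()).device  # query where the model actually is
x = torch.randn(2, 3)                     # input tensors often start on the CPU

# Feeding x to a CUDA-resident model without .to(device) raises
# "Expected all tensors to be on the same device"; moving the input is the fix:
y = model(x.to(device))
print(y.shape)  # torch.Size([2, 1])
```

The ALIGNN bug appears to be exactly this pattern: some tensors created in util.py are not moved to the device the model was loaded on.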

To Reproduce
Method 1: Using pip
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install alignn --no-deps

Method 2: Using pip + github
git clone https://github.com/usnistgov/alignn.git
cd alignn
git checkout develop
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install -e . --no-deps

Method 3: Using Conda
conda env create -f environment.yaml

environment.yaml v1 (CUDA 11.8):
name: alignn
channels:
- dglteam/label/cu118
- nvidia
- pytorch
dependencies:
- python=3.10
- pytorch-cuda=11.8
- pytorch=2.1.2
- torchdata=0.7
- torchvision
- torchaudio
- dgl=2.1.0
- alignn

environment.yaml v2 (CUDA 12.4):
name: alignn
channels:
- dglteam/label/th24_cu124
- nvidia
- pytorch
dependencies:
- python=3.10
- pytorch-cuda=12.4
- pytorch=2.4.0
- torchvision
- torchaudio
- torchdata=0.7
- dgl
- alignn

Expected behavior
The following code runs without error, where POSCAR is a VASP input file in the current directory:
from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
from ase.io import read
calc = AlignnAtomwiseCalculator(path=default_path())
atoms = read("POSCAR", format='vasp')
atoms.calc = calc

A RuntimeError is thrown when running the following line:
energy = atoms.get_potential_energy()
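To localize the mismatch before get_potential_energy() is called, one can inspect which devices a module's tensors actually occupy. A small helper sketch in generic PyTorch (the attribute holding the calculator's underlying torch module is not verified here, so a toy model stands in):

```python
# Hypothetical diagnostic helper: report every device used by a torch module,
# to locate CPU/CUDA splits before a forward pass fails.
import torch

def report_devices(module: torch.nn.Module) -> set:
    """Return the set of devices holding the module's parameters and buffers."""
    devices = {p.device for p in module.parameters()}
    devices |= {b.device for b in module.buffers()}
    return devices

# Demonstrated on a toy model; for ALIGNN, pass the calculator's underlying
# torch module instead (attribute name not confirmed here).
print(report_devices(torch.nn.Linear(4, 4)))
```

A result containing more than one device confirms the mixed-placement bug described above.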

Setups Tested:

  • OS: Linux (Ubuntu 20.04, Rocky Linux 8)
  • CUDA Versions: 11.8, 12.4
  • ALIGNN versions: 2024.12.12, 2025.4.1, current development release

Additional context

  • Adding import torch; torch.set_default_device('cuda') led to: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
  • Modifying the util.py module to convert all tensors to CPU (via the .cpu() method) resolved some errors but led to similar failures further down the call stack.
  • All installation methods work when running CPU-only.
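Since the CPU-only configurations work, one stopgap until the device handling is fixed is to hide the GPUs before torch/DGL are imported, using the standard CUDA_VISIBLE_DEVICES mechanism:

```python
import os

# Hide all GPUs from PyTorch/DGL *before* either library is imported,
# forcing the CPU-only code path (the only configuration reported to work).
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Subsequent imports then run entirely on the CPU, e.g.:
# from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
# calc = AlignnAtomwiseCalculator(path=default_path())
```

This sacrifices GPU acceleration, so it is a workaround rather than a fix for the device placement in util.py.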
