HIP Kernel Compiler Issue #18

@nikhil-tensorwave

Description

After building OpenMM-HIP, running make test fails with HIP compiler errors.
Most of the failures are of the form

Error creating kernel <kernel function name>: hipErrorNotFound (500)

I'm also getting

Error launching HIP compiler: 256
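One possible reading of the 256, sketched below: if that number is the raw wait status returned by a system()-style call on Linux (an assumption, I have not confirmed this in the OpenMM-HIP source), it would decode to a compiler process that exited with code 1 rather than one killed by a signal. The helper name decode_wait_status is hypothetical and only illustrates the bit layout.

```shell
#!/usr/bin/env bash
# Decode a raw Linux wait status (as returned by system()/waitpid()).
# Assumption: the number in the error message is such a raw status.
decode_wait_status() {
    local status=$1
    # Bits 8-15 hold the exit code of a normally terminated process.
    local exit_code=$(( (status >> 8) & 0xFF ))
    # The low 7 bits hold the terminating signal (0 means a normal exit).
    local signal=$(( status & 0x7F ))
    echo "exit=${exit_code} signal=${signal}"
}

decode_wait_status 256
```

If the assumption holds, the compiler itself is returning a failure, so its own stderr output would be the next thing to look at.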

Runtime environment:
ROCm 6.1.1
Ubuntu 22.04
Python 3.10
PyTorch 2.4.0

These were the setup steps used:

## build openmm
git clone https://github.com/openmm/openmm.git
cd openmm
git checkout 8.1.1
mkdir -p build/install
cd build
cmake ../ -D CMAKE_INSTALL_PREFIX=./install -D PYTHON_EXECUTABLE=/usr/bin/python3 -D OPENMM_BUILD_COMMON=ON -D OPENMM_PYTHON_USER_INSTALL=OFF -D CMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -D_GLIBCXX_USE_CXX11_ABI=0"
make -j128
make test
make install
cd ../..

## build openmm-hip
git clone https://github.com/amd/openmm-hip.git
cd openmm-hip
git checkout mi300_changes  # necessary for ROCm 6.0!
mkdir build && cd build
cmake ../ -D OPENMM_DIR=../../openmm/build/install -D OPENMM_SOURCE_DIR=../../openmm -D CMAKE_INSTALL_PREFIX=../../openmm/build/install -D CMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -D_GLIBCXX_USE_CXX11_ABI=0"
make -j128
make test  # these mostly fail with above errors
ctest -j 128 --rerun-failed  # if you keep rerunning them, more and more pass

Each time I rerun the failed tests, a small additional percentage of them pass.
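The rerun behaviour can be scripted with a small retry wrapper. This is a minimal sketch, not part of the OpenMM-HIP build: the function name retry_until_pass is hypothetical, and running ctest with -j 1 is only a guess at a way to check whether the failures depend on how many tests run in parallel.

```shell
#!/usr/bin/env bash
# Rerun a command until it succeeds or the attempt limit is reached.
# Usage: retry_until_pass MAX_ATTEMPTS COMMAND [ARGS...]
retry_until_pass() {
    local max_attempts=$1
    shift
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        # "$@" runs the command exactly as given, preserving its arguments.
        if "$@"; then
            echo "passed on attempt ${attempt}"
            return 0
        fi
    done
    echo "still failing after ${max_attempts} attempts"
    return 1
}

# Example (run from the openmm-hip build directory):
#   retry_until_pass 5 ctest -j 1 --rerun-failed --output-on-failure
```

Because ctest exits with a non-zero status while any test still fails, the loop stops as soon as a rerun leaves no failures. If the serial run passes on the first attempt, that would point at the parallel runs rather than at the kernels themselves.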

Any help on this would be appreciated.
