[Offload][OMPT] Adapt device tracing for omp_initial_device on host #2458

Open: Thyre wants to merge 1 commit into ROCm:amd-staging from Thyre:ompt-initial-device-num-tracing

Thyre commented May 8, 2026

In llvm-project#192924, all host-side callbacks were changed to pass `omp_initial_device` instead of `omp_get_initial_device()` to an attached tool. This change made it possible to set `initial_device_num` in `ompt_start_tool` to a fixed value, so tools can rely on the passed identifier to distinguish the host device from any other device.

However, the same change needs to be applied to the device tracing records, since tools might rely on host-side callbacks and device tracing records having consistent identifiers. For example, a tool might discard records as invalid because it does not recognize the value of `omp_get_initial_device()` passed in them. The problem is amplified by `ompt_get_num_devices()` not yielding the actual number of devices.

Thus, update the device tracing interface to also consistently use `omp_initial_device` for the host device.
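
To illustrate why consistent identifiers matter, here is a minimal sketch of how a tool might classify the device number it finds in a callback or trace record. It is not code from this PR: `classify_device`, `dev_kind_t`, and `SKETCH_INITIAL_DEVICE` are hypothetical names, and the sketch assumes `omp_initial_device` has the value -1 (as in OpenMP 5.1 and later) so that it builds without `<omp.h>`.

```c
#include <assert.h>

/* Assumption: mirrors the value of omp_initial_device from <omp.h>
   (-1 in OpenMP 5.1 and later). Defined locally so the sketch
   compiles without an OpenMP runtime. */
#define SKETCH_INITIAL_DEVICE (-1)

/* Hypothetical classification a tool might apply to a device number. */
typedef enum { DEV_HOST, DEV_TARGET, DEV_UNKNOWN } dev_kind_t;

/* Hypothetical tool-side helper. num_target_devices is the number of
   target devices the tool currently knows about. */
static dev_kind_t classify_device(int device_num, int num_target_devices) {
    if (device_num == SKETCH_INITIAL_DEVICE)
        return DEV_HOST;      /* fixed host identifier */
    if (device_num >= 0 && device_num < num_target_devices)
        return DEV_TARGET;    /* known target device */
    return DEV_UNKNOWN;       /* a tool may discard such records */
}
```

With two known target devices, a record carrying -1 is classified as host and a record carrying 1 as a target device. A record carrying 2, which is what a runtime might report if it passed a host number derived from the device count, would come back as `DEV_UNKNOWN` and could be dropped by the tool.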

Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>

Thyre commented May 8, 2026

CC @jplehr, @mhalk, @ronlieb

Unfortunately, I wasn't able to check whether this change causes any test failures.
I don't really have a software setup for this fork available. I managed to build the fork after some fiddling around, but couldn't get anything related to OpenMP target offloading to run 😕

ronlieb requested review from dhruvachak, jplehr and mhalk May 8, 2026 11:22


jplehr commented May 8, 2026

Changes seem reasonable to me. I kicked off testing.

ronlieb commented May 8, 2026

!PSDB
restarting npsdb due to hipblender channel error, not related to this PR

Thyre commented May 12, 2026

Looks like the failed checks are:

Linux::release / Test gfx94X-dcgpu / Test hipfft / Test hipfft (shard 1/1) (gfx94X-dcgpu)

:0:/__w/llvm-project/llvm-project/rocm-systems/projects/clr/hipamd/src/hip_global.cpp:64  : 154838330319 us:  Cannot find Symbol with name: _ZL39generate_random_interleaved_data_kernelI12input_val_1DImEfEvT_S2_mmS2_P14rocfft_complexIT0_ES2_mS2_m.intern.7b72a89fd20db601

Linux::release / Test gfx94X-dcgpu / Test hipsolver / Test hipsolver (shard 1/1) (gfx94X-dcgpu)

Error: The action 'Test' has timed out after 5 minutes.

Linux::release / Test gfx94X-dcgpu / Test rocfft / Test rocfft (shard 1/1) (gfx94X-dcgpu)

:0:/__w/llvm-project/llvm-project/rocm-systems/projects/clr/hipamd/src/hip_global.cpp:64  : 7884505403011 us:  Cannot find Symbol with name: _ZL39generate_random_interleaved_data_kernelI12input_val_1DImEfEvT_S2_mmS2_P14rocfft_complexIT0_ES2_mS2_m.intern.9d09c37ba467f135

I'd be surprised if they are related to the changes in this PR, since the symbols seem to come from hipfft / rocfft...
