Fix for GPU aware MPI #200

Closed
LudwigBoess wants to merge 1 commit into 1.4.0rc from dev/gpu_mpi_fix

@LudwigBoess
Collaborator

Replaces the explicit host copy with Kokkos::fence() to fix GPU-aware MPI issues.
The error appeared on Aurora, whose MPI implementation does not seem to ensure that pending GPU operations have completed before the MPI call is invoked. Other MPI implementations do seem to ensure this; in those cases, Kokkos::fence() should be effectively a no-op with negligible overhead.

ToDo: Check thoroughly on different systems/architectures.

@LudwigBoess
Collaborator Author

For reference, this was the old issue:

[Video attachment: comparison.mov]

@LudwigBoess
Collaborator Author

Now it works, at least on Aurora.
[Image: turb_00000006]

The impact on communication time is within the node-to-node scatter, I would say. This run used 8 nodes.
[Image: communication_time]

LudwigBoess requested a review from haykh on May 4, 2026, 14:26
@LudwigBoess
Collaborator Author

Replaced by #202

LudwigBoess closed this on May 8, 2026
LudwigBoess deleted the dev/gpu_mpi_fix branch on May 8, 2026, 02:44