Loci communicator refactor by EdwardALuke · Pull Request #335 · EdwardALuke/loci

EdwardALuke · 2026-05-22T04:17:31Z

This is a major refactoring that allows Loci schedules to be generated inside a localized MPI communicator. The changes are broad ranging but straightforward. To use this, create the communicator under which you want to run Loci schedules and set it as the default communicator for Loci with the call

Loci::SetDefaultComm(sub_comm) ;

When you create the fact database set the communicator it uses with code such as

fact_db facts ;
facts.set_comm(sub_comm) ;

Now when you generate schedules, they will communicate in the sub_comm communicator. Note, however, that you need to be careful if you use MPI_COMM_WORLD anywhere in your code, as this could cause deadlocks. You can define LOCI_STRICT_COMM to disable API calls that have a communicator argument that defaults to MPI_COMM_WORLD, which lets you check whether you need to change any parts of your program. Set this by changing the MISC line in sys.conf to include -DLOCI_STRICT_COMM
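The effect of a strict-communicator define can be sketched with a conditional default argument. This is a minimal illustration only, not Loci's implementation: `Comm`, `world_comm`, `comm_used_by`, `STRICT_COMM_SKETCH`, and `COMM_DEFAULT_SKETCH` are hypothetical stand-ins chosen so the example compiles without MPI or Loci.

```cpp
#include <cassert>
#include <string>

// Stand-in for MPI_Comm so that the sketch compiles without an MPI installation.
struct Comm {
  std::string name;
};

// Stand-in for MPI_COMM_WORLD.
inline const Comm &world_comm() {
  static const Comm world{"world"};
  return world;
}

// With STRICT_COMM_SKETCH defined, the default argument disappears and every
// caller must name a communicator; otherwise calls fall back to the world
// communicator. This mirrors the behavior described for LOCI_STRICT_COMM,
// but the macro and function names here are hypothetical.
#ifdef STRICT_COMM_SKETCH
#define COMM_DEFAULT_SKETCH
#else
#define COMM_DEFAULT_SKETCH = world_comm()
#endif

// Hypothetical utility: reports which communicator a collective would use.
std::string comm_used_by(const Comm &comm COMM_DEFAULT_SKETCH) {
  assert(!comm.name.empty());
  return comm.name;
}
```

Compiled normally, `comm_used_by()` silently uses the world communicator. Compiled with `-DSTRICT_COMM_SKETCH`, the same call fails to compile, which is how a define of this kind can point at every call site that still relies on the default.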


EdwardALuke · 2026-05-22T13:20:06Z

@cfdrcmpgale These are the changes to Loci that allow scheduling under an MPI communicator other than MPI_COMM_WORLD. You have to be careful with the solver, as you might be using MPI_COMM_WORLD without realizing it. To keep the API compatible, some calls have a final argument that is the MPI communicator but defaults to MPI_COMM_WORLD. You can define LOCI_STRICT_COMM to change the API to require explicit communicators, which you can use to figure out what changes you need to make to your code. You can look at the changes to the quickTest/FVM code to see how that works. I have only tested that this works with the FVM code; I am not sure whether there are problems with advanced features like FVMOverset or FVMAdapt. Those have not been tested using the sub communicator.

rlfontenot · 2026-05-22T13:26:28Z

@EdwardALuke I presume you also went and removed all of the MPI_COMM_WORLD's from chem? I see it in the latest version we have from you all in the linear solvers/etc.

EdwardALuke · 2026-05-22T13:34:48Z

@rlfontenot I have not done that for CHEM or flowPsi yet. Some care may need to be taken to make sure we maintain compatibility with the 4.1 stable release of Loci. This is very much a work in progress. But since the changes touch so much of the code base, we probably will want to have several pull requests to prevent developing an orphaned branch that is too difficult to merge back into the dev branch. Right now I am wanting feedback on if this is going to meet the requirements.

cfdrcmpgale · 2026-05-22T14:33:25Z

@EdwardALuke I reviewed the changes and plan for adoption of this feature. We should not have any issue with the changes since the default API behavior is to use MPI_COMM_WORLD, and the LOCI_STRICT_COMM points to all the locations where the change needs to occur in solvers. This meets the initial requirements. Now, how do you envision the fact_db to work for loaded applications with separate communicators?

EdwardALuke · 2026-05-22T15:21:10Z

@cfdrcmpgale Ah, so I suspected that this might not be what you want. If you want Loci to coordinate between different meshes loaded on different processors, then communicators are not what you are after. What this would allow is for Loci to make independent schedules on independent communicators. But there would be no way for Loci to directly coordinate communication between them. If you wanted the different codes to talk to each other through maps or similar constructs, then all of the data would need to live in the same communicator. In that case you just want a facility that could read a mesh data structure into a subset of the processors. Note that if the applications don't follow the same iteration behavior, then you might be idling processors while Loci schedules one phase of computations, because of an implicit synchronization that happens at the level of iterations. Where the current feature set would be useful is if we wanted to create a subset of processors to run an independent application, say a structures simulation, and then another subset to run, say, chem. Then the two codes could run independently until some external-to-Loci solver coordination infrastructure would allow inter-communicator communication.
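As a rough sketch of the two-subset scenario described here, the function below picks a group for each rank. It assumes the first `n_structures` ranks run the structures code and the rest run the flow solver. The names `color_for_rank`, `STRUCTURES_GROUP`, and `FLOW_GROUP` are hypothetical, and the MPI and Loci calls appear only in comments so the block compiles without either library.

```cpp
#include <cassert>

// Hypothetical labels for the two independent applications.
enum GroupColor { STRUCTURES_GROUP = 0, FLOW_GROUP = 1 };

// Assign the first n_structures ranks to the structures solver and the
// remaining ranks to the flow solver.
int color_for_rank(int world_rank, int world_size, int n_structures) {
  assert(world_size > 0);
  assert(world_rank >= 0 && world_rank < world_size);
  assert(n_structures >= 0 && n_structures <= world_size);
  return world_rank < n_structures ? STRUCTURES_GROUP : FLOW_GROUP;
}

// With MPI available, the color would be used roughly as follows:
//   MPI_Comm sub_comm ;
//   MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm) ;
//   Loci::SetDefaultComm(sub_comm) ;   // call named in the PR description
//   fact_db facts ;
//   facts.set_comm(sub_comm) ;         // call named in the PR description
```

Each group would then build and run its own schedule on its own `sub_comm`. Any exchange between the two groups would have to go through infrastructure outside Loci, as noted above.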

cfdrcmpgale · 2026-05-22T15:41:48Z

@EdwardALuke Yes, the use case that we discussed was more around Loci-solvers driving Loci-applications on a subset of CPUs. But I do see the fundamental issue: in order for communication to occur, there would need to be a way of communicating contact surface information across groups. Face contact maps would be significantly more difficult, if possible at all. I see now that the target of this current feature is synchronization being handled externally, with non-Loci-based solvers driving simulations with Loci-based solvers (as with Loci/RTE). Thanks!

EdwardALuke · 2026-05-22T18:15:53Z

@cfdrcmpgale Would just having a feature where the vog file reader could be directed to read the database onto only a subset of processors be sufficient to meet your needs, or do you want to decouple models in a more significant way? It should be relatively simple to add that feature and would probably be the most effective way to deal with running a coupled problem where one was very small, but you needed to run on a large number of processors for the other larger part. But this wouldn't solve other coordination problems.

cfdrcmpgale · 2026-05-26T20:03:03Z

@EdwardALuke You are suggesting keeping the same communicator for the loaded application but having the vog information distributed only on a subset of processors? In this case, would it be equivalent to having empty domain partitions on the "extra" cores? The face contact maps would likely work with that approach.
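The "empty partitions on the extra cores" idea can be illustrated with a simple block distribution. This is a sketch under stated assumptions, not Loci's partitioner or the proposed vog reader feature: `owned_range` is a hypothetical helper that spreads `n_cells` over the first `n_active` ranks of a communicator and gives every other rank an empty range.

```cpp
#include <cassert>
#include <utility>

// Half-open range [first, second) of cell indices owned by `rank` when
// n_cells are spread as evenly as possible over the first n_active ranks.
// Ranks at or beyond n_active receive an empty range.
std::pair<long, long> owned_range(int rank, int n_active, long n_cells) {
  assert(rank >= 0 && n_active > 0 && n_cells >= 0);
  if (rank >= n_active) return {n_cells, n_cells};
  long base = n_cells / n_active;    // cells every active rank receives
  long extra = n_cells % n_active;   // leftover cells, one each to the lowest ranks
  long begin = rank * base + (rank < extra ? rank : extra);
  long size = base + (rank < extra ? 1 : 0);
  return {begin, begin + size};
}
```

With 10 cells and 3 active ranks, ranks 0 to 2 own `[0,4)`, `[4,7)`, and `[7,10)`, while rank 5 owns nothing. Because all ranks would still share one communicator, constructs such as face contact maps could in principle span the populated and empty partitions.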

EdwardALuke added 7 commits May 18, 2026 17:34

removed unused elements of the rule API.

9932661

Phase 0 updates

b92dea4

refactoring code to use fact_db communicator instead of MPI_COMM_WORLD

4f4e8df

Update to get communicator size and rank from fact_db

1f38dcd

Extended to replace use of MPI_COMM_WORLD with new API calls, support…

143441e

… for LOCI_STRICT_COMM define to force users to pass communicator into Loci APIs

More complete threading of MPI communicator through distributed utili…

da32dea

…ty routines

Fixing a few deadlock issues when running on a sub communicator.

6df53d5

EdwardALuke self-assigned this May 22, 2026

EdwardALuke added the refactor Used for reorganization of code to make it simpler or more consistent label May 22, 2026

EdwardALuke requested a review from cfdrcmpgale May 22, 2026 13:14

EdwardALuke wants to merge 7 commits into dev from loci-communicator-refactor


3 participants
