[Comm] allow collecting MPI timers from python #1728

Open
tdavidcl wants to merge 1 commit into Shamrock-code:main from tdavidcl:patch-2026-03-25-22-00

Conversation

@tdavidcl
Member

No description provided.

@github-actions
Contributor

Thanks @tdavidcl for opening this PR!

You can do several things directly here:
1 - Comment `pre-commit.ci run` to run the pre-commit checks.
2 - Comment `pre-commit.ci autofix` to apply fixes.
3 - Add the label `autofix.ci` to fix authorship and pre-commit for every commit made.
4 - Add the label `light-ci` to trigger only a reduced, faster version of the CI (the full one is still needed before merge).
5 - Add the label `trigger-ci` to create an empty commit that triggers the CI.

Once the workflow completes, a message will appear with information related to the run.

The PR is also reviewed automatically by Gemini. You can:
1 - Comment `/gemini review` to trigger a review.
2 - Comment `/gemini summary` for a summary.
3 - Tag it using `@gemini-code-assist`, either in the PR or in review comments on files.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the shamcomm library by introducing new collective communication primitives for strings and exposing MPI timer collection functionality to Python. These changes facilitate detailed performance profiling of parallel applications directly from Python, enabling users to better understand and optimize their code's execution across multiple processes.

Highlights

  • MPI Timer Collection: Enabled the collection of MPI timers from Python, allowing for performance analysis of parallel applications.
  • New Collective Communication Functions: Introduced allgather_str and allgather_basic_str for efficient string aggregation across all MPI ranks, and all_string_histogram for distributed histogram computation.
  • Python Bindings: Provided Python bindings for retrieving individual MPI timers, all active timers, and calculating timer deltas across ranks.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces new MPI collective operations for strings (allgather_str, allgather_basic_str) and a distributed string histogram (all_string_histogram). It also adds functionality to retrieve all MPI timers and calculate their deltas, exposing these features to Python via pybind11. The review highlights several areas for improvement: a critical issue with hardcoded MPI_CHAR for std::basic_string<byte> which should be MPI_BYTE, opportunities to use modern C++ features like std::exclusive_scan and range-based for loops, performance optimizations for string concatenation by pre-reserving memory, and efficiency improvements for Python bindings by passing map parameters by const reference and pre-reserving vector capacity. Additionally, it suggests using find() or count() for map access to prevent unintended insertions.
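
The std::exclusive_scan suggestion mentioned above is not shown in the visible comments. A minimal sketch, assuming it concerns turning per-rank counts into write offsets for a variable-size gather, could look like this. The helper name displacements_from_counts is hypothetical and not part of shamcomm.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper (not from the PR): convert per-rank element counts
// into the starting offset of each rank's data in a gathered buffer.
std::vector<int> displacements_from_counts(const std::vector<int> &counts) {
    std::vector<int> displs(counts.size());
    // displs[i] = counts[0] + ... + counts[i - 1], with displs[0] = 0
    std::exclusive_scan(counts.begin(), counts.end(), displs.begin(), 0);
    return displs;
}
```

This replaces a hand-written running-sum loop with a single standard algorithm (C++17, header <numeric>).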

Comment on lines +186 to +189
std::string accum_loc = "";
for (auto &s : inputs) {
    accum_loc += s + delimiter;
}

medium

Repeatedly concatenating strings with += inside a loop can be inefficient due to multiple reallocations. It's more performant to first calculate the total required size, reserve the memory for the string, and then append the parts. This avoids intermediate allocations.

Example:

std::string accum_loc;
size_t total_size = 0;
for (const auto& s : inputs) {
    total_size += s.size() + delimiter.size();
}
accum_loc.reserve(total_size);
for (const auto &s : inputs) {
    accum_loc.append(s);
    accum_loc.append(delimiter);
}

Comment on lines +198 to +200
for (size_t i = 0; i < splitted.size(); i++) {
    histogram[splitted[i]] += 1;
}

medium

For better readability and to follow modern C++ practices, you can use a range-based for loop here.

Suggested change
-for (size_t i = 0; i < splitted.size(); i++) {
-    histogram[splitted[i]] += 1;
-}
+for (const auto& s : splitted) {
+    histogram[s]++;
+}
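
For context, the counting step can be sketched as a self-contained function that splits a delimiter-joined string and tallies the tokens. This is only an illustration under assumptions: count_tokens is a hypothetical name, empty tokens are skipped by choice, and the PR itself counts over an already split vector named splitted.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for the local counting step (not the PR's code).
// Splits `joined` on `delimiter` and counts each non-empty token.
std::unordered_map<std::string, int> count_tokens(
    std::string_view joined, std::string_view delimiter) {
    std::unordered_map<std::string, int> histogram;
    if (delimiter.empty()) {
        return histogram; // an empty delimiter would never advance
    }
    std::size_t pos = 0;
    while (pos < joined.size()) {
        std::size_t next = joined.find(delimiter, pos);
        if (next == std::string_view::npos) {
            next = joined.size(); // last token has no trailing delimiter
        }
        if (next > pos) {
            // operator[] value-initializes a missing count to 0 before ++
            histogram[std::string(joined.substr(pos, next - pos))]++;
        }
        pos = next + delimiter.size();
    }
    return histogram;
}
```

Here the insertion done by operator[] is intended, since a new key should start at zero.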


shamcomm_module.def(
"mpi_timers_delta",
[](std::unordered_map<std::string, f64> start, std::unordered_map<std::string, f64> end) {

medium

The start and end maps are passed by value, which can cause unnecessary and potentially expensive copies. Passing them by const reference (const std::unordered_map<std::string, f64>&) would be more efficient. Note that if you make this change, you will need to use find() or at() instead of operator[] to access map elements, as operator[] is not a const operation.

Suggested change
-[](std::unordered_map<std::string, f64> start, std::unordered_map<std::string, f64> end) {
+[](const std::unordered_map<std::string, f64>& start, const std::unordered_map<std::string, f64>& end) {

Comment on lines +43 to +47
std::vector<std::string> keys{};

for (auto &[k, v] : end) {
    keys.push_back(k);
}

medium

To avoid potential reallocations while populating the keys vector, it's more efficient to reserve its size beforehand using end.size().

Suggested change
-std::vector<std::string> keys{};
-for (auto &[k, v] : end) {
-    keys.push_back(k);
-}
+std::vector<std::string> keys;
+keys.reserve(end.size());
+for (auto const& [k, v] : end) {
+    keys.push_back(k);
+}

std::unordered_map<std::string, f64> deltas{};

for (auto &[k, c] : key_histo) {
deltas[k] = shamalgs::collective::allreduce_max(end[k] - start[k]);

medium

Using operator[] on the maps can have the side effect of inserting a new element if the key doesn't exist. While this may be the intended behavior (to treat missing timers as 0), it's safer and clearer to use find() or count() to check for existence and retrieve the value. This also becomes necessary if you change the parameters to be const references for efficiency.

Example with count():

double end_val = end.count(k) ? end.at(k) : 0.0;
double start_val = start.count(k) ? start.at(k) : 0.0;
deltas[k] = shamalgs::collective::allreduce_max(end_val - start_val);
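
A variant with a single lookup per key, written against const references, might look like the sketch below. It is local only: the allreduce_max across ranks is left out, it iterates over the keys of end rather than a cross-rank key histogram, and it assumes f64 is double. The names TimerMap, value_or_zero and local_timer_deltas are hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Assumption: f64 in the PR corresponds to double.
using TimerMap = std::unordered_map<std::string, double>;

// Look up `key` once with find(); a missing timer is treated as 0.
// Unlike operator[], this never inserts into the map.
double value_or_zero(const TimerMap &timers, const std::string &key) {
    auto it = timers.find(key);
    return it != timers.end() ? it->second : 0.0;
}

// Local-only sketch: per-key difference end - start, without the
// collective max over MPI ranks that the PR applies.
TimerMap local_timer_deltas(const TimerMap &start, const TimerMap &end) {
    TimerMap deltas;
    deltas.reserve(end.size());
    for (const auto &[key, end_val] : end) {
        deltas[key] = end_val - value_or_zero(start, key);
    }
    return deltas;
}
```

Because both inputs are const, the compiler rejects any accidental operator[] access, so the inputs cannot grow as a side effect.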

Comment on lines +50 to +53
std::string result = "";
for (u32 i = 0; i < ref_base.size(); i++) {
    result += ref_base[i];
}

medium

This loop for string concatenation can be written more concisely using std::accumulate from the <numeric> header. While performance is not critical in this test, it's a good practice to use standard algorithms where applicable for clarity and conciseness.

std::string result = std::accumulate(ref_base.begin(), ref_base.end(), std::string{});

References
  1. Refactor duplicated logic into a helper function or lambda to improve readability and maintainability. Using std::accumulate replaces a manual loop with a standard algorithm.
  2. In tests, prefer programmatic construction of expected data collections over manual, verbose initialization to improve maintainability and robustness. Using standard algorithms for test data construction improves maintainability.
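
As a self-contained illustration of the std::accumulate form, the sketch below wraps it in a hypothetical helper, concat_all, which is not part of the test file.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: concatenate all strings in order.
// std::accumulate folds with operator+, starting from an empty string.
std::string concat_all(const std::vector<std::string> &parts) {
    return std::accumulate(parts.begin(), parts.end(), std::string{});
}
```

Before C++20 each step copies the accumulator, so for large inputs a reserve-then-append loop is likely faster. In a small test this difference should not matter.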

@github-actions
Copy link
Copy Markdown
Contributor

Workflow report

Workflow report corresponding to commit 34cd963
Committer email is timothee.davidcleris@proton.me
GitHub page artifact URL: GitHub page artifact link (can expire)

Pre-commit check report

Pre-commit check: ✅

trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check for merge conflicts................................................Passed
check that executables have shebangs.....................................Passed
check that scripts with shebangs are executable..........................Passed
check for added large files..............................................Passed
check for case conflicts.................................................Passed
check for broken symlinks................................................Passed
check yaml...............................................................Passed
detect private key.......................................................Passed
No-tabs checker..........................................................Passed
Tabs remover.............................................................Passed
Validate GitHub Workflows................................................Passed
clang-format.............................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed
Check doxygen headers....................................................Passed
Check license headers....................................................Passed
Check #pragma once.......................................................Passed
Check SYCL #include......................................................Passed
No ssh in git submodules remote..........................................Passed
No UTF-8 in files (except for authors)...................................Passed

Test pipeline can run.

Clang-tidy diff report


/__w/Shamrock/Shamrock/src/shamcomm/src/collectives.cpp:159:57: warning: the parameter 'delimiter' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
  159 |     const std::vector<std::string> &inputs, std::string delimiter) {
      |                                                         ^
      |                                             const      &
/__w/Shamrock/Shamrock/src/shamcomm/src/collectives.cpp:159:45: note: FIX-IT applied suggested code changes
  159 |     const std::vector<std::string> &inputs, std::string delimiter) {
      |                                             ^
/__w/Shamrock/Shamrock/src/shamcomm/src/collectives.cpp:159:56: note: FIX-IT applied suggested code changes
  159 |     const std::vector<std::string> &inputs, std::string delimiter) {
      |                                                        ^
/__w/Shamrock/Shamrock/src/shamcomm/include/shamcomm/collectives.hpp:73:49: note: FIX-IT applied suggested code changes
   73 |         const std::vector<std::string> &inputs, std::string delimiter = "\n");
      |                                                 ^
/__w/Shamrock/Shamrock/src/shamcomm/include/shamcomm/collectives.hpp:73:60: note: FIX-IT applied suggested code changes
   73 |         const std::vector<std::string> &inputs, std::string delimiter = "\n");
      |                                                            ^
/__w/Shamrock/Shamrock/src/shamcomm/src/collectives.cpp:185:57: warning: the parameter 'delimiter' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
  185 |     const std::vector<std::string> &inputs, std::string delimiter) {
      |                                                         ^
      |                                             const      &
/__w/Shamrock/Shamrock/src/shamcomm/src/collectives.cpp:185:45: note: FIX-IT applied suggested code changes
  185 |     const std::vector<std::string> &inputs, std::string delimiter) {
      |                                             ^
/__w/Shamrock/Shamrock/src/shamcomm/src/collectives.cpp:185:56: note: FIX-IT applied suggested code changes
  185 |     const std::vector<std::string> &inputs, std::string delimiter) {
      |                                                        ^
/__w/Shamrock/Shamrock/src/shamcomm/include/shamcomm/collectives.hpp:77:49: note: FIX-IT applied suggested code changes
   77 |         const std::vector<std::string> &inputs, std::string delimiter = "\n");
      |                                                 ^
/__w/Shamrock/Shamrock/src/shamcomm/include/shamcomm/collectives.hpp:77:60: note: FIX-IT applied suggested code changes
   77 |         const std::vector<std::string> &inputs, std::string delimiter = "\n");
      |                                                            ^

56 warnings generated.
clang-tidy applied 8 of 8 suggested fixes.
Suppressed 54 warnings (54 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.

/__w/Shamrock/Shamrock/src/shamcomm/src/wrapper.cpp:44:31: warning: the parameter 'timername' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
   44 |     f64 get_timer(std::string timername) { return mpi_timers[timername]; }
      |                               ^
      |                   const      &
/__w/Shamrock/Shamrock/src/shamcomm/include/shamcomm/wrapper.hpp:31:19: note: FIX-IT applied suggested code changes
   31 |     f64 get_timer(std::string timername);
      |                   ^
/__w/Shamrock/Shamrock/src/shamcomm/include/shamcomm/wrapper.hpp:31:30: note: FIX-IT applied suggested code changes
   31 |     f64 get_timer(std::string timername);
      |                              ^
/__w/Shamrock/Shamrock/src/shamcomm/src/wrapper.cpp:44:19: note: FIX-IT applied suggested code changes
   44 |     f64 get_timer(std::string timername) { return mpi_timers[timername]; }
      |                   ^
/__w/Shamrock/Shamrock/src/shamcomm/src/wrapper.cpp:44:30: note: FIX-IT applied suggested code changes
   44 |     f64 get_timer(std::string timername) { return mpi_timers[timername]; }
      |                              ^

83 warnings generated.
clang-tidy applied 4 of 4 suggested fixes.
Suppressed 82 warnings (55 in non-user code, 27 due to line filter).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.

/__w/Shamrock/Shamrock/src/tests/shamcomm/collectivesTests.cpp:55:17: warning: the variable 'send' is copy-constructed from a const reference but is only used as const reference; consider making it a const reference [performance-unnecessary-copy-initialization]
   55 |     std::string send = ref_base[shamcomm::world_rank()];
      |                 ^
      |     const      &
/__w/Shamrock/Shamrock/src/tests/shamcomm/collectivesTests.cpp:55:5: note: FIX-IT applied suggested code changes
   55 |     std::string send = ref_base[shamcomm::world_rank()];
      |     ^
/__w/Shamrock/Shamrock/src/tests/shamcomm/collectivesTests.cpp:55:16: note: FIX-IT applied suggested code changes
   55 |     std::string send = ref_base[shamcomm::world_rank()];
      |                ^

192 warnings generated.
clang-tidy applied 2 of 2 suggested fixes.
Suppressed 191 warnings (190 in non-user code, 1 due to line filter).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.

Suggested changes

Detailed changes :
diff --git a/src/shamcomm/include/shamcomm/collectives.hpp b/src/shamcomm/include/shamcomm/collectives.hpp
index b9d0a022..94d1f680 100644
--- a/src/shamcomm/include/shamcomm/collectives.hpp
+++ b/src/shamcomm/include/shamcomm/collectives.hpp
@@ -70,10 +70,10 @@ namespace shamcomm {
      *         values are the counts of their occurrences. (valid only on rank 0)
      */
     std::unordered_map<std::string, int> string_histogram(
-        const std::vector<std::string> &inputs, std::string delimiter = "\n");
+        const std::vector<std::string> &inputs, const std::string& delimiter = "\n");
 
     /// same as string_histogram but with result return on every rank
     std::unordered_map<std::string, int> all_string_histogram(
-        const std::vector<std::string> &inputs, std::string delimiter = "\n");
+        const std::vector<std::string> &inputs, const std::string& delimiter = "\n");
 
 } // namespace shamcomm
diff --git a/src/shamcomm/include/shamcomm/wrapper.hpp b/src/shamcomm/include/shamcomm/wrapper.hpp
index 3b7df7bc..f6275e2f 100644
--- a/src/shamcomm/include/shamcomm/wrapper.hpp
+++ b/src/shamcomm/include/shamcomm/wrapper.hpp
@@ -28,7 +28,7 @@ namespace shamcomm::mpi {
     void register_time(std::string timername, f64 time);
 
     /// get a timer value
-    f64 get_timer(std::string timername);
+    f64 get_timer(const std::string& timername);
 
     /// return all internal timers
     const std::unordered_map<std::string, f64> &get_timers();
diff --git a/src/shamcomm/src/collectives.cpp b/src/shamcomm/src/collectives.cpp
index 158badcd..88597608 100644
--- a/src/shamcomm/src/collectives.cpp
+++ b/src/shamcomm/src/collectives.cpp
@@ -156,7 +156,7 @@ void shamcomm::allgather_basic_str(
 }
 
 std::unordered_map<std::string, int> shamcomm::string_histogram(
-    const std::vector<std::string> &inputs, std::string delimiter) {
+    const std::vector<std::string> &inputs, const std::string& delimiter) {
     std::string accum_loc = "";
     for (auto &s : inputs) {
         accum_loc += s + delimiter;
@@ -182,7 +182,7 @@ std::unordered_map<std::string, int> shamcomm::string_histogram(
 }
 
 std::unordered_map<std::string, int> shamcomm::all_string_histogram(
-    const std::vector<std::string> &inputs, std::string delimiter) {
+    const std::vector<std::string> &inputs, const std::string& delimiter) {
     std::string accum_loc = "";
     for (auto &s : inputs) {
         accum_loc += s + delimiter;
diff --git a/src/shamcomm/src/wrapper.cpp b/src/shamcomm/src/wrapper.cpp
index f56cfe64..b06fd033 100644
--- a/src/shamcomm/src/wrapper.cpp
+++ b/src/shamcomm/src/wrapper.cpp
@@ -41,7 +41,7 @@ namespace shamcomm::mpi {
         }
     }
 
-    f64 get_timer(std::string timername) { return mpi_timers[timername]; }
+    f64 get_timer(const std::string& timername) { return mpi_timers[timername]; }
 
     const std::unordered_map<std::string, f64> &get_timers() { return mpi_timers; }
 
diff --git a/src/tests/shamcomm/collectivesTests.cpp b/src/tests/shamcomm/collectivesTests.cpp
index 4be7c99d..7b18c189 100644
--- a/src/tests/shamcomm/collectivesTests.cpp
+++ b/src/tests/shamcomm/collectivesTests.cpp
@@ -52,7 +52,7 @@ TestStart(Unittest, "shamcomm/collectives::allgather_str", test_allgather_str, 4
         result += ref_base[i];
     }
 
-    std::string send = ref_base[shamcomm::world_rank()];
+    const std::string& send = ref_base[shamcomm::world_rank()];
 
     std::string recv = "random string"; // Just to check that it is overwritten
 
Doxygen diff with `main`

Removed warnings : 1
New warnings : 2
Warnings count : 8366 → 8367 (0.0%)

Detailed changes :
- src/shamcomm/src/wrapper.cpp:62: warning: Member check_tag_value(i32 tag) (function) of namespace shamcomm::mpi is not documented.
+ src/shamcomm/src/wrapper.cpp:64: warning: Member check_tag_value(i32 tag) (function) of namespace shamcomm::mpi is not documented.
+ src/shampylib/src/pyShamcomm.cpp:28: warning: Member Register_pymod(shamcommlibinit) (function) of file pyShamcomm.cpp is not documented.
