[Comm] allow collecting MPI timers from python#1728
[Comm] allow collecting MPI timers from python#1728tdavidcl wants to merge 1 commit intoShamrock-code:mainfrom
Conversation
|
Thanks @tdavidcl for opening this PR! You can do multiple things directly here: Once the workflow completes a message will appear displaying informations related to the run. Also the PR gets automatically reviewed by gemini, you can: |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces new MPI collective operations for strings (allgather_str, allgather_basic_str) and a distributed string histogram (all_string_histogram). It also adds functionality to retrieve all MPI timers and calculate their deltas, exposing these features to Python via pybind11. The review highlights several areas for improvement: a critical issue with hardcoded MPI_CHAR for std::basic_string<byte> which should be MPI_BYTE, opportunities to use modern C++ features like std::exclusive_scan and range-based for loops, performance optimizations for string concatenation by pre-reserving memory, and efficiency improvements for Python bindings by passing map parameters by const reference and pre-reserving vector capacity. Additionally, it suggests using find() or count() for map access to prevent unintended insertions.
| std::string accum_loc = ""; | ||
| for (auto &s : inputs) { | ||
| accum_loc += s + delimiter; | ||
| } |
There was a problem hiding this comment.
Repeatedly concatenating strings with += inside a loop can be inefficient due to multiple reallocations. It's more performant to first calculate the total required size, reserve the memory for the string, and then append the parts. This avoids intermediate allocations.
Example:
std::string accum_loc;
size_t total_size = 0;
for (const auto& s : inputs) {
total_size += s.size() + delimiter.size();
}
accum_loc.reserve(total_size);
for (const auto &s : inputs) {
accum_loc.append(s);
accum_loc.append(delimiter);
}| for (size_t i = 0; i < splitted.size(); i++) { | ||
| histogram[splitted[i]] += 1; | ||
| } |
|
|
||
| shamcomm_module.def( | ||
| "mpi_timers_delta", | ||
| [](std::unordered_map<std::string, f64> start, std::unordered_map<std::string, f64> end) { |
There was a problem hiding this comment.
The start and end maps are passed by value, which can cause unnecessary and potentially expensive copies. Passing them by const reference (const std::unordered_map<std::string, f64>&) would be more efficient. Note that if you make this change, you will need to use find() or at() instead of operator[] to access map elements, as operator[] is not a const operation.
| [](std::unordered_map<std::string, f64> start, std::unordered_map<std::string, f64> end) { | |
| [](const std::unordered_map<std::string, f64>& start, const std::unordered_map<std::string, f64>& end) { |
| std::vector<std::string> keys{}; | ||
|
|
||
| for (auto &[k, v] : end) { | ||
| keys.push_back(k); | ||
| } |
There was a problem hiding this comment.
To avoid potential reallocations while populating the keys vector, it's more efficient to reserve its size beforehand using end.size().
| std::vector<std::string> keys{}; | |
| for (auto &[k, v] : end) { | |
| keys.push_back(k); | |
| } | |
| std::vector<std::string> keys; | |
| keys.reserve(end.size()); | |
| for (auto const& [k, v] : end) { | |
| keys.push_back(k); | |
| } |
| std::unordered_map<std::string, f64> deltas{}; | ||
|
|
||
| for (auto &[k, c] : key_histo) { | ||
| deltas[k] = shamalgs::collective::allreduce_max(end[k] - start[k]); |
There was a problem hiding this comment.
Using operator[] on the maps can have the side effect of inserting a new element if the key doesn't exist. While this may be the intended behavior (to treat missing timers as 0), it's safer and clearer to use find() or count() to check for existence and retrieve the value. This also becomes necessary if you change the parameters to be const references for efficiency.
Example with count():
double end_val = end.count(k) ? end.at(k) : 0.0;
double start_val = start.count(k) ? start.at(k) : 0.0;
deltas[k] = shamalgs::collective::allreduce_max(end_val - start_val);| std::string result = ""; | ||
| for (u32 i = 0; i < ref_base.size(); i++) { | ||
| result += ref_base[i]; | ||
| } |
There was a problem hiding this comment.
This loop for string concatenation can be written more concisely using std::accumulate from the <numeric> header. While performance is not critical in this test, it's a good practice to use standard algorithms where applicable for clarity and conciseness.
std::string result = std::accumulate(ref_base.begin(), ref_base.end(), std::string{});References
- Refactor duplicated logic into a helper function or lambda to improve readability and maintainability. Using
std::accumulatereplaces a manual loop with a standard algorithm. - In tests, prefer programmatic construction of expected data collections over manual, verbose initialization to improve maintainability and robustness. Using standard algorithms for test data construction improves maintainability.
Workflow reportworkflow report corresponding to commit 34cd963 Pre-commit check reportPre-commit check: ✅ Test pipeline can run. Clang-tidy diff reportSuggested changesDetailed changes :diff --git a/src/shamcomm/include/shamcomm/collectives.hpp b/src/shamcomm/include/shamcomm/collectives.hpp
index b9d0a022..94d1f680 100644
--- a/src/shamcomm/include/shamcomm/collectives.hpp
+++ b/src/shamcomm/include/shamcomm/collectives.hpp
@@ -70,10 +70,10 @@ namespace shamcomm {
* values are the counts of their occurrences. (valid only on rank 0)
*/
std::unordered_map<std::string, int> string_histogram(
- const std::vector<std::string> &inputs, std::string delimiter = "\n");
+ const std::vector<std::string> &inputs, const std::string& delimiter = "\n");
/// same as string_histogram but with result return on every rank
std::unordered_map<std::string, int> all_string_histogram(
- const std::vector<std::string> &inputs, std::string delimiter = "\n");
+ const std::vector<std::string> &inputs, const std::string& delimiter = "\n");
} // namespace shamcomm
diff --git a/src/shamcomm/include/shamcomm/wrapper.hpp b/src/shamcomm/include/shamcomm/wrapper.hpp
index 3b7df7bc..f6275e2f 100644
--- a/src/shamcomm/include/shamcomm/wrapper.hpp
+++ b/src/shamcomm/include/shamcomm/wrapper.hpp
@@ -28,7 +28,7 @@ namespace shamcomm::mpi {
void register_time(std::string timername, f64 time);
/// get a timer value
- f64 get_timer(std::string timername);
+ f64 get_timer(const std::string& timername);
/// return all internal timers
const std::unordered_map<std::string, f64> &get_timers();
diff --git a/src/shamcomm/src/collectives.cpp b/src/shamcomm/src/collectives.cpp
index 158badcd..88597608 100644
--- a/src/shamcomm/src/collectives.cpp
+++ b/src/shamcomm/src/collectives.cpp
@@ -156,7 +156,7 @@ void shamcomm::allgather_basic_str(
}
std::unordered_map<std::string, int> shamcomm::string_histogram(
- const std::vector<std::string> &inputs, std::string delimiter) {
+ const std::vector<std::string> &inputs, const std::string& delimiter) {
std::string accum_loc = "";
for (auto &s : inputs) {
accum_loc += s + delimiter;
@@ -182,7 +182,7 @@ std::unordered_map<std::string, int> shamcomm::string_histogram(
}
std::unordered_map<std::string, int> shamcomm::all_string_histogram(
- const std::vector<std::string> &inputs, std::string delimiter) {
+ const std::vector<std::string> &inputs, const std::string& delimiter) {
std::string accum_loc = "";
for (auto &s : inputs) {
accum_loc += s + delimiter;
diff --git a/src/shamcomm/src/wrapper.cpp b/src/shamcomm/src/wrapper.cpp
index f56cfe64..b06fd033 100644
--- a/src/shamcomm/src/wrapper.cpp
+++ b/src/shamcomm/src/wrapper.cpp
@@ -41,7 +41,7 @@ namespace shamcomm::mpi {
}
}
- f64 get_timer(std::string timername) { return mpi_timers[timername]; }
+ f64 get_timer(const std::string& timername) { return mpi_timers[timername]; }
const std::unordered_map<std::string, f64> &get_timers() { return mpi_timers; }
diff --git a/src/tests/shamcomm/collectivesTests.cpp b/src/tests/shamcomm/collectivesTests.cpp
index 4be7c99d..7b18c189 100644
--- a/src/tests/shamcomm/collectivesTests.cpp
+++ b/src/tests/shamcomm/collectivesTests.cpp
@@ -52,7 +52,7 @@ TestStart(Unittest, "shamcomm/collectives::allgather_str", test_allgather_str, 4
result += ref_base[i];
}
- std::string send = ref_base[shamcomm::world_rank()];
+ const std::string& send = ref_base[shamcomm::world_rank()];
std::string recv = "random string"; // Just to check that it is overwritten
Detailed changes :- src/shamcomm/src/wrapper.cpp:62: warning: Member check_tag_value(i32 tag) (function) of namespace shamcomm::mpi is not documented.
+ src/shamcomm/src/wrapper.cpp:64: warning: Member check_tag_value(i32 tag) (function) of namespace shamcomm::mpi is not documented.
+ src/shampylib/src/pyShamcomm.cpp:28: warning: Member Register_pymod(shamcommlibinit) (function) of file pyShamcomm.cpp is not documented. |
No description provided.