Optimization: Parallelize GXS message deserialization using OpenMP (4x speedup)#246
jolavillette wants to merge 9 commits into RetroShare:master
OpenMP is a great tool for parallelizing (I use it often), but you need to make sure that
Note that in this PR only rsgenexchange is changed. So here is what Antigravity says:
"I have reviewed the OpenMP parallelized code in libretroshare/src/gxs/rsgenexchange.cc (around line 1559) considering Cyril's feedback. Safety Analysis: No static variables involved below the parallel loop calls mSerialiser->deserialise(...). 100% Re-entrant: SQLCipher (Recursive/Simultaneous usage): Conclusion: The code appears to comply with the stated constraints and should not cause crashes or corruption related to SQLCipher or static variables. The parallelization here is strictly computational (deserialization). However, please ensure that all services using RsGenExchange (Forums, Channels, etc.) strictly use "clean" serializers (like the Forums one I verified), which follows the standard pattern in the current codebase."
remove debug messages
…st, document thread-safety contract
…e_libretroshare.pri
Code by Antigravity
This PR requires RetroShare pr/3136
Description
This PR significantly optimizes the loading performance of GXS services (Channels, Forums, Boards) by parallelizing the message deserialization process.
Profiling identified RsGenExchange::getMsgData as a major bottleneck during the loading phase, where deserialization was performed sequentially on a single thread. This PR introduces OpenMP to parallelize this workload across available CPU cores.
Changes
Enabled -fopenmp compiler and linker flags globally for Linux and Windows (MSYS2) builds. This ensures consistent OpenMP support across libretroshare and the GUI executable.
libretroshare: Refactored the main loop in RsGenExchange::getMsgData to use #pragma omp parallel for. This allows concurrent deserialization of RsGxsMsgItem objects.
Performance Results
Benchmarks were performed on an Intel Xeon E3-1230 v6 (4 cores / 8 threads) on Ubuntu 24.04, and on an Intel 4790K (4 cores / 8 threads) on Windows 10.
Example: deserialization time on the Xeon:
| Dataset Size | Serial (Before) | Parallel (After) | Speedup |
| --- | --- | --- | --- |
| Large (~6000 items) | ~1900 ms | ~440 ms | 4.3x |
| Medium (~3000 items) | ~265 ms | ~65 ms | 4.0x |
| Small (~500 items) | ~388 ms | ~80 ms | 4.8x |
Impact
Notes
Moving std::sort operations from the UI thread to a worker thread was tested but yielded negligible gains (~50 ms) compared to Qt's rendering cost. Therefore, UI-specific changes were reverted to maintain code simplicity.
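For reviewers unfamiliar with the pattern, the parallel loop described under Changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: RawMsg, DecodedMsg, decode and decodeAll are hypothetical stand-ins for the real RetroShare types and for mSerialiser->deserialise(...), and the sketch assumes the decoder is re-entrant (no static or shared mutable state), which is the constraint raised in the conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for a raw database blob and a decoded item;
// these are NOT the real RetroShare types.
struct RawMsg     { std::vector<std::uint8_t> bytes; };
struct DecodedMsg { std::string body; };

// Hypothetical re-entrant decoder: it reads only its argument and touches
// no static or shared mutable state, so concurrent calls are safe.
std::unique_ptr<DecodedMsg> decode(const RawMsg& raw)
{
    if (raw.bytes.empty())
        return nullptr;                      // models a failed deserialization
    auto item = std::make_unique<DecodedMsg>();
    item->body.assign(raw.bytes.begin(), raw.bytes.end());
    return item;
}

std::vector<std::unique_ptr<DecodedMsg>> decodeAll(const std::vector<RawMsg>& raws)
{
    // Pre-size the output so each iteration writes only to its own slot.
    // No push_back inside the loop, hence no lock is needed.
    std::vector<std::unique_ptr<DecodedMsg>> out(raws.size());
    const int n = static_cast<int>(raws.size());

    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i)
        out[static_cast<std::size_t>(i)] = decode(raws[static_cast<std::size_t>(i)]);

    assert(out.size() == raws.size());
    return out;
}
```

A caller would pass the blobs read from the database to decodeAll and then filter out null entries serially. Any access to the database itself is deliberately kept outside the parallel region in this sketch, so only the pure computation runs concurrently.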