Skip to content

fix: Prevent segfaults from thread ID array overflow#2172

Merged
dgaliffiAMD merged 8 commits into
developfrom
anujshuk/anujshuk-amd/support_for_thread_storage_expansion
Jan 7, 2026
Merged

fix: Prevent segfaults from thread ID array overflow#2172
dgaliffiAMD merged 8 commits into
developfrom
anujshuk/anujshuk-amd/support_for_thread_storage_expansion

Conversation

@anujshuk-amd

@anujshuk-amd anujshuk-amd commented Dec 4, 2025

Copy link
Copy Markdown
Contributor

Motivation

This pull request updates the way ROCm Systems Profiler manages and documents its thread limit, improving configurability and clarity for users and developers. The main changes include making the thread limit (ROCPROFSYS_MAX_THREADS) fully configurable at build time, updating documentation to reflect the new behavior, and ensuring that warnings and limits are consistently enforced throughout the code and tests.

Technical Details

Thread limit configuration and enforcement:

  • Added a check in CMakeLists.txt to ensure ROCPROFSYS_MAX_THREADS is at least 128, automatically setting it to 128 with a warning if a lower value is provided.
  • Replaced hardcoded thread limit (allowed_max_threads) in pthread_create_gotcha.cpp with the configurable ROCPROFSYS_MAX_THREADS value, ensuring all runtime checks and warnings use the actual configured limit. [1] [2] [3]

Documentation improvements:

  • Updated the development guide to explain the new thread limit behavior, including how exceeding the limit is handled gracefully, how to configure it, and the build-time validation rules.

Test updates:

  • Modified thread limit tests to use the configurable ROCPROFSYS_MAX_THREADS value instead of a hardcoded limit, and expanded the range of tested thread values.
  • Increased test timeouts to accommodate larger thread counts and ensure reliability with higher limits.

JIRA ID

SWDEV-548180

Test Plan

Tests updated. And passing

Test Result

Passed:
test_result.txt

Submission Checklist

This PR Depends on :ROCm/timemory#18

@github-actions github-actions Bot added documentation Improvements or additions to documentation project: rocprofiler-systems labels Dec 4, 2025
@anujshuk-amd anujshuk-amd marked this pull request as ready for review December 4, 2025 14:09
@anujshuk-amd anujshuk-amd requested review from a team and jrmadsen as code owners December 4, 2025 14:09
Copilot AI review requested due to automatic review settings December 4, 2025 14:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces robust error handling for sampling timer configuration and adds a new CMake option (ROCPROFSYS_MAX_STORAGE_THREADS) to prevent segmentation faults from thread ID array overflow. The changes improve system resilience and provide better memory management controls for high-thread-count workloads.

Key changes:

  • Added try-catch error handling in sampling timer configuration to gracefully handle failures and prevent crashes
  • Introduced ROCPROFSYS_MAX_STORAGE_THREADS configuration option with automatic validation and adjustment logic
  • Enhanced documentation explaining thread limit options, their defaults, and best practices

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
projects/rocprofiler-systems/source/lib/rocprof-sys/library/sampling.cpp Wraps realtime, cputime, and overflow sampling timer configurations in try-catch blocks with verbose logging and signal type cleanup on failure
projects/rocprofiler-systems/source/lib/common/defines.h.in Adds preprocessor definition for ROCPROFSYS_MAX_STORAGE_THREADS to make it available throughout the codebase
projects/rocprofiler-systems/docs/reference/development-guide.rst Documents the new thread configuration options with usage examples and warnings about misconfiguration
projects/rocprofiler-systems/cmake/Packages.cmake Propagates ROCPROFSYS_MAX_STORAGE_THREADS to timemory configuration and compile definitions
projects/rocprofiler-systems/CMakeLists.txt Implements auto-adjustment logic to prevent ROCPROFSYS_MAX_STORAGE_THREADS from being set below ROCPROFSYS_MAX_THREADS

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dgaliffiAMD dgaliffiAMD marked this pull request as draft December 4, 2025 15:54
@dgaliffiAMD

dgaliffiAMD commented Dec 4, 2025

Copy link
Copy Markdown
Contributor

Thank you, @anujshuk-amd.
I am going to convert this to a "draft", for now, since it seems to depend on a PR in Timemory that is still "in-flight".

@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch 2 times, most recently from 3f6044b to c8e2462 Compare December 8, 2025 15:30
@ROCm ROCm deleted a comment from Copilot AI Dec 8, 2025
@ROCm ROCm deleted a comment from Copilot AI Dec 8, 2025
@ROCm ROCm deleted a comment from Copilot AI Dec 8, 2025
@ROCm ROCm deleted a comment from Copilot AI Dec 8, 2025
@ROCm ROCm deleted a comment from Copilot AI Dec 8, 2025
@anujshuk-amd anujshuk-amd requested a review from Copilot December 8, 2025 15:50
@anujshuk-amd anujshuk-amd marked this pull request as ready for review December 8, 2025 15:51
@anujshuk-amd anujshuk-amd self-assigned this Dec 8, 2025

@sputhala-amd sputhala-amd left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For CI builds I think we set the value for ROCPROFSYS_MAX_THREADS as 64 here.
Do we want to modify this value in the preset file to 128 also since now we default to a min value of 128?

Also the logic here is a bit confusing:
This line was already existing but what is the logic of setting the value to 2048, should this also be 128. I actually think this code block will be never hit. And if it does hit this if condition, it should be Fatal error and not a fallback. I actually think this block can easily be deleted.

The code block here, currently goes to the else block only if:
We could not compute a valid power-of-two ceiling (_MAX_THREADS <= 0)
The user-provided value does not match what we expected.
Currently in such a case we still accept user specified value. Should we force the default 128 when we are not sure if the user specified value is an accepted value?

@anujshuk-amd

Copy link
Copy Markdown
Contributor Author

@sputhala-amd For CI builds I think we set the value for ROCPROFSYS_MAX_THREADS as 64 here.
Do we want to modify this value in the preset file to 128 also since now we default to a min value of 128?

Yes, CI should be updated to use 128 threads. Here's the rationale:

Why 128 is the baseline:

Calculated value for 8-core machines (16 × 8 = 128)
Power of 2 (required by internal data structures)
Minimum reasonable value for multi-threaded profiling
Now the documented minimum in our system
Action: CMakePresets.json has been updated to set 128 threads.

Also the logic here is a bit confusing:
This line was already existing but what is the logic of setting the value to 2048, should this also be 128. I actually think this code block will be never hit. And if it does hit this if condition, it should be Fatal error and not a fallback. I actually think this block can easily be deleted.

Analysis:

Current behavior:
The 2048 fallback only triggers if compute_pow2_ceil macro is broken
Under normal operation, this is effectively dead code
2048 was likely chosen as a "large system" default (≈128-core system)

Critical finding with compute_pow2_ceil:
Returns next power of 2 ≥ input value on success
Returns -1 on failure (missing Python3 or Python error)
Since -1 < 2, the fallback block WOULD trigger in error cases

Resolution: Replaced fallback with fatal error for better error handling.

Should we force 128?
Recommendation: No, we should not force 128.

Reasoning:
Respect user's explicit configuration choices
Forcing could incorrectly override valid higher values (e.g., user sets 256, we force 128)
Current warning system adequately alerts users to verify their settings

@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch from 94ae29f to 94d5c2c Compare December 24, 2025 18:38
Comment thread projects/rocprofiler-systems/CMakeLists.txt Outdated
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch 2 times, most recently from 94ae29f to 94d5c2c Compare January 6, 2026 19:01
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch from 91753be to 55e8e58 Compare January 6, 2026 19:53
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch from 55e8e58 to 658ff56 Compare January 6, 2026 19:56
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch from 658ff56 to 9e29fc1 Compare January 6, 2026 20:01
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch from 9e29fc1 to 0c76c83 Compare January 6, 2026 20:07
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch from 0c76c83 to 0850423 Compare January 6, 2026 20:16
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch 2 times, most recently from 4d95097 to 94d5c2c Compare January 6, 2026 20:25
Comment thread projects/rocprofiler-systems/CHANGELOG.md Outdated
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch 2 times, most recently from 74a8988 to 94d5c2c Compare January 7, 2026 16:03
@dgaliffiAMD dgaliffiAMD merged commit 596ffce into develop Jan 7, 2026
60 of 61 checks passed
@dgaliffiAMD dgaliffiAMD deleted the anujshuk/anujshuk-amd/support_for_thread_storage_expansion branch January 7, 2026 19:03
systems-assistant Bot pushed a commit to ROCm/rocprofiler-systems that referenced this pull request Jan 7, 2026
 (#2172)

**Thread limit configuration and enforcement: **

* Added a check in `CMakeLists.txt` to ensure `ROCPROFSYS_MAX_THREADS` is at least 128, automatically setting it to 128 with a warning if a lower value is provided.
* Replaced hardcoded thread limit (`allowed_max_threads`) in `pthread_create_gotcha.cpp` with the configurable `ROCPROFSYS_MAX_THREADS` value, ensuring all runtime checks and warnings use the actual configured limit.

**Documentation improvements: **

* Updated the development guide to explain the new thread limit behavior, including how exceeding the limit is handled gracefully, how to configure it, and the build-time validation rules.

**Test updates: **

* Modified thread limit tests to use the configurable `ROCPROFSYS_MAX_THREADS` value instead of a hardcoded limit and expanded the range of tested thread values.
* Increased test timeouts to accommodate larger thread counts and ensure reliability with higher limits.
[rocm-systems] ROCm/rocm-systems#2172 (commit 596ffce)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants