Skip to content

Fix: Replace hard-coded thread limit with TIMEMORY_MAX_THREADS#18

Merged
dgaliffiAMD merged 3 commits into
ROCm:rocprofiler-systemsfrom
anujshuk-amd:anujshuk-amd/fix-storage-thread-limit-rocprofiler-systems
Dec 8, 2025
Merged

Fix: Replace hard-coded thread limit with TIMEMORY_MAX_THREADS#18
dgaliffiAMD merged 3 commits into
ROCm:rocprofiler-systemsfrom
anujshuk-amd:anujshuk-amd/fix-storage-thread-limit-rocprofiler-systems

Conversation

@anujshuk-amd

@anujshuk-amd anujshuk-amd commented Dec 4, 2025

Copy link
Copy Markdown

Motivation

This pull request updates the thread limit logic in the operation::set_storage implementation to improve stability and prevent segfaults when running with a large number of threads. It also bumps the project version to 4.0.1rc0 and documents the change in the changelog.

Technical Details

Bug fix for thread limit:

  • source/timemory/operations/types.hpp: Changed the hard-coded thread limit from 4096 to use the configurable TIMEMORY_MAX_THREADS, preventing segfaults when Linux thread IDs exceed 4096.

Version update and documentation:

  • VERSION: Updated the version number to 4.0.1rc0.

  • CHANGELOG.md: Added an entry for version 4.0.1rc0 describing the thread limit bug fix and its impact.
    Build System Improvements:

  • Added a new CMake option, TIMEMORY_MAX_STORAGE_THREADS, defaulting to the value of TIMEMORY_MAX_THREADS. This option controls storage array sizing for per-thread data and auto-adjusts if set lower than TIMEMORY_MAX_THREADS.

  • Integrated the new option into the build system and ensured appropriate warnings and adjustments are made if misconfigured.

Test Plan

Test Result

Submission Checklist

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a critical bug where hard-coded thread limits could cause segfaults when Linux thread IDs exceed 4096, replacing the fixed limit with a new configurable TIMEMORY_MAX_STORAGE_THREADS macro. The changes introduce a CMake configuration option that defaults to TIMEMORY_MAX_THREADS and includes safeguards to prevent misconfiguration.

Key Changes:

  • Replaced hard-coded max_threads = 4096 with TIMEMORY_MAX_STORAGE_THREADS macro in operation::set_storage
  • Added TIMEMORY_MAX_STORAGE_THREADS CMake option with validation logic
  • Updated version to 4.0.1rc0 and documented changes in CHANGELOG

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
source/timemory/operations/types.hpp Replaced hard-coded thread limit with configurable TIMEMORY_MAX_STORAGE_THREADS macro
cmake/Modules/Options.cmake Added new TIMEMORY_MAX_STORAGE_THREADS CMake option with validation and auto-adjustment logic
VERSION Updated version from 4.0.0rc0 to 4.0.1rc0
CHANGELOG.md Added version 3.2.5 entry documenting the bug fix and build system improvements

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread VERSION
@sputhala-amd

Copy link
Copy Markdown

@anujshuk-amd Why do we need this compile time cmake flag TIMEMORY_MAX_STORAGE_THREADS? Do we plan to use this to build rocprofiler-systems with this flag passed to timemory during build time with a different fixed value?
Do we intend this to be a runtime flag configurable by users?

The code changes look good to me if we want this to be a cmake option.

@adjordje-amd adjordje-amd left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to reuse TIMEMORY_MAX_THREADS instead of introducing a new configurator? If TIMEMORY_MAX_STORAGE_THREADS represents the number of objects reserved for threads (each thread has its own storage), then the amount of storage is proportional to the thread count, right?
In other words, situation where TIMEMORY_MAX_STORAGE_THREADS != TIMEMORY_MAX_THREADS is invalid?

Comment thread CHANGELOG.md Outdated
@anujshuk-amd

Copy link
Copy Markdown
Author

Does it make sense to reuse TIMEMORY_MAX_THREADS instead of introducing a new configurator? If TIMEMORY_MAX_STORAGE_THREADS represents the number of objects reserved for threads (each thread has its own storage), then the amount of storage is proportional to the thread count, right? In other words, situation where TIMEMORY_MAX_STORAGE_THREADS != TIMEMORY_MAX_THREADS is invalid?

Let me put more insight of analyses, By keeping rocprofiler-sytem in mind -

  1. Regular threads: _id = 0, 1, 2, 3, ... (incremented from 0)
  2. Offset threads (internal): _id = max_threads-1, max_threads-2, ... (decremented from max_threads)

Since thread IDs can reach up to TIMEMORY_MAX_THREADS - 1, and these IDs index storage arrays of size TIMEMORY_MAX_STORAGE_THREADS, we MUST have: TIMEMORY_MAX_STORAGE_THREADS >= TIMEMORY_MAX_THREADS
Otherwise: array.at(thread_id) will throw std::out_of_range when thread_id >= TIMEMORY_MAX_STORAGE_THREADS.

No valid use case found yet for STORAGE > THREADS. Can't create thread IDs beyond max_threads, so extra storage slots are wasted memory
No valid use case for STORAGE < THREADS: Already prevented by CMake (causes runtime crashes)
Only valid state: STORAGE == THREADS (the default behavior)

Storage arrays are indexed by thread IDs. Thread IDs range from [[0, TIMEMORY_MAX_THREADS). Array must accommodate THREADS - 1 as maximum index Therefore, STORAGE must equal or greater than THREADS.

The dual-flag design can be improved using with a single TIMEMORY_MAX_THREADS configuration. The only thing is needed it should be power of two.

anujshuk-amd and others added 2 commits December 5, 2025 13:30
@anujshuk-amd anujshuk-amd changed the title Fix: Replace hard-coded thread limit with TIMEMORY_MAX_STORAGE_THREADS Fix: Replace hard-coded thread limit with TIMEMORY_MAX_THREADS Dec 5, 2025

@sputhala-amd sputhala-amd left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving from my side.

@anujshuk-amd

anujshuk-amd commented Dec 5, 2025 via email

Copy link
Copy Markdown
Author

@dgaliffiAMD dgaliffiAMD merged commit 29037f3 into ROCm:rocprofiler-systems Dec 8, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants