[PC Sampling][GFX12][Draft] PC Sampling Support#132

Closed
rocm-devops wants to merge 1 commit into amd-staging from vlaindic/gfx12-pcs

@rocm-devops

PR Details

Initial GFX12 PC sampling support. The following is done as part of this PR:

  • Only host-trap sampling is enabled, because the perf_snapshot design is broken.
  • We still adapted the parser's tests to verify stochastic data, since the majority of that code will be reused for later architectures.
  • Most of our integration tests were tied to gfx9 and its 64-lane-wide wavefronts. This PR uses a wave_size (32 or 64) that depends on the underlying architecture.
  • Because Navi4x is a smaller chip than MI300, we had to reduce the workload size for some tests, which would otherwise take a long time to run on Navi4x.
    • They would exceed the timeout even without PC sampling in use.
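
As a rough illustration of what making a check depend on wave_size can look like, here is a minimal sketch of a validation helper. It assumes a test inspects a per-sample execution mask with one bit per lane; the names `full_exec_mask` and `active_lanes` are hypothetical and are not taken from this PR or from the rocprofiler-sdk test suite.

```python
def full_exec_mask(wave_size: int) -> int:
    """Return a mask with one bit set for every lane of the wavefront."""
    # Assumption: only the two widths named in the PR description are valid.
    if wave_size not in (32, 64):
        raise ValueError(f"unsupported wave_size: {wave_size}")
    return (1 << wave_size) - 1


def active_lanes(exec_mask: int, wave_size: int) -> int:
    """Count active lanes, ignoring any bits above the wavefront width."""
    return bin(exec_mask & full_exec_mask(wave_size)).count("1")
```

A check written this way takes the width as a parameter instead of hard-coding 64 lanes, so the same assertion can run on both 32-lane and 64-lane configurations.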

Notes

Associated Jira Ticket Number/Link

SWDEV-481151

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Continuous Integration

Technical details

Added/updated tests?

  • Yes
  • No, Does not apply to this PR.

Updated CHANGELOG?

  • Yes
  • No, Does not apply to this PR.

Added/Updated documentation?

  • Yes
  • No, Does not apply to this PR.

Initial host-trap PC sampling support in the SDK and V3.
Introduces parser tests specific to GFX12.
Reduces the testing workload so that it runs on Navi4x in a reasonable timeframe.
@rocm-devops

Author

Code Coverage Report

Tests Only

code coverage tests.png

Samples Only

code coverage samples.png

Tests + Samples

code coverage all.png

@amd-hsivasun

Imported to rocm-systems

ammallya pushed a commit that referenced this pull request Oct 28, 2025
…#132)

* [SWDEV-509876] Remove buffer requirement from device counting service

No longer require a buffer to be given when setting up device counting
service. This is to reduce performance overhead in cases where immediate
return of counting samples is being used (synchronous mode).

* Missed file

* Update source/include/rocprofiler-sdk/device_counting_service.h

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Update source/lib/rocprofiler-sdk/counters/controller.cpp

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Update source/lib/rocprofiler-sdk/counters/device_counting.cpp

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Fixes for build

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

[ROCm/rocprofiler-sdk commit: 0c4a56c]

Labels

ready for peer review PR needs initial review
