Skip to content

Show VCN and JPEG busy values where VCN/JPEG activity is not supported#1

Open
sputhala-amd wants to merge 7 commits into
amd-stagingfrom
xcpValues
Open

Show VCN and JPEG busy values where VCN/JPEG activity is not supported#1
sputhala-amd wants to merge 7 commits into
amd-stagingfrom
xcpValues

Conversation

@sputhala-amd

@sputhala-amd sputhala-amd commented Jun 6, 2025

Copy link
Copy Markdown
Owner

On AMD-SMI in rocm 7.0 vcn_activity and jpeg_activity will not be reported where XCP(partition) stats vcn_busy and jpeg_busy are available.

This causes the activity tracking to fail. The fix is to read the busy values when activity values are not supported.

Summary by CodeRabbit

  • New Features

    • Improved handling and reporting of VCN and JPEG activity metrics for supported devices, with enhanced support for multiple XCP metrics and clearer metric naming in performance traces.
  • Refactor

    • Unified VCN and JPEG activity metrics management into a single structure for more consistent data collection and processing.
    • Enhanced GPU initialization logic to ensure AMD SMI is initialized only once, improving stability and efficiency.

@coderabbitai

coderabbitai Bot commented Jun 6, 2025

Copy link
Copy Markdown

Walkthrough

The changes refactor the handling of VCN and JPEG GPU activity metrics in the AMD SMI data collection, unifying them under a new xcp_metrics_t structure. Conditional logic is introduced to accommodate device capabilities, and the tracing logic is updated to support multiple XCP metrics with improved naming and indexing. A clarifying comment was added in gpu.cpp. AMD SMI initialization was made thread-safe using std::call_once.

Changes

File(s) Change Summary
source/lib/core/gpu.cpp Made AMD SMI initialization thread-safe with std::once_flag and std::call_once; added comment about AMD SMI not reporting VCN_activity and JPEG_activity if VCN_busy or JPEG_busy fields exist; refactored GPU metric support detection using a helper lambda.
source/lib/rocprof-sys/library/amd_smi.hpp Added nested struct xcp_metrics_t with vcn_busy and jpeg_busy vectors inside data; replaced separate m_vcn_metrics and m_jpeg_metrics vectors with a single m_xcp_metrics vector.
source/lib/rocprof-sys/library/amd_smi.cpp Refactored VCN/JPEG metrics sampling and post-processing to use xcp_metrics_t and conditional logic for device support; unified metric collection with helper lambdas; updated perfetto tracing and indexing logic to support multiple XCP metrics with appropriate naming and indexing.

Sequence Diagram(s)

sequenceDiagram
    participant Device
    participant DataSampler
    participant MetricsStore
    participant Tracer

    Device->>DataSampler: Request sample (with dev_id)
    DataSampler->>Device: Check VCN/JPEG support
    alt VCN/JPEG supported
        DataSampler->>MetricsStore: Collect VCN/JPEG metrics into single xcp_metrics_t
    else VCN/JPEG not supported
        loop For each XCP
            DataSampler->>MetricsStore: Collect VCN/JPEG busy metrics per XCP
        end
    end
    DataSampler->>Tracer: Post-process and trace metrics with correct naming/indexing
Loading

Poem

In the garden of metrics, a change takes root,
VCN and JPEG now walk in a unified suit.
With xcp_metrics_t, they hop side by side,
Through conditional logic, their values reside.
Tracing their journey, the indices align—
A bunny’s delight in a codebase refined! 🐇


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e095865 and 543f26f.

📒 Files selected for processing (1)
  • source/lib/rocprof-sys/library/amd_smi.cpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/lib/rocprof-sys/library/amd_smi.cpp
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@sputhala-amd sputhala-amd changed the title Show VCN and JPEG busy values where VCN/JPEG activity is not supported. Show VCN and JPEG busy values where VCN/JPEG activity is not supported.@coderabbitai Jun 6, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
source/lib/rocprof-sys/library/amd_smi.cpp (2)

555-568: Complex index calculation needs documentation.

The index calculation for JPEG metrics when VCN is also enabled involves summing VCN metrics across all XCPs. While the logic appears correct, it's complex and could benefit from clarification.

Add a comment explaining the index calculation:

 if(_settings.vcn_activity)
 {
+    // When both VCN and JPEG are enabled, JPEG indices start after all VCN metrics.
+    // Calculate the total number of VCN metrics across all XCPs to offset JPEG indices.
     size_t total_vcn_metrics = 0;
     if(gpu::is_vcn_activity_supported(_dev_id))
     {
         total_vcn_metrics = itr.m_xcp_metrics[0].vcn_busy.size();
     }
     else
     {
         for(const auto& temp : itr.m_xcp_metrics)
             total_vcn_metrics += temp.vcn_busy.size();
     }
     idx += (total_vcn_metrics - 1);
 }

194-263: Consider validating metric array consistency.

The code assumes that when activity metrics are not supported, the XCP busy metrics will be available and valid. Consider adding validation to handle edge cases where neither metric type is available.

Add validation after metric collection:

 if(get_settings(m_dev_id).vcn_activity)
 {
     // ... existing VCN collection logic ...
+    
+    // Validate that we collected at least some metrics
+    if(m_xcp_metrics.empty() || 
+       std::all_of(m_xcp_metrics.begin(), m_xcp_metrics.end(), 
+                   [](const auto& m) { return m.vcn_busy.empty(); }))
+    {
+        ROCPROFSYS_VERBOSE_F(1, "No valid VCN metrics collected for device %u\n", _dev_id);
+        get_settings(m_dev_id).vcn_activity = false;
+    }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fcd8cc and de15535.

📒 Files selected for processing (4)
  • source/lib/core/gpu.cpp (2 hunks)
  • source/lib/core/gpu.hpp (1 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (4 hunks)
  • source/lib/rocprof-sys/library/amd_smi.hpp (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
source/lib/core/gpu.hpp (1)
source/lib/core/gpu.cpp (10)
  • get_processor_handles (211-304)
  • get_processor_handles (212-212)
  • is_vcn_activity_supported (306-311)
  • is_vcn_activity_supported (307-307)
  • is_jpeg_activity_supported (313-318)
  • is_jpeg_activity_supported (314-314)
  • is_vcn_busy_supported (320-325)
  • is_vcn_busy_supported (321-321)
  • is_jpeg_busy_supported (327-332)
  • is_jpeg_busy_supported (328-328)
🔇 Additional comments (8)
source/lib/rocprof-sys/library/amd_smi.hpp (2)

96-100: LGTM! Well-structured consolidation of metrics.

The introduction of the nested xcp_metrics_t struct effectively consolidates VCN and JPEG busy metrics into a logical grouping, which aligns well with the PR's objective of handling XCP-specific metrics.


115-115: Good refactoring to support multiple XCP metrics.

Replacing the separate VCN and JPEG metric vectors with a single vector of xcp_metrics_t provides better structure for handling devices with multiple XCP partitions.

source/lib/core/gpu.hpp (2)

43-53: LGTM! Clear and consistent API for feature support queries.

The four new boolean query functions provide a clean interface for checking device-specific support for VCN and JPEG activity/busy metrics.


59-62: Well-designed static storage for device capabilities.

The static boolean vectors efficiently track per-device support status, maintaining consistency with the existing processor tracking pattern.

source/lib/core/gpu.cpp (2)

253-296: LGTM! Comprehensive metric support detection logic.

The implementation correctly detects support by checking for valid values (not UINT16_MAX) in both activity and XCP busy metrics. The nested loops properly handle the different metric array structures.


309-310: Add consistent bounds checking across all support query functions.

Good defensive programming with the bounds checking. This prevents potential out-of-bounds access if an invalid device ID is provided.

Also applies to: 316-317, 323-324, 330-331

source/lib/rocprof-sys/library/amd_smi.cpp (2)

192-215: LGTM! Correct VCN metric collection with proper branching.

The logic correctly handles both cases:

  • When VCN activity is supported: collects from vcn_activity array
  • When not supported: collects from XCP-specific vcn_busy arrays

The use of UINT16_MAX as a sentinel value for invalid/unsupported metrics is consistent with the AMD SMI API.


417-430: LGTM! Well-structured metric name formatting.

The addendum_blk lambda effectively handles the different naming conventions for regular metrics vs XCP-specific busy metrics, providing clear identification in the trace output.

Comment thread source/lib/rocprof-sys/library/amd_smi.cpp Outdated
On AMD-SMI in rocm 7.0 vcn_activity and jpeg_activity will not be reported
where XCP(partition) stats vcn_busy and jpeg_busy are available.

This causes the activity tracking to fail. The fix is to read the busy values
when activity values are not supported.
@coderabbitai coderabbitai Bot changed the title Show VCN and JPEG busy values where VCN/JPEG activity is not supported.@coderabbitai Show VCN and JPEG busy values where VCN/JPEG activity is not supported.Refactor AMD SMI XCP metrics handling and update tracing logic Jun 10, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
source/lib/rocprof-sys/library/amd_smi.cpp (1)

253-260: Array bounds issue flagged in previous review.

This is the same array bounds issue identified in the previous review comment. The code assumes _gpu_metrics.xcp_stats[i] exists for all i < m_xcp_metrics.size() without proper bounds checking.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de15535 and caaed26.

📒 Files selected for processing (3)
  • source/lib/core/gpu.cpp (0 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (4 hunks)
  • source/lib/rocprof-sys/library/amd_smi.hpp (2 hunks)
💤 Files with no reviewable changes (1)
  • source/lib/core/gpu.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/lib/rocprof-sys/library/amd_smi.hpp
🔇 Additional comments (4)
source/lib/rocprof-sys/library/amd_smi.cpp (4)

192-215: LGTM! Clean conditional logic for VCN activity handling.

The conditional logic correctly handles both supported and unsupported VCN activity scenarios. The fallback to XCP stats when activity metrics are unavailable addresses the ROCm 7.0 issue described in the PR objectives.


417-429: LGTM! Improved naming convention for XCP metrics.

The updated addendum_blk function provides better naming and indexing for XCP metrics, distinguishing between busy and activity metrics with appropriate XCP indexing.


217-232: ⚠️ Potential issue

Verify empty container access safety.

The code accesses m_xcp_metrics[0] without checking if the container is empty, which could lead to undefined behavior.

Add bounds checking before accessing the container:

         if(m_xcp_metrics.empty())
             m_xcp_metrics.push_back(metrics);
         else
+        {
+            if(m_xcp_metrics.empty())
+                m_xcp_metrics.push_back(xcp_metrics_t{});
             m_xcp_metrics[0].jpeg_busy = metrics.jpeg_busy;
+        }

Likely an incorrect or invalid review comment.


555-568:

Details

✅ Verification successful

Verify index calculation consistency.

The complex index calculation for JPEG metrics needs verification to ensure it matches the counter track creation logic. The calculation depends on VCN metrics count and could be brittle.

Run the following script to verify the consistency between counter track creation and tracing index calculations:


🏁 Script executed:

#!/bin/bash
# Description: Analyze index calculation patterns for potential mismatches

# Search for counter track creation patterns
echo "=== Counter Track Creation Patterns ==="
rg -A 10 -B 2 "counter_track::emplace.*VCN.*Activity"

echo -e "\n=== Counter Track Creation Patterns for JPEG ==="
rg -A 10 -B 2 "counter_track::emplace.*JPEG.*Activity"

echo -e "\n=== Tracing Index Usage Patterns ==="
rg -A 5 -B 5 "TRACE_COUNTER.*device_vcn_activity"

echo -e "\n=== JPEG Index Calculation ==="
rg -A 10 -B 5 "total_vcn_metrics.*vcn_busy\.size"

Length of output: 4448


🏁 Script executed:

#!/bin/bash
# Search for all counter_track::emplace calls to correlate track registration and idx offsets
rg -n "counter_track::emplace" -A5 -B5 source/lib/rocprof-sys/library/amd_smi.cpp

Length of output: 3273


🏁 Script executed:

#!/bin/bash
# Print the full TRACE_COUNTER section around VCN and JPEG to verify idx handling
sed -n '500,650p' source/lib/rocprof-sys/library/amd_smi.cpp

Length of output: 5933


🏁 Script executed:

#!/bin/bash
# Print the section where `_idx` is declared and populated
sed -n '350,500p' source/lib/rocprof-sys/library/amd_smi.cpp

Length of output: 6988


Index calculation for JPEG metrics verified
The adjustment

idx  = _idx.at(7);  
if(_settings.vcn_activity)  
  idx += (total_vcn_metrics - 1);  

correctly re-bases JPEG’s start index to immediately follow the VCN traces (_idx[6] + total_vcn_metrics). This holds for both the “supported” (_vcn_metrics[0].vcn_busy.size()) and “fallback” (sum of each temp.vcn_busy.size()) paths. No change required.

Comment thread source/lib/rocprof-sys/library/amd_smi.cpp Outdated
Comment thread source/lib/rocprof-sys/library/amd_smi.cpp Outdated
Comment thread source/lib/rocprof-sys/library/amd_smi.cpp Outdated
Comment thread source/lib/rocprof-sys/library/amd_smi.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
source/lib/rocprof-sys/library/amd_smi.cpp (1)

437-458: Previously raised bounds checking concerns remain valid.

Multiple locations access itr.m_xcp_metrics[0] without verifying the container is not empty. While the learning indicates that VCN activity support guarantees at least one row, this only applies to the supported case. The unsupported paths and JPEG handling still require bounds checking.

Also applies to: 462-481, 515-539, 560-584

🧹 Nitpick comments (1)
source/lib/rocprof-sys/library/amd_smi.cpp (1)

545-558: Consider extracting VCN metrics counting into a helper function.

The index calculation logic for JPEG counters when VCN is also active is complex and could benefit from being extracted into a separate function for better readability and maintainability.

Consider creating a helper function:

size_t count_total_vcn_metrics(const data& itr, uint32_t dev_id) {
    size_t total = 0;
    if(gpu::is_vcn_activity_supported(dev_id)) {
        total = itr.m_xcp_metrics.empty() ? 0 : itr.m_xcp_metrics[0].vcn_busy.size();
    } else {
        for(const auto& temp : itr.m_xcp_metrics)
            total += temp.vcn_busy.size();
    }
    return total;
}

Then simplify the main code:

-                    size_t total_vcn_metrics = 0;
-                    if(gpu::is_vcn_activity_supported(_dev_id))
-                    {
-                        total_vcn_metrics = itr.m_xcp_metrics[0].vcn_busy.size();
-                    }
-                    else
-                    {
-                        for(const auto& temp : itr.m_xcp_metrics)
-                            total_vcn_metrics += temp.vcn_busy.size();
-                    }
+                    size_t total_vcn_metrics = count_total_vcn_metrics(itr, _dev_id);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between caaed26 and 0db6283.

📒 Files selected for processing (2)
  • source/lib/core/gpu.cpp (1 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • source/lib/core/gpu.cpp
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.687Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.
source/lib/rocprof-sys/library/amd_smi.cpp (1)
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.687Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.
🔇 Additional comments (3)
source/lib/rocprof-sys/library/amd_smi.cpp (3)

192-199: Well-structured helper function for filtering invalid metrics.

The fill_busy_metrics lambda effectively filters out invalid UINT16_MAX values, ensuring only valid metrics are collected.


201-218: Clean refactoring of VCN metrics collection.

The conditional handling based on is_vcn_activity_supported provides a clear separation between unified and XCP-specific metrics collection.


247-250: ⚠️ Potential issue

Add bounds checking for XCP stats array access.

The code accesses _gpu_metrics.xcp_stats[i] without verifying that i is within the bounds of the XCP stats array. This could lead to undefined behavior if m_xcp_metrics has more elements than _gpu_metrics.xcp_stats.

Apply this diff to add bounds checking:

             else
             {
                 // Add JPEG busy to existing metrics (one per XCP)
-                for(size_t i = 0; i < m_xcp_metrics.size() && i < std::size(_gpu_metrics.xcp_stats); ++i)
+                size_t xcp_count = std::min(m_xcp_metrics.size(), 
+                                          std::size(_gpu_metrics.xcp_stats));
+                for(size_t i = 0; i < xcp_count; ++i)
                 {
                     fill_busy_metrics(m_xcp_metrics[i].jpeg_busy, _gpu_metrics.xcp_stats[i].jpeg_busy);
                 }
             }

Likely an incorrect or invalid review comment.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
source/lib/rocprof-sys/library/amd_smi.cpp (1)

556-599: Consider extracting index calculation logic for better maintainability.

The JPEG tracing logic is correct but contains complex index calculations to offset by VCN metrics count. While functionally sound, this could benefit from extraction into a helper function.

Consider extracting the VCN metrics counting logic:

+            auto calculate_vcn_metrics_count = [&]() -> size_t {
+                if(!_settings.vcn_activity) return 0;
+                if(gpu::is_vcn_activity_supported(_dev_id))
+                    return itr.m_xcp_metrics[0].vcn_busy.size();
+                else {
+                    size_t total = 0;
+                    for(const auto& temp : itr.m_xcp_metrics)
+                        total += temp.vcn_busy.size();
+                    return total;
+                }
+            };
+
             if(_settings.jpeg_activity && !itr.m_xcp_metrics.empty())
             {
                 uint64_t idx = _idx.at(7);
-                if(_settings.vcn_activity)
-                {
-                    size_t total_vcn_metrics = 0;
-                    if(gpu::is_vcn_activity_supported(_dev_id))
-                    {
-                        total_vcn_metrics = itr.m_xcp_metrics[0].vcn_busy.size();
-                    }
-                    else
-                    {
-                        for(const auto& temp : itr.m_xcp_metrics)
-                            total_vcn_metrics += temp.vcn_busy.size();
-                    }
-                    idx += (total_vcn_metrics - 1);
-                }
+                size_t total_vcn_metrics = calculate_vcn_metrics_count();
+                if(total_vcn_metrics > 0)
+                    idx += (total_vcn_metrics - 1);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ede64a and 4b71515.

📒 Files selected for processing (2)
  • source/lib/core/gpu.cpp (1 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/lib/core/gpu.cpp
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:462-481
Timestamp: 2025-06-10T12:25:14.058Z
Learning: When JPEG activity is supported by AMD GPUs in the AMD SMI API, the sampling logic ensures there will always be at least one element in m_xcp_metrics, making bounds checking for m_xcp_metrics[0] unnecessary in the supported case.
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.687Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.
source/lib/rocprof-sys/library/amd_smi.cpp (2)
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:462-481
Timestamp: 2025-06-10T12:25:14.058Z
Learning: When JPEG activity is supported by AMD GPUs in the AMD SMI API, the sampling logic ensures there will always be at least one element in m_xcp_metrics, making bounds checking for m_xcp_metrics[0] unnecessary in the supported case.
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.687Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.
🔇 Additional comments (7)
source/lib/rocprof-sys/library/amd_smi.cpp (7)

192-198: LGTM! Clean helper function implementation.

The fill_busy_metrics lambda effectively abstracts the filtering of invalid UINT16_MAX values, improving code reusability and readability throughout the sampling logic.


200-217: LGTM! Well-structured conditional logic for VCN metrics.

The implementation correctly handles both supported and unsupported VCN activity cases. The fallback to XCP-specific busy values when activity is not supported is properly implemented with appropriate empty checks.


219-255: LGTM! Comprehensive JPEG metrics handling with proper bounds checking.

The JPEG activity sampling logic correctly handles the integration with existing VCN metrics and includes proper bounds checking for XCP stats access on lines 246-248. The conditional logic appropriately handles both supported and unsupported cases.


409-422: LGTM! Enhanced lambda with proper parameterization.

The addendum_blk lambda enhancement to support XCP indexing and busy flags provides the necessary flexibility for generating appropriate metric names across different device configurations.


439-466: LGTM! Robust counter track creation for VCN metrics.

The implementation properly handles both activity-supported and XCP-specific cases with appropriate empty checks and nested iteration for multiple XCPs.


470-496: LGTM! Consistent JPEG counter track creation.

The JPEG counter track creation follows the same robust pattern as VCN metrics, with proper empty checks and conditional logic based on device support.


526-554: LGTM! Clean VCN activity tracing implementation.

The tracing logic correctly handles both supported and unsupported cases with proper iteration and indexing. Based on the retrieved learnings, the access to itr.m_xcp_metrics[0] is safe when VCN activity is supported.

* Fix the AMD-SMI double initialization
* Optimize VCN/JPEG sampling to store both values always
@sputhala-amd sputhala-amd changed the title Show VCN and JPEG busy values where VCN/JPEG activity is not supported.Refactor AMD SMI XCP metrics handling and update tracing logic Show VCN and JPEG busy values where VCN/JPEG activity is not supported Jun 12, 2025
@sputhala-amd sputhala-amd marked this pull request as draft June 12, 2025 18:13
@sputhala-amd sputhala-amd marked this pull request as ready for review June 12, 2025 18:13
@sputhala-amd

sputhala-amd commented Jun 12, 2025

Copy link
Copy Markdown
Owner Author

@coderabbitai, I pushed a fix in commit 5a90d9a , please review it.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
source/lib/core/gpu.cpp (1)

90-108: Unused init_flag – remove or wire it into std::call_once
static std::once_flag init_flag; is declared, but never used.
Either drop the variable or guard the lambda with it:

-bool
-amdsmi_init()
-{
-    static std::once_flag init_flag;  // Ensures initialization happens only once
-    auto                  _amdsmi_init   = []() {
+bool
+amdsmi_init()
+{
+    static std::once_flag init_flag;
+    std::call_once(init_flag, []() {
         try
         {
             ROCPROFSYS_AMD_SMI_CALL(::amdsmi_init(AMDSMI_INIT_AMD_GPUS));
             get_processor_handles();
-            _amdsmi_is_initialized() = true;  // Mark as initialized
+            _amdsmi_is_initialized() = true;
         } catch(std::exception& _e)
         {
             ROCPROFSYS_BASIC_VERBOSE(1, "Exception thrown initializing amd-smi: %s\n",
                                      _e.what());
             _amdsmi_is_initialized() = false;
         }
-        return true;
-    }();
-
-    return _amdsmi_init;
+    });
+    return _amdsmi_is_initialized();
 }

This eliminates dead code and matches the comment.

🧹 Nitpick comments (2)
source/lib/core/gpu.cpp (2)

76-85: std::once_flag and init-state helper look good, but consider using std::atomic_bool for visibility
_amdsmi_is_initialized() returns a plain bool&. After std::call_once completes, all threads will have the necessary happens-before relation, so it is technically safe, but a non-atomic global boolean can still be surprising.
A cheap std::atomic_bool (or even std::memory_order_relaxed) removes any doubt and makes intent explicit.


271-299: Metric-detection logic is clearer, but double-checks could be simplified
The has_valid lambda is a tidy replacement for explicit loops.
Two minor nits:

  1. v_busy_supported / j_busy_supported are toggled inside the loop; you can break immediately after both become true to avoid unnecessary iterations (micro-optimisation).
  2. Given the comment that activity is absent when busy is present, callers may want a single helper that picks whichever is supported (activity || busy). Otherwise every caller must repeat that logic.

No action required, just flagging for future polish.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b71515 and d5ce1fe.

📒 Files selected for processing (2)
  • source/lib/core/gpu.cpp (3 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/lib/rocprof-sys/library/amd_smi.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:462-481
Timestamp: 2025-06-10T12:25:14.111Z
Learning: When JPEG activity is supported by AMD GPUs in the AMD SMI API, the sampling logic ensures there will always be at least one element in m_xcp_metrics, making bounds checking for m_xcp_metrics[0] unnecessary in the supported case.
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.719Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.
🔇 Additional comments (1)
source/lib/core/gpu.cpp (1)

160-163: Nice use of std::call_once in the public entry point
Wrapping the internal amdsmi_init() with amdsmi_once cleanly externalises the one-time initialisation. 👍

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
source/lib/rocprof-sys/library/amd_smi.cpp (1)

395-408: addendum_blk is doing two jobs – split for clarity

The lambda now formats both plain device-level metrics and per-XCP metrics.
A tiny helper that just prints the index block ("[02]") would let you compose
the final string with simple JOINs, remove the nested if/else, and avoid
the duplicated JOIN("", '[', (_i < 10 ? "0" : ""), _i, ']') logic.

Not urgent, but will pay dividends when another metric type shows up.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5ce1fe and 0dc553f.

📒 Files selected for processing (2)
  • source/lib/core/gpu.cpp (4 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/lib/core/gpu.cpp
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:462-481
Timestamp: 2025-06-10T12:25:14.111Z
Learning: When JPEG activity is supported by AMD GPUs in the AMD SMI API, the sampling logic ensures there will always be at least one element in m_xcp_metrics, making bounds checking for m_xcp_metrics[0] unnecessary in the supported case.
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.719Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.
source/lib/rocprof-sys/library/amd_smi.cpp (2)
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:462-481
Timestamp: 2025-06-10T12:25:14.111Z
Learning: When JPEG activity is supported by AMD GPUs in the AMD SMI API, the sampling logic ensures there will always be at least one element in m_xcp_metrics, making bounds checking for m_xcp_metrics[0] unnecessary in the supported case.
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.719Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.
🔇 Additional comments (1)
source/lib/rocprof-sys/library/amd_smi.cpp (1)

232-239: Confirm XCP loop bound is correct for all firmware versions

for(const auto& xcp : _gpu_metrics.xcp_stats) iterates the full fixed-length
array returned by the driver. On some cards only xcp_stat_num (or a similarly
named field) entries are valid, the rest are all UINT16_MAX.
You currently rely on the fill_busy_metrics filter to prune invalid values,
which is fine, but double-checking the driver header for an explicit “count”
field and respecting it would avoid work and future surprises.

Can you verify whether amdsmi_gpu_metrics_t exposes such a field and, if it
does, slice the loop accordingly?

Comment thread source/lib/rocprof-sys/library/amd_smi.cpp Outdated
sputhala-amd pushed a commit that referenced this pull request Jan 22, 2026
 rocprof-sys-instrument to avoid instrumenting around C "main" wrapper (#1322)

* Add check for Fortran main

* Comment change

* MAIN__ -> Fortran main

* Cray Compiler comment change

* Add changelog and troubleshooting comments

* Improve CHANGELOG.md message

* Change CHANGELOG msg to be in 7.2.0

* Apply review change #1

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply review change #2

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply review change #3

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
[rocm-systems] ROCm/rocm-systems#1322 (commit f0a41b6)
sputhala-amd pushed a commit that referenced this pull request Jan 22, 2026
 =?UTF-8?q?=20Perfetto=20annotations=20to=20match=20new=20n=E2=80=A6=20(#1?=
 =?UTF-8?q?618)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Rename "corr_id" to "stack_id" in Perfetto annotations to match new naming in schema.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* correlation_id.ancestor was not added until ROCPROFILER_VERSION 1.0
[rocm-systems] ROCm/rocm-systems#1618 (commit 4b0fb2c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants