[mellanox_firmware] PCC collector (opt in)#104
Conversation
📝 WalkthroughWalkthroughThis pull request introduces PCC (Performance Counter Configuration) collection functionality to the Mellanox firmware suite. It adds a new Changes
Sequence Diagram(s)sequenceDiagram
participant CM as CollectorManager
participant PC as PccCollector
participant Tool as MlxregTool/<br/>MstregTool
participant Device as Device
participant FS as Filesystem
CM->>CM: collect_pcc_info() invoked
CM->>CM: Check if "pcc" option enabled
alt pcc enabled
loop for each device context
CM->>PC: PccCollector instantiated with context
PC->>Tool: _collect_with_mft()/mstflint()
Tool->>Device: ppcc_get(GET_ALGO_INFO_ARRAY)
Device-->>Tool: algo slots returned
Tool-->>PC: algo list parsed
loop for each algo slot
PC->>Tool: ppcc_get(GET_ALGO_STATUS)
Device-->>Tool: status response
Tool-->>PC: status value evaluated
alt counter_en is true
PC->>Tool: ppcc_get(GET_COUNTER_INFO, ...)
Device-->>Tool: counter metadata
Tool-->>PC: metadata collected
end
PC->>Tool: ppcc_get(GET_PARAM_INFO, ...)
Device-->>Tool: parameter metadata
Tool-->>PC: parameters collected
end
PC->>FS: Write collected PPCC data to files
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (4.0.5)sos/report/mellanox_firmware_suite/tools/MFT/mlxreg.pysos/report/mellanox_firmware_suite/tools/base_tool.pysos/report/mellanox_firmware_suite/collectors/pcc_collector.py
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sos/report/mellanox_firmware_suite/tools/base_tool.py`:
- Around line 49-52: The cache early-return in execute_cmd is skipping the
required side-effect of writing artifacts when filename or subdir is provided;
modify execute_cmd so that when filename or subdir is set it does not
short-circuit on a cache hit but still calls _collect_cmd_output (or an
equivalent helper) to create/write the file into the requested subdir, while
still returning the cached stdout when appropriate; use the existing
cache_key/self.ctx.cache logic to retrieve stdout but ensure
_collect_cmd_output(cmd, output, filename, subdir) is invoked before returning
if filename or subdir are non-empty.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cf45634c-fd07-4972-a83c-11ebe05c891d
📒 Files selected for processing (6)
sos/report/mellanox_firmware_suite/collectors/collector_manager.pysos/report/mellanox_firmware_suite/collectors/pcc_collector.pysos/report/mellanox_firmware_suite/tools/MFT/mlxreg.pysos/report/mellanox_firmware_suite/tools/MSTFlint/mstreg.pysos/report/mellanox_firmware_suite/tools/base_tool.pysos/report/plugins/mellanox_firmware.py
| if get_cached and cache_key in self.ctx.cache: | ||
| return self.ctx.cache[cache_key] | ||
|
|
||
| rc, output = self._run_command(cmd, timeout, filename) | ||
| rc, output = self._run_command(cmd, timeout, filename, subdir=subdir) |
There was a problem hiding this comment.
Don't short-circuit archival calls on cache hits.
When filename/subdir is provided, execute_cmd() has a required side effect: it must write the collected artifact. The cache guard returns before _collect_cmd_output() runs, so a repeated call for the same command can hand back cached stdout and never create the requested file in the requested subdirectory.
Suggested fix
- if get_cached and cache_key in self.ctx.cache:
+ if filename is None and get_cached and cache_key in self.ctx.cache:
return self.ctx.cache[cache_key]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if get_cached and cache_key in self.ctx.cache: | |
| return self.ctx.cache[cache_key] | |
| rc, output = self._run_command(cmd, timeout, filename) | |
| rc, output = self._run_command(cmd, timeout, filename, subdir=subdir) | |
| if filename is None and get_cached and cache_key in self.ctx.cache: | |
| return self.ctx.cache[cache_key] | |
| rc, output = self._run_command(cmd, timeout, filename, subdir=subdir) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sos/report/mellanox_firmware_suite/tools/base_tool.py` around lines 49 - 52,
The cache early-return in execute_cmd is skipping the required side-effect of
writing artifacts when filename or subdir is provided; modify execute_cmd so
that when filename or subdir is set it does not short-circuit on a cache hit but
still calls _collect_cmd_output (or an equivalent helper) to create/write the
file into the requested subdir, while still returning the cached stdout when
appropriate; use the existing cache_key/self.ctx.cache logic to retrieve stdout
but ensure _collect_cmd_output(cmd, output, filename, subdir) is invoked before
returning if filename or subdir are non-empty.
Add a new PCC collector.
This collection is disabled by default and requires adding "-k mellanox_firmware.pcc=true" to the command line to enable it.
Key changes:
Signed-off-by: Dan Goldberg dgoldberg@nvidia.com
Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines
Summary by CodeRabbit