From a646c14277d0d6875baaf16992fbb57516b567eb Mon Sep 17 00:00:00 2001
From: Michael Benavidez
Date: Tue, 2 Jun 2026 16:48:09 -0500
Subject: [PATCH 1/2] Fix: Resolve sphinx warnings

---
 docs/common/system-validation.md        | 22 +++++++++++-----------
 docs/conf.py                            |  4 +++-
 docs/reference/related-documentation.md |  2 +-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/docs/common/system-validation.md b/docs/common/system-validation.md
index 0385a96..1180e01 100644
--- a/docs/common/system-validation.md
+++ b/docs/common/system-validation.md
@@ -47,7 +47,7 @@ The `rvs` has two different types of modules to validate the Compute subsystem.
 - [Properties](#gpu-properties)
 - [Benchmark / Stress / Qualification](#benchmark-stress-qualification)
 
-MI300X GPU accelerators have many architectural features. Similar to the [Check GPU presence (lspci)](../mi300x/health-checks.md#check-gpu-presence-lspci) section, `rvs` has an option to display all MI300X GPU accelerators present in the SUT. Before
+MI300X GPU accelerators have many architectural features. Similar to the [Check GPU presence (lspci)](health-checks.md#check-gpu-presence) section, `rvs` has an option to display all MI300X GPU accelerators present in the SUT. Before
 proceeding with the modules below, run the following command to make sure all the GPUs are seen with their correct PCIe properties.
 
 Command:
@@ -380,7 +380,7 @@ grep "bandwidth" mem.txt
 
 #### BABEL
 
-Refer to the [BabelStream section](mi300x-bench-babelstream.md) for instructions on how to run this module to test memory.
+Refer to the [BabelStream section](#babelstream) for instructions on how to run this module to test memory.
 
 ### IO
 
@@ -870,10 +870,10 @@ For comprehensive instructions, test scope, and result interpretation, refer to
 
 High level test summary:
 
-- **PCIe Subsystem:** Tests PCIe link status, speed, width, and stress bandwidth (host-to-device, device-to-host, and bidirectional).
-- **Memory Subsystem:** Exercises and validates HBM (High Bandwidth Memory) through stress tests such as bandwidth, dual stream, and random access patterns.
-- **Compute Subsystem:** Runs compute kernels at various data types and loads, verifying the stability and peak capability of the GPU compute units.
-- **Power and Thermal:** Max power and sustained stress kernels help uncover errors that show up under load.
+- **PCIe Subsystem:** Tests PCIe link status, speed, width, and stress bandwidth (host-to-device, device-to-host, and bidirectional).
+- **Memory Subsystem:** Exercises and validates HBM (High Bandwidth Memory) through stress tests such as bandwidth, dual stream, and random access patterns.
+- **Compute Subsystem:** Runs compute kernels at various data types and loads, verifying the stability and peak capability of the GPU compute units.
+- **Power and Thermal:** Max power and sustained stress kernels help uncover errors that show up under load.
 
 Extended information:
 
@@ -929,9 +929,9 @@ Program exiting with return code AGFHC_SUCCESS [0]
 This test should be run twice to better exercise the HBM memory ensuring no ECC exceptions are present.
 ```
 
-- **HBM Bandwidth:** Measures and stresses memory read/write throughput.
-- **HBM Data Patterns:** Performs wide pattern tests (dual stream, single/dual stream random, and sequential).
-- **Memory Error Detection:** Looks for correctable/uncorrectable errors under load—useful for catching early DIMM or silicon issues.
+- **HBM Bandwidth:** Measures and stresses memory read/write throughput.
+- **HBM Data Patterns:** Performs wide pattern tests (dual stream, single/dual stream random, and sequential).
+- **Memory Error Detection:** Looks for correctable/uncorrectable errors under load—useful for catching early DIMM or silicon issues.
 
 Extended information
 
@@ -1140,12 +1140,12 @@ Each run generates detailed logs and a summary JSON file (typically named result
 "tests": [
 {"name": "pcie_link_status", "result": "PASS"},
 {"name": "hbm_bw", "result": "PASS"},
-...
+ // ... additional test entries ...
 ]
 }
 ```
 
-If any result entry shows **FAIL**, that test did not pass.
+If any result entry shows **FAIL**, that test did not pass.
 
 #### Return Code
 
diff --git a/docs/conf.py b/docs/conf.py
index fc5dd7e..faf1c83 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -37,4 +37,6 @@
 # Table of contents
 external_toc_path = "./sphinx/_toc.yml"
 
-exclude_patterns = ['.venv']
\ No newline at end of file
+exclude_patterns = ['.venv']
+# Add anchors to headings up to level 4
+myst_heading_anchors = 4
\ No newline at end of file
diff --git a/docs/reference/related-documentation.md b/docs/reference/related-documentation.md
index 8603a7f..65bee57 100644
--- a/docs/reference/related-documentation.md
+++ b/docs/reference/related-documentation.md
@@ -23,7 +23,7 @@ setup the system and run the tests in this guide.
 - [RVS user guide](https://github.com/ROCm/ROCmValidationSuite/blob/master/docs/ug1main.md)
 - [RVS modules](https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/latest/conceptual/rvs-modules.html)
 - [TransferBench repository](https://github.com/ROCm/TransferBench)
-- [TransferBench how to guide](transferbench:how%20/use-transferbench)
+- [TransferBench how to guide](https://rocm.docs.amd.com/projects/TransferBench/en/latest/how%20to/use-transferbench.html)
 - [TransferBench example configuration](https://github.com/ROCm/TransferBench/blob/develop/examples/example.cfg)
 - [RCCL repository](https://github.com/ROCm/rccl)
 - [RCCL Tests repository](https://github.com/ROCm/rccl-tests/tree/develop)

From 1a983faa029bf28b29a7d4e96bc40aca8b16736f Mon Sep 17 00:00:00 2001
From: Michael Benavidez
Date: Tue, 2 Jun 2026 16:48:29 -0500
Subject: [PATCH 2/2] Add AI assistant tool folders to ignore

---
 .gitignore | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 5355578..070cb9b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,15 @@
 .venv
 .vscode
-.cline_storage
+
+# AI assistant tool directories (personal, not project source)
+.claude/
+.cline/
+.cline_storage/
+.codex/
+.cursor/
+
+# Skills are a local tool dependency, not project source
+skills/
 
 # documentation artifacts
 _build/