Skip to content

LWP100-1465 generalize common GEMM graph#5

Draft
rocm-devops wants to merge 10 commits into
developfrom
yoonchoi/gemm-common-graph
Draft

LWP100-1465 generalize common GEMM graph#5
rocm-devops wants to merge 10 commits into
developfrom
yoonchoi/gemm-common-graph

Conversation

@rocm-devops
Copy link
Copy Markdown

Original author: @yoonchoi

PR overview

For LWP100-1465

Common GEMM graph is used for Address calculation test.
Changes in common GEMM graph from #1144 are extracted and subsequent changes are added.

Testing

Many existing tests using GEMM common graphs.
Tested if the existing tests passes with the minimal changes to match jamming factors.

Commit message

Refactor and generalize common GEMM graph

  • Removed templatized datatype from common GEMM graph
  • datatypes are now per matrix and member variables.
  • Workgroupsizes are flattened similarly to client code.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @a1-mlselibci-npi

Generated Documentation

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @a1-mlselibci-npi

CodeQL report

Results Summary

recommendation
6
Full table of results
Type Description File Path Start Line Start Column End Line End Column
recommendation Variant access from getElement; use graph.get<> instead. /lib/source/CodeGen/LoadStoreTileGenerator.cpp 334 30 334.1 48
recommendation Variant access from getElement; use graph.get<> instead. /lib/source/CodeGen/LoadStoreTileGenerator.cpp 338 30 338 48
recommendation Variant access from getElement; use graph.get<> instead. /test/unit/KernelGraphTest.cpp 1052 17 1052 35
recommendation Variant access from getElement; use graph.get<> instead. /test/unit/KernelGraphTest.cpp 1060 17 1060 35
recommendation Variant access from getElement; use graph.get<> instead. /test/unit/KernelGraphTest.cpp 984 17 984 35
recommendation Variant access from getElement; use graph.get<> instead. /test/unit/KernelGraphTest.cpp 992 17 992 35

Links

  • HTML
  • Sarif (for download and usage in conjunction with SARIF viewers)

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @a1-mlselibci-npi

Code Coverage Report for gfx942

Summary

Type Total Missed Master Missed Missed Change Coverage Master Coverage Coverage Change
Lines 43480 5967 5422 545 86.28% 87.34% -1.06%
Functions 4340 658 636 22 84.84% 85.32% -.48%
Regions 23265 3719 5536 -1817 84.01% 78.74% 5.27%
Branches 13430 3392 3477 -85 74.74% 74.46% .28%

This PR adds/edits 12743 newly uncovered lines.

Artifacts

Commit Hashes

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

@maemmett, @psandhu, I changed the codes in GEMM common graph and restored original checks. Only changed TileSizes of Y to make jamming factors maintained.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

Not sure why following CI errors occurred. Build parameters for gfx950 were given when I when to Jenkins page at http://math-ci.amd.com/job/enterprise/job/precheckin/job/rocRoller/job/PR-1157/build?delay=0sec

3133/8422 Test #3135: NewArithmeticTests/IntegralArithmeticTest.GPU_Int32/(gfx950,VGPR) ............................................................................................................***Failed    1.28 sec
Running main() from ./build/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = NewArithmeticTests/IntegralArithmeticTest.GPU_Int32/22
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from NewArithmeticTests/IntegralArithmeticTest
[ RUN      ] NewArithmeticTests/IntegralArithmeticTest.GPU_Int32/22
clang++: error: invalid target ID 'gfx950'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
unknown file: Failure
C++ exception with description "lib/source/Assemblers/SubprocessAssembler.cpp:85: FatalError(retCode == 0)
	command = /opt/rocm/llvm/bin/clang++ -x assembler -target amdgcn-amd-amdhsa -mcode-object-version=5 -mcpu=gfx950 -mwavefrontsize64 -c -o /tmp/rocroller-1z71q5/kernel.o /tmp/rocroller-1z71q5/kernel.s
	retCode = 256
	output = 
" thrown in the test body.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

Original review comment on CMakeLists.txt at line 1

Remove changes in this file, which will be separately merged by Lauren's PR.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

Original review comment on cmake/Dependencies.cmake at line 1

Remove changes in this file, which will be separately merged by Lauren's PR.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

Original review comment on test/catch/KernelGraphRemoveDuplicatesTest.cpp at line 47

To match the original jamming factors. WorkgroupsizeY is half of the original code.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

Original review comment on test/unit/KernelGraphTest.cpp at line 1258

To match the original jamming factors. WorkgroupsizeY is half of the original code.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

Original review comment on test/common/CommonGraphs.cpp at line 1

This is refactoring from CommonGraphs_impl.hpp.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @nhenders

Original review comment on test/common/CommonGraphs.cpp at line 313

We should generalize this past MFMA. Maybe setMITileSize. (MI for Matrix Instruction)

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @nhenders

Original review comment on test/common/CommonGraphs.cpp at line 368

We also need to parameterize the wavefrontSize (e.g GEMM::setWavefrontSize); gfx12's default is 32 but the underlying GEMMProblem defaults to 64.

We could also have an API for GEMM::setWavefrontSize that takes in a GPUArchitecture and queries the GPUCapability::DefaultWavefrontSize.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @nhenders

Original review comment on test/common/common/CommonGraphs.hpp at line 219

GEMMProblem has switches for controlling optimizations. Some of these are on by default e.g. fuseLoops, tailLoops, LDS loads. I think we should consider switching all these off by default, and have enablement be explicit.

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @yoonchoi

Original review comment on test/common/common/CommonGraphs.hpp at line 212

These became member variables to allow the capability to reassign new argument values. (Used in PR #1144).

@rocm-devops
Copy link
Copy Markdown
Author

Original commenter: @jolabega

Original review comment on test/common/CommonGraphs.cpp at line 313

Yes, that makes more sense as we try to move away for explicitly refer to MFMA where code should be generic.

@ROCm ROCm deleted a comment from rocm-devops Jul 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants