
Refactor the pack scheduling for scheduleIterAlg = 3. #1358

Open
vin-huang wants to merge 2 commits into ROCm:develop from vin-huang:pop_pack_in_rr


vin-huang (Collaborator) commented Nov 18, 2024

There are three tracks for the sparse MM, so we need to make sure that the required pack items are popped in each iteration.

[idea]

  • Use 3 separate pack pools to store the pack instructions of A, B, and Metadata.
  • Step 1: insert only the required packs into the code (the number of required packs may differ for each mfma iteration). Check whether the inserted packs fulfill instPerPack; if not, insert the next pack instructions until it is satisfied.
  • Step 2: if there is still room before the mfma, insert the next pack instructions (the number of instructions inserted equals instPerPack).
  • Step 3: put another pack or an SNop before the mfma instruction according to the required latency. The inserted combination may be 2 packs, 1 pack + snop 0, or snop 1.
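
The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: the function names, the `has_room` flag, the `latency_slots` parameter, and the round-robin order A, B, Metadata are all assumptions made for the example, and instructions are modelled as plain strings. It also assumes that `s_nop N` covers N + 1 slots, which is why 2 missing slots become `s_nop 1` and 1 missing slot becomes `s_nop 0`.

```python
from collections import deque


def pop_round_robin(pools, count):
    """Pop up to `count` instructions, rotating over the non-empty pools."""
    out = []
    while len(out) < count and any(pools):
        for pool in pools:
            if pool and len(out) < count:
                out.append(pool.popleft())
    return out


def packs_before_mfma(pools, required, inst_per_pack, has_room, latency_slots=2):
    """Return the instructions to emit ahead of one mfma (hypothetical helper).

    pools         -- dict of pool name -> deque of pack instructions (A, B, Metadata)
    required      -- dict of pool name -> number of packs this mfma needs
    inst_per_pack -- minimum number of pack instructions to emit per mfma
    has_room      -- assumed flag: True if there is still room before the mfma
    latency_slots -- assumed number of slots that must separate pack and mfma
    """
    ordered = list(pools.values())
    code = []

    # Step 1: emit the required packs, then top up until inst_per_pack is met.
    for name, n in required.items():
        for _ in range(min(n, len(pools[name]))):
            code.append(pools[name].popleft())
    code += pop_round_robin(ordered, inst_per_pack - len(code))

    # Step 2: if there is still room, emit inst_per_pack more pack instructions.
    if has_room:
        code += pop_round_robin(ordered, inst_per_pack)

    # Step 3: cover the latency with packs first, then an s_nop for the rest.
    filler = pop_round_robin(ordered, latency_slots)
    code += filler
    missing = latency_slots - len(filler)
    if missing > 0:
        code.append(f"s_nop {missing - 1}")
    return code


pools = {
    "A": deque(["packA0", "packA1"]),
    "B": deque(["packB0"]),
    "Metadata": deque(["packM0"]),
}
first = packs_before_mfma(pools, {"A": 1, "Metadata": 1}, inst_per_pack=2, has_room=False)
second = packs_before_mfma(pools, {}, inst_per_pack=2, has_room=False)
print(first)
print(second)
```

In this sketch the first call drains all three pools (two required packs plus two latency-filling packs), so the second call has nothing left to pop and falls back to a single `s_nop 1`.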

vin-huang added the gfx94x (Run CI on gfx94x) label Nov 18, 2024
vin-huang changed the title from "[Sparse] Pop the pack items by round robin to make sure the remain items will be pop in the same round of mfma." to "Refactor the pack scheduling for scheduleIterAlg = 3." Nov 21, 2024
vin-huang force-pushed the pop_pack_in_rr branch 3 times, most recently from 2036a35 to 76ab700, Nov 25, 2024 17:28
eidenyoshida added the noCI (Disable testing on supported CI systems: math libraries CI has this feature enabled.) label May 15, 2025
jayhawk-commits (Contributor) commented

Please resolve merge conflicts or close this PR to complete the task of importing PRs from this repo to the monorepo.
