[DRAFT] Shared/aggregate load #804

Draft
alefimov-amd wants to merge 51 commits into shared/triton-gfx950-launch from shared/aggregate_load
Conversation

@alefimov-amd
Introduces the AggregateLoad pass, which aggregates multiple small loads inside a loop into one wide load and hoists it into an outer loop:

for i in range(0, 32):
  val = global_load: tensor<8x8>

transforms to:

for i in range(0, 2):
  wide_load = global_load: tensor<8x128>
  smem = local_alloc(wide_load)
  for j in range(0, 16):
    view = subview(smem, j)
    val = local_load(view): tensor<8x8>

If all iterations can be aggregated into a single wide load, the outer loop is not created.
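
To make the rewrite concrete, here is a minimal pure-Python model of the two loop shapes above. It is only a sketch of the idea, not the pass itself: `LoadCounter`, `subview`, `naive`, and `aggregated` are hypothetical names invented for illustration, shared memory is modeled as a plain list, and the sketch assumes the iteration count divides evenly by the aggregation factor.

```python
# Pure-Python model of the loop rewrite; all names are illustrative, not Triton APIs.
ROWS, TILE_COLS, NUM_ITERS = 8, 8, 32


def make_global(rows, cols):
    """Build a rows x cols matrix with a unique value per element."""
    return [[r * cols + c for c in range(cols)] for r in range(rows)]


class LoadCounter:
    """Wraps a matrix and counts how many global loads are issued against it."""

    def __init__(self, mem):
        self.mem = mem
        self.count = 0

    def global_load(self, col_start, width):
        self.count += 1
        return [row[col_start:col_start + width] for row in self.mem]


def subview(smem, col_start, width):
    """Stand-in for subview + local_load: slice one tile out of the staged buffer."""
    return [row[col_start:col_start + width] for row in smem]


def naive(loader):
    # Original loop: one small 8x8 global load per iteration.
    return [loader.global_load(i * TILE_COLS, TILE_COLS) for i in range(NUM_ITERS)]


def aggregated(loader, factor):
    # factor = number of original iterations covered by one wide load.
    assert NUM_ITERS % factor == 0, "sketch assumes an even split"
    tiles = []
    for outer in range(NUM_ITERS // factor):
        wide = loader.global_load(outer * factor * TILE_COLS, factor * TILE_COLS)
        smem = wide  # models local_alloc(wide_load)
        for inner in range(factor):
            tiles.append(subview(smem, inner * TILE_COLS, TILE_COLS))
    return tiles


mem = make_global(ROWS, TILE_COLS * NUM_ITERS)
ref_loader, agg_loader, full_loader = LoadCounter(mem), LoadCounter(mem), LoadCounter(mem)
ref = naive(ref_loader)
agg = aggregated(agg_loader, factor=16)           # 2 wide loads of 8x128
full = aggregated(full_loader, factor=NUM_ITERS)  # single wide load, outer loop has one trip
print(ref_loader.count, agg_loader.count, full_loader.count)  # prints: 32 2 1
```

Both aggregated variants produce the same sequence of 8x8 tiles as the original loop while issuing far fewer global loads. With `factor=NUM_ITERS` the outer loop runs exactly once, which corresponds to the case where the pass emits no outer loop.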

For now, this pass is applied only to the scale parameters of the scaled_dot operation.

@alefimov-amd alefimov-amd changed the title Shared/aggregate load [DRAFT] Shared/aggregate load May 21, 2025
@alefimov-amd alefimov-amd marked this pull request as draft May 21, 2025 15:03
@alefimov-amd alefimov-amd force-pushed the shared/aggregate_load branch from 347992c to 6c36ed1 Compare May 22, 2025 20:13
@antiagainst antiagainst force-pushed the shared/triton-gfx950-launch branch from 77c00fa to a259f0a Compare May 26, 2025 17:58
@alefimov-amd alefimov-amd force-pushed the shared/aggregate_load branch from c65837e to 5e47bef Compare May 29, 2025 21:35

5 participants