[AMDGPU] Detect BUFFER_STORE source-vgpr WAR hazard on gfx940+ regard…#2446
panditsa wants to merge 2 commits into
…less of soffset

createsVALUHazard previously gated the MUBUF/MTBUF source-vgpr WAR hazard to fire only when SOFFSET was a literal or absent. On gfx940-family subtargets (gfx942, gfx950) this is too narrow: the hazard fires equally when SOFFSET is sourced from an SGPR.

Concretely, on gfx950 a sequence of the form

    buffer_store_dwordx4 v[X:X+3], voff, descr, sN offen
    v_pk_mul_f32 v[X:X+1], <src>, <src>   # next VALU cycle

deterministically commits the post-pk_mul value of v[X+1] to memory for the second dword of the dwordx4 store; the other three dwords store correctly. checkVALUHazardsHelper already returns the right 2-wait-state cure for gfx940 family, so widening createsVALUHazard's trigger is sufficient.

Empirically reproduced on AMD Instinct MI350X (gfx950) by Triton's fused-attention backward kernel. Adding a single S_NOP 1 (or any 2-cycle bubble) between the store and the v_pk_mul makes the corruption go away; literal-soffset stores were already covered by the pre-existing rule, which is why this hazard had not shown up before.

The new MIR test covers both literal and SGPR soffset on gfx900 (older), gfx942 and gfx950, plus negative cases (non-overlapping write, dwordx2 store).

Assisted-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
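For readers who want to reason about the rule without reading the backend, here is a small Python model of the widened trigger and the wait-state cure described above. It is only a sketch and not the LLVM code: `Inst`, `creates_valu_hazard`, `insert_nops` and `GFX940_FAMILY` are hypothetical names. It assumes that only stores wider than 64 bits are affected (inferred from the dwordx2 negative case), that targets outside the gfx940 family need 1 wait state, and that every instruction occupies one wait state.

```python
from dataclasses import dataclass

# Assumption: only the subtargets named in the description are modelled.
GFX940_FAMILY = {"gfx942", "gfx950"}


@dataclass(frozen=True)
class Inst:
    """Hypothetical, heavily simplified instruction record."""
    text: str
    store_data: frozenset = frozenset()   # VGPRs read as MUBUF/MTBUF store data
    soffset_is_sgpr: bool = False         # False: literal or absent SOFFSET
    valu_writes: frozenset = frozenset()  # VGPRs written by a VALU instruction


def creates_valu_hazard(inst: Inst, subtarget: str) -> bool:
    """Model of the trigger: does this store open a source-VGPR WAR window?"""
    if len(inst.store_data) <= 2:
        # Not a store, or 64 bits or narrower (assumed threshold).
        return False
    if not inst.soffset_is_sgpr:
        return True                        # pre-existing rule
    return subtarget in GFX940_FAMILY      # widened rule from this patch


def insert_nops(program, subtarget: str):
    """Return instruction texts with s_nop padding where the model needs it."""
    required = 2 if subtarget in GFX940_FAMILY else 1  # 1 is an assumption
    out = []
    pending = []  # (store data VGPRs, wait states elapsed since the store)
    for inst in program:
        if inst.valu_writes:
            need = 0
            for vgprs, elapsed in pending:
                if vgprs & inst.valu_writes:
                    need = max(need, required - elapsed)
            if need > 0:
                out.append(f"s_nop {need - 1}")  # s_nop N gives N+1 wait states
                pending = [(v, e + need) for v, e in pending]
        out.append(inst.text)
        # Age the open windows by one wait state and drop expired ones.
        pending = [(v, e + 1) for v, e in pending if e + 1 < required]
        if creates_valu_hazard(inst, subtarget):
            pending.append((inst.store_data, 0))
    return out


# Usage: the failing sequence from the description, with registers filled in.
store = Inst("buffer_store_dwordx4 v[0:3], v8, s[0:3], s4 offen",
             store_data=frozenset({0, 1, 2, 3}), soffset_is_sgpr=True)
pk_mul = Inst("v_pk_mul_f32 v[0:1], v[4:5], v[6:7]",
              valu_writes=frozenset({0, 1}))
print("\n".join(insert_nops([store, pk_mul], "gfx950")))
```

In this model the gfx950 sequence gets an `s_nop 1` between the store and the `v_pk_mul_f32`, matching the manual workaround, while a dwordx2 store or a VALU write to non-overlapping VGPRs is left alone.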
The patch seems reasonable and should be submitted upstream at https://github.com/llvm/llvm-project/pulls. However...
Is this confirmed by the hardware team? Is it documented anywhere? I'm looking at the GFX9 SPG section 3.1.2 "Manually Inserted Wait States (NOPs)", row 8A in the table, and it clearly says:
with no exceptions.
Hi @jayfoad, I didn't see anything in the Shader Programming Guide, but I have a sample asm test that reproduces this bug. Do you know whom I can contact to validate this hazard?