Flaky: TestFusedApplyMLARope::test_forward_backward_for_q[thd] backward mismatch exceeds bf16 tolerances #4640

### Summary

`tests/unit_tests/fusions/test_mla_yarn_rope_apply.py::TestFusedApplyMLARope::test_forward_backward_for_q[thd]` is intermittently failing in CI with a backward-pass numerical mismatch that exceeds bf16 tolerances.

### Observed failure

CI run: https://github.com/NVIDIA/Megatron-LM/actions/runs/25408640123 (job `tests/unit_tests/**/*.py - latest`, ID `74525633131`)

```
FAILED tests/unit_tests/fusions/test_mla_yarn_rope_apply.py::TestFusedApplyMLARope::test_forward_backward_for_q[thd]
E   AssertionError: Mismatch in bwd: Tensor-likes are not close!
E   Mismatched elements: 31 / 786432 (0.0%)
E   Greatest absolute difference: 3.015625 at index (104, 29, 170) (up to 0.05 allowed)
E   Greatest relative difference: 33.509803771972656 at index (104, 28, 166) (up to 0.02 allowed)
```

The assertion at `tests/unit_tests/fusions/test_mla_yarn_rope_apply.py:111` compares the backward gradient of the reference `apply_rotary_pos_emb` against that of the fused `fused_apply_mla_rope_for_q`, using bf16 inputs in the `thd` packed-sequence layout with `cu_seqlens=[0, 27, 54, 99, 128]`.
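Both reported outlier indices, (104, 29, 170) and (104, 28, 166), share 104 on the first axis. Assuming that axis is the packed token (`t`) dimension of the `thd` tensor, which is consistent with `cu_seqlens` ending at 128, the hypothetical helper below (not part of the test file) maps a packed token index back to its sequence and local position. Token 104 lands in the fourth sequence, tokens [99, 128), at local position 5, which may help check whether the outliers correlate with particular sequences or positions.

```python
import bisect
from typing import List, Tuple


def locate_token(token_idx: int, cu_seqlens: List[int]) -> Tuple[int, int]:
    """Map a global token index in a packed `thd` tensor to (sequence id, position in sequence).

    Sequence i occupies tokens [cu_seqlens[i], cu_seqlens[i + 1]).
    """
    if not 0 <= token_idx < cu_seqlens[-1]:
        raise IndexError(f"token {token_idx} outside packed range [0, {cu_seqlens[-1]})")
    # bisect_right returns the index of the first boundary strictly greater than
    # token_idx; the sequence containing the token starts one boundary earlier.
    seq_id = bisect.bisect_right(cu_seqlens, token_idx) - 1
    return seq_id, token_idx - cu_seqlens[seq_id]


cu_seqlens = [0, 27, 54, 99, 128]
print(locate_token(104, cu_seqlens))  # (3, 5)
```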

### Why this is non-deterministic

- Only 31 / 786,432 elements (~0.004%) exceed tolerance.
- bf16 tolerances: `atol=5e-2`, `rtol=2e-2`.
- The same merge-queue commit passed this job on rerun in workflow [25415024224](https://github.com/NVIDIA/Megatron-LM/actions/runs/25415024224).
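- The fused MLA YARN RoPE backward kernel appears to drift numerically from the reference in the `thd` (packed-sequence) path. The drift is confined to a few outlier indices, but where it occurs it can be large (up to 3.015625 absolute in the run above) and occasionally exceeds the bf16 tolerances.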
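To put these tolerances in perspective, the sketch below restates the elementwise rule applied by `torch.testing.assert_close` (its "Tensor-likes are not close!" message matches the log) and computes the spacing between adjacent bf16 values near a given magnitude. The helper names are illustrative; the rule and the bf16 format (8 significant bits: 1 implicit plus 7 stored) are standard.

```python
import math


def bf16_spacing(x: float) -> float:
    """Gap between adjacent bf16 values near |x| (nonzero normal values only)."""
    if x == 0:
        raise ValueError("spacing is only computed here for nonzero values")
    # frexp gives |x| = m * 2**e with 0.5 <= m < 1, so floor(log2|x|) == e - 1.
    exponent = math.frexp(abs(x))[1] - 1
    # bf16 has 8 significant bits, so values in [2**k, 2**(k+1)) are 2**(k - 7) apart.
    return 2.0 ** (exponent - 7)


def is_close(actual: float, expected: float, atol: float = 5e-2, rtol: float = 2e-2) -> bool:
    """Elementwise closeness rule of torch.testing.assert_close:
    |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


print(bf16_spacing(3.0))             # 0.015625
print(3.015625 / bf16_spacing(3.0))  # 193.0
```

For values of order 3, `atol=5e-2` admits roughly three bf16 steps of rounding error, whereas an absolute difference of 3.015625 would be 193 steps at that magnitude.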

### Owning code

- Test: `tests/unit_tests/fusions/test_mla_yarn_rope_apply.py`
- Kernel: `megatron/core/fusions/fused_mla_yarn_rope_apply.py`
- Introduced in MR `!2949` (perf(mla, experimental): MLA RoPE fusion and YARN embedding cache); the experimental tag was removed in #2233.

### Mitigation

The test was marked `flaky_in_dev` in #4639 as a stop-gap. The underlying numerical drift in the fused backward kernel for `thd` should be investigated and reduced so the test can be re-enabled.
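As a starting point for the investigation, it may help to measure how often the mismatch reproduces and to keep failing seeds for replay. The sketch below assumes a hypothetical `check(seed)` wrapper (not part of the existing test) that seeds the RNGs, runs the reference and fused backward passes with the `thd` inputs, and returns whether the comparison passed. `failure_rate`, `replay_is_stable`, and `demo_check` are illustrative names; `demo_check` is a stand-in so the snippet runs without a GPU.

```python
import random
from typing import Callable, Iterable, List, Tuple


def failure_rate(check: Callable[[int], bool], seeds: Iterable[int]) -> Tuple[float, List[int]]:
    """Run check(seed) for each seed; return (fraction that failed, list of failing seeds)."""
    seeds = list(seeds)
    if not seeds:
        raise ValueError("need at least one seed")
    failing = [seed for seed in seeds if not check(seed)]
    return len(failing) / len(seeds), failing


def replay_is_stable(check: Callable[[int], bool], seed: int, repeats: int = 5) -> bool:
    """True if re-running the same seed always yields the same pass/fail outcome."""
    outcomes = {check(seed) for _ in range(repeats)}
    return len(outcomes) == 1


# Stand-in check for demonstration only: "fails" for roughly 5% of seeds,
# deterministically per seed.
def demo_check(seed: int) -> bool:
    return random.Random(seed).random() >= 0.05


rate, failing = failure_rate(demo_check, range(200))
print(f"failure rate: {rate:.1%}, first failing seeds: {failing[:5]}")
print(all(replay_is_stable(demo_check, s) for s in failing))  # True
```

If a failing seed fails on every replay, the outliers are tied to specific inputs; if the same seed flips between pass and fail, the nondeterminism is more likely in the kernel execution itself.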