fix issue https://nvbugspro.nvidia.com/bug/6098588 #3696
Conversation
Review Summary

This PR simplifies the transpose_on_export logic in _accumulate_grouped_export by removing the adaptive shape check and always transposing when the flag is set. It adds transpose_on_export = True as a class attribute on GPTOSSMLPDownProjMapping so the down-proj expert weights are always transposed on export.

Potential bug (bias transpose regression): GPTOSSMLPDownProjMapping is reused for both the weight and bias expert mappings (lines 173-179 and 190-196 in gpt_oss_bridge.py). The class-level transpose_on_export = True means stacked bias tensors (shape [num_experts, hidden]) will also be transposed to [hidden, num_experts], which is incorrect. The adaptive logic removed from model_bridge.py implicitly protected against this. See the inline comments for a suggested fix (guard on merged.ndim >= 3), sketched below.

PR metadata: the title references an internal bug tracker URL. Consider the "[ckpt] fix: description" title format.

Docstring nit: the docstring says "For square down_proj", but the transpose is needed regardless of whether the matrix is square.

Missing test coverage: there are no unit tests for the GPT-OSS MoE export (megatron-to-HF) path; suggested test cases are sketched below. No perf tests are impacted.
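A minimal sketch of the suggested guard, assuming the merged variable name from the review comments; the helper name _maybe_transpose_on_export and the exact tensor shapes are illustrative assumptions, not the actual Megatron-Bridge code:

```python
import torch

def _maybe_transpose_on_export(merged: torch.Tensor, transpose_on_export: bool) -> torch.Tensor:
    """Transpose stacked expert weights on export; leave bias stacks alone."""
    # Stacked expert weights are 3-D ([num_experts, d_in, d_out]); stacked
    # biases are 2-D ([num_experts, hidden]). Guarding on merged.ndim >= 3
    # restores the protection the removed adaptive shape check provided.
    if transpose_on_export and merged.ndim >= 3:
        return merged.transpose(-1, -2).contiguous()
    return merged
```

And a hedged sketch of the missing unit tests, written against the hypothetical helper above rather than the project's real export API:

```python
def test_weight_stack_transposed():
    # 3-D weight stacks should be transposed per expert on export.
    weights = torch.randn(4, 8, 16)  # [num_experts, d_in, d_out]
    out = _maybe_transpose_on_export(weights, transpose_on_export=True)
    assert out.shape == (4, 16, 8)

def test_bias_stack_not_transposed():
    # 2-D bias stacks must keep the [num_experts, hidden] layout.
    biases = torch.randn(4, 8)  # [num_experts, hidden]
    out = _maybe_transpose_on_export(biases, transpose_on_export=True)
    assert out.shape == (4, 8)
```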
Force-pushed from 99b9808 to a235112
Signed-off-by: weijiac <weijiac@NVIDIA.com>
Force-pushed from a235112 to f8b6310
What does this PR do?
Fixes issue https://nvbugspro.nvidia.com/bug/6098588
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre-checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information