[performance] feat: add 405B B200/B300 V2 aliases + 405B GB200 NVFP4 256x gpu scale expendable segments addition due to CUDA OOM issue #3759

Merged
ko3n1g merged 3 commits into main from rsalagame_b300_b200_init_gb200_405b_nvfp4_256gpus
May 11, 2026

Conversation

@rsalagame-nvidia
Contributor

Summary

Bring 405B B200/B300 V2 pretrain configs to parity with GB200/GB300/H100/VR200 on main. Currently:

  • LLAMA31_405B_*_GB200/GB300/H100_*_V2 aliases exist in llama31_workload_base_configs.py and are re-exported.
  • LLAMA31_405B_*_B200/B300_*_V2 aliases don't exist at all on main, even though the runtime lookup getattr(configs.llama, "LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS_V2") is performed for B200/B300 hosts. The lookup returns None, silently falls back to V1, and produces NaN gradients on B200/B300 405B runs.

(Note: LLAMA3_70B_*_B200/B300_*_V2 is fine on main — full V1/V2 coverage. The gap is purely at 405B.)
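The failure mode described above can be sketched as follows. This is a hypothetical simplification of the lookup helper (the real one lives in utils/utils.py); the class and return values here are illustrative stand-ins, not the actual configs.

```python
# Hypothetical sketch of the V2 lookup fallback described above.
# _Configs stands in for the configs.llama module namespace.
class _Configs:
    LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS = "v1-config"
    # Before this PR, the matching *_V2 attribute did not exist for B200/B300.

def get_workload_base_config(configs, name, variant):
    suffix = "" if variant == "v1" else f"_{variant.upper()}"
    cfg = getattr(configs, name + suffix, None)
    if cfg is None:
        # Silent fallback to V1 -- the behavior this PR fixes.
        cfg = getattr(configs, name)
    return cfg

cfg = get_workload_base_config(
    _Configs, "LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS", "v2"
)
print(cfg)  # falls back to "v1-config" because the V2 attribute is missing
```

Because nothing raises, the V1 config is used on B200/B300 hosts without any warning, which is why the symptom surfaced only as NaN gradients at runtime.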

Changes

This is a 2-file change on main, mirroring how VR200 V2 is already done in this file and matching what llmb-r0.4.0 already has:

  • scripts/performance/configs/llama/llama31_workload_base_configs.py: define 8 V2 aliases (B*_V2 = GB*_V2) after the existing VR200 V2 block, and add 8 matching strings to that file's __all__.
  • scripts/performance/configs/llama/__init__.py: import the 8 V2 names interleaved alphabetically with the V1 entries, and add 8 matching strings to __all__ between GB300_NVFP4_V2 and H100_BF16_V2.

No changes to llama3_workload_base_configs.py, lookup logic in utils/utils.py, or any 70B/8B entries.

Why "B*_V2 = GB*_V2" aliases (not independent definitions)?

Same pattern already used for VR200 V2 in this file. B200/B300 share their tuning targets with GB200/GB300 V2 at this scale (num_gpus=256, GBS=1536); a thin alias keeps the config surface in lock-step. This also matches what landed on llmb-r0.4.0 for the same gap.

Test plan

  • Both files byte-compile cleanly (python -m py_compile)
  • AST scan confirms all 8 expected V2 names are defined and listed in __all__ in llama31_workload_base_configs.py, and imported and listed in __all__ in __init__.py
  • On a CUDA host: python -c "from configs.llama import LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS_V2; print(LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS_V2)" from scripts/performance/ prints a WorkloadBaseConfig (the GB200 V2 alias) instead of ImportError
  • 405B B200 FP8_CS V2 pretrain run no longer silently falls back to V1 / produces NaN gradients
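The AST scan in the test plan can be done along these lines (a self-contained sketch, not the exact script used; the inline source string stands in for reading the real file):

```python
import ast

# Stand-in for open("llama31_workload_base_configs.py").read().
source = '''
LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS_V2 = LLAMA31_405B_PRETRAIN_CONFIG_GB200_FP8_CS_V2
__all__ = ["LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS_V2"]
'''

defined, exported = set(), set()
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name):
                if target.id == "__all__":
                    # __all__ is a list literal of string constants.
                    exported.update(
                        elt.value
                        for elt in node.value.elts
                        if isinstance(elt, ast.Constant)
                    )
                else:
                    defined.add(target.id)

expected = {"LLAMA31_405B_PRETRAIN_CONFIG_B200_FP8_CS_V2"}
print(expected <= defined and expected <= exported)  # True
```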

… configs.llama

Signed-off-by: Rahul Salagame <rsalagame@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…raints on GB200. Performance remains same.

Signed-off-by: Rahul Salagame <rsalagame@nvidia.com>
@rsalagame-nvidia changed the title from "[performance] feat: add 405B B200/B300 V2 aliases + re-export them in…" to "[performance] feat: add 405B B200/B300 V2 aliases + 405B GB200 NVFP4 256x gpu scale expendable segments addition due to CUDA OOM issue" on May 8, 2026
@claude
Contributor

claude Bot commented May 8, 2026

LGTM - clean minimal change adding 8 V2 aliases for B200/B300 following the existing VR200 pattern. No bugs or issues found. No perf tests impacted. Consider extending test_llama31_405b_perf_config_instantiation to cover B200 and B300 with config_variant v2.

@claude
Contributor

claude Bot commented May 8, 2026

Detailed notes: The 8 new aliases (B200_V2 = GB200_V2, B300_V2 = GB300_V2) mirror the VR200 V2 pattern on lines 291-293 of llama31_workload_base_configs.py. Both __all__ lists and the __init__.py imports are consistent. The existing llama31_405b_pretrain_config_b200/b300 functions already accept config_variant="v2" via the getattr lookup in get_workload_base_config, so this fix ensures that lookup returns the correct config instead of None (which caused the silent V1 fallback and NaN gradients). Suggested test cases: no perf tests are impacted, since only aliases were added. However, test_llama31_405b_perf_config_instantiation only covers H100 today; extending it to B200/B300 V2 would catch this class of regression.

Contributor

@ko3n1g ko3n1g left a comment


@malay-nagda these are not tracked configs? Will we need to adjust our internal CI to make sure it continues running v1?

Comment thread on scripts/performance/perf_plugins.py (outdated)
@malay-nagda
Contributor

@malay-nagda these are not tracked configs? Will we need to adjust our internal CI to make sure it continues running v1?

We do not run anything, v1 or v2, for 405B on either B200 or B300 in CI. So, no need to change anything in CI.

Signed-off-by: Rahul Salagame <rsalagame@nvidia.com>
@ko3n1g added the docs-only label ("With great power comes great responsibility.") May 11, 2026
@ko3n1g
Contributor

ko3n1g commented May 11, 2026

/ok to test 1aec6cd

@ko3n1g ko3n1g merged commit 2461340 into main May 11, 2026
38 checks passed
@ko3n1g ko3n1g deleted the rsalagame_b300_b200_init_gb200_405b_nvfp4_256gpus branch May 11, 2026 09:26
