
chore: nightly sync main into dev (06_05_2026) #4659

Open

svcnvidia-nemo-ci wants to merge 108 commits into dev from main2dev/06_05_2026

Conversation

@svcnvidia-nemo-ci

Summary

Nightly sync of main into dev.

  • 102 commits merged from main
  • Python lines: +33575 / -8262 across 253 files
  • Merge strategy: git merge origin/main -X theirs --no-edit, with manual reconciliation of the known conflicts listed below.
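The merge strategy can be reproduced end to end in a throwaway repository. The sketch below is a minimal illustration of how `-X theirs` resolves a conflicting hunk in main's favor, not the actual sync script; the repo layout and file names are invented:

```python
import pathlib
import subprocess
import tempfile

def run(args, cwd):
    """Run a git command, raising on failure."""
    subprocess.run(args, cwd=cwd, check=True, capture_output=True)

# Build a toy repo with a conflicting edit on main and dev.
repo = tempfile.mkdtemp()
run(["git", "init", "-q", "-b", "main"], repo)
run(["git", "config", "user.email", "ci@example.com"], repo)
run(["git", "config", "user.name", "ci"], repo)
f = pathlib.Path(repo, "module.py")
f.write_text("VERSION = 'base'\n")
run(["git", "add", "module.py"], repo)
run(["git", "commit", "-qm", "base"], repo)
run(["git", "checkout", "-qb", "dev"], repo)
f.write_text("VERSION = 'dev'\n")
run(["git", "commit", "-qam", "dev edit"], repo)
run(["git", "checkout", "-q", "main"], repo)
f.write_text("VERSION = 'main'\n")
run(["git", "commit", "-qam", "main edit"], repo)

# The sync strategy: merge main into dev, favoring main's side of conflicts.
run(["git", "checkout", "-q", "dev"], repo)
run(["git", "merge", "main", "-X", "theirs", "--no-edit"], repo)
print(f.read_text())
```

Note that `-X theirs` only biases conflicting hunks; non-conflicting changes from both sides still land normally, which is why the manual reconciliation below is limited to the known semantic conflicts.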

Files taken from main

  • megatron/core/optimizer/layer_wise_optimizer.py (no-op; identical between main and dev currently)

Files kept on dev (overriding the skill's default of taking main's version)

The skill recommends taking main's version of these files for known semantic conflicts, but in this sync the situation is reversed: dev's versions are the more current ones. Main's versions reference args.hybrid_context_parallel, while dev renamed that flag to args.dynamic_context_parallel (commit cde56a4, "Fix for rope when enabling THD + Dynamic-CP; use the naming Dynamic-CP"). Taking main's versions would therefore raise AttributeError at runtime.

  • megatron/training/training.py
  • megatron/training/utils.py
  • megatron/training/initialize.py
  • megatron/training/datasets/data_samplers.py
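The failure mode behind keeping dev's versions can be shown with a minimal sketch. An argparse namespace stands in for Megatron's real argument parsing here; only the two flag names come from the rename described above:

```python
import argparse

# Dev renamed the flag, so its namespace only carries the new attribute.
parser = argparse.ArgumentParser()
parser.add_argument("--dynamic-context-parallel", action="store_true")
args = parser.parse_args([])

# Code taken from main would still read the old name and fail at runtime.
try:
    _ = args.hybrid_context_parallel
    raised = False
except AttributeError:
    raised = True
print("AttributeError raised:", raised)  # → AttributeError raised: True
```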

Files deleted in main, accepted as deletion

These were legacy GPT loaders removed in main #4322 ("remove legacy GPT code"). Nothing in the merged tree references them.

  • tools/checkpoint/loader_legacy.py
  • tools/checkpoint/loader_llama_mistral.py

Files deleted in dev, NOT restored

megatron/core/pipeline_parallel/hybrid_cp_schedule.py was intentionally removed in dev (commit cde56a4) as part of the dynamic-CP refactor. Not restored, since the merged tree uses dev's wrap_data_iterator mechanism — no caller imports BalancedCPScheduler or HybridCPDataLoaderWrapper.
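The "no caller imports" claim can be checked mechanically. A hedged sketch of such an audit follows; the helper name is invented and the scan root is illustrative:

```python
import pathlib
import re

def find_stale_references(root, symbols):
    """Return (file, line number, line) for every reference to a removed symbol."""
    pattern = re.compile("|".join(re.escape(s) for s in symbols))
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits

# An empty result means nothing in the merged tree still references the
# deleted classes, e.g.:
# find_stale_references("megatron", ["BalancedCPScheduler", "HybridCPDataLoaderWrapper"])
```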

Dependency triple kept on dev

Per the skill's hard rule: pyproject.toml, uv.lock, docker/Dockerfile.ci.dev were restored from origin/dev. Dev's nvidia-resiliency-ext pinned revision (15a8515) was verified to contain all APIs the merged tree imports (get_write_results_queue, CheckpointMetadataCache, CachedMetadataFileSystemReader, etc.). No git-source reconciliation required.
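API-presence verification of this kind can be scripted. Below is a minimal generic sketch, demonstrated against a stdlib module since nvidia-resiliency-ext is not assumed installed here; against the real dependency you would pass its module path and the names the merged tree imports:

```python
import importlib

def missing_apis(module_name, required_names):
    """Return the subset of required_names absent from the module (empty == OK)."""
    module = importlib.import_module(module_name)
    return [name for name in required_names if not hasattr(module, name)]

# Stdlib demonstration: both names exist, so nothing is reported missing.
print(missing_apis("json", ["loads", "dumps"]))  # → []
```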

API mismatch detection

After initially taking main's versions of several files (later reverted), the following call sites were audited:

  • multi_latent_attention.py calls off_interface.group_offload() and off_interface.group_commit() — both exist on dev's FineGrainedActivationOffloadingInterface
  • gpt_model.py and hybrid_model.py call init_chunk_handler with 6 keyword arguments — matches dev's signature
  • _resolve_cu_seqlens exists on dev's GatedDeltaNet
  • _is_distopt_quantized_param exists on dev's DistributedOptimizer
  • CudaGraphScope exists in dev's enums.py

No active mismatches remain.
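Signature checks like the init_chunk_handler one above can be automated with inspect rather than eyeballed. A small sketch, where the six-keyword sample function is a hypothetical stand-in and not dev's actual signature:

```python
import inspect

def binds(fn, **kwargs):
    """True if fn's signature accepts exactly these keyword arguments."""
    try:
        inspect.signature(fn).bind(**kwargs)
        return True
    except TypeError:
        return False

# Hypothetical stand-in with six keyword parameters, mirroring the audit above.
def init_chunk_handler(a=None, b=None, c=None, d=None, e=None, f=None):
    pass

print(binds(init_chunk_handler, a=1, b=2, c=3, d=4, e=5, f=6))  # → True
print(binds(init_chunk_handler, a=1, unknown=2))                # → False
```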

Linting

  • black --config pyproject.toml (24.10.0): no diff
  • isort (5.13.2): no diff
  • pylint on changed megatron/core/ files (84 files): 10.00/10

Remerge diff

Remerge diff stat (file-level summary)
Date:   Wed May 6 21:32:39 2026 +0000

    chore: nightly sync main into dev (06_05_2026)

 .github/workflows/cicd-main.yml                    |    5 -
 docker/Dockerfile.ci.dev                           |    4 -
 docs/conf.py                                       |   18 +-
 .../detxoify_lm/generate_samples_gpt.py            |   76 +-
 .../gpt/gpt_dynamic_inference_with_coordinator.py  |    6 +-
 examples/mimo/train.py                             |    6 +-
 examples/multimodal/layer_specs.py                 |    2 +-
 examples/multimodal/model.py                       |   85 +-
 examples/post_training/modelopt/convert_model.py   |   19 +-
 examples/post_training/modelopt/export.py          |    5 +-
 examples/post_training/modelopt/finetune.py        |   67 +-
 examples/post_training/modelopt/generate.py        |   27 +-
 examples/post_training/modelopt/mmlu.py            |   45 +-
 .../modelopt/offline_feature_extract.py            |   56 +-
 examples/post_training/modelopt/prune.py           |   13 +-
 examples/post_training/modelopt/quantize.py        |   55 +-
 examples/post_training/modelopt/validate.py        |   32 +-
 gpt_builders.py                                    |   77 +-
 hybrid_builders.py                                 |    4 +-
 megatron/core/datasets/readme.md                   |   64 --
 megatron/core/transformer/mlp.py                   |    4 -
 megatron/core/transformer/moe/fused_a2a.py         |   13 -
 megatron/core/transformer/moe/moe_layer.py         |    8 -
 megatron/core/transformer/moe/token_dispatcher.py  |    4 -
 megatron/core/transformer/transformer_config.py    |   27 -
 megatron/core/transformer/transformer_layer.py     |   13 -
 megatron/elastification/arguments.py               |    6 +-
 megatron/elastification/flextron_utils.py          |   11 +-
 megatron/elastification/pretrain_hybrid_flex.py    |  136 ++-
 .../elastification/router/hybrid_flex_router.py    |    7 +-
 megatron/legacy/model/__init__.py                  |    5 -
 megatron/post_training/arguments.py                |    7 +-
 megatron/post_training/model_builder.py            |   55 +-
 megatron/training/activation_logging.py            |   37 +-
 megatron/training/argument_utils.py                |   90 +-
 megatron/training/arguments.py                     |  589 +----------
 megatron/training/async_utils.py                   |    4 +-
 megatron/training/checkpointing.py                 |   33 +-
 megatron/training/config/__init__.py               |   27 +-
 megatron/training/config/container.py              |   40 +-
 megatron/training/config/instantiate_utils.py      |   46 +-
 megatron/training/config/training_config.py        |   24 +-
 megatron/training/config/utils.py                  |   13 +-
 megatron/training/config/yaml_utils.py             |   10 +-
 megatron/training/datasets/data_samplers.py        |   51 +-
 megatron/training/training.py                      |  261 +----
 megatron/training/utils.py                         |    9 -
 model_provider.py                                  |   12 +-
 pretrain_bert.py                                   |   32 +-
 pretrain_gpt.py                                    |   42 +-
 pretrain_hybrid.py                                 |   65 +-
 pretrain_mamba.py                                  |  363 -------
 pretrain_t5.py                                     |    2 +-
 pretrain_vlm.py                                    |   10 +-
 pyproject.toml                                     |   19 +-
 .../unit_tests/fusions/test_mla_yarn_rope_apply.py |   10 -
 tests/unit_tests/models/test_hybrid_moe_model.py   |   16 -
 tools/checkpoint/checkpoint_inspector.py           |    9 +-
 tools/checkpoint/convert.py                        |   62 +-
 tools/checkpoint/dist_checkpoint_io.py             |   45 +-
 tools/checkpoint/gpt_hybrid_conversion.py          |  171 +--
 tools/checkpoint/loader_legacy.py                  |  416 --------
 tools/checkpoint/loader_llama_mistral.py           |  751 -------------
 tools/checkpoint/loader_mixtral_hf.py              |   12 +-
 tools/checkpoint/remap_gpt_dsa_to_mamba.py         |    5 -
 tools/prepare_cache.py                             |    9 +-
 tools/preprocess_data.py                           |  217 ++--
 tools/preprocess_mmdata.py                         |  160 ++-
 train_rl.py                                        |   20 +-
 uv.lock                                            | 1114 +++-----------------
 70 files changed, 1258 insertions(+), 4500 deletions(-)

Full diff omitted to keep the PR body compact (~10k lines). Reviewers can run git show --remerge-diff 431ac5df05104bc1d5015f5ac1842285d1c5e6ee locally or browse the merge commit on GitHub.

minitu and others added 30 commits April 22, 2026 18:02
@svcnvidia-nemo-ci (Author)

/ok to test 3f10d85

@svcnvidia-nemo-ci (Author)

/ok to test b83a102

@svcnvidia-nemo-ci (Author)

/ok to test 46ee761

@Phlip79 Phlip79 marked this pull request as ready for review May 7, 2026 07:02
@Phlip79 Phlip79 requested review from a team as code owners May 7, 2026 07:02
@svcnvidia-nemo-ci (Author)

Superseded by today's nightly sync.

@Phlip79 Phlip79 reopened this May 8, 2026
# Conflicts:
#	megatron/core/distributed/param_and_grad_buffer.py
@Phlip79 (Member) commented May 8, 2026

/ok to test 676f3fa

@FDecaYed (Contributor) commented May 8, 2026

/ok to test 0cb4ec3

@Phlip79 (Member) commented May 8, 2026

/ok to test 2207908

@svcnvidia-nemo-ci (Author)

Superseded by today's nightly sync.

@Phlip79 Phlip79 reopened this May 8, 2026

Labels

complexity: high · Run functional tests · Run MBridge tests (attach this for testing this PR against MBridge main)
