feat(gpt): add output postprocess hook#4686

Open
Glitchfix wants to merge 1 commit into NVIDIA:main from Glitchfix:feat/4590-gpt-postprocess-hook

Conversation

Glitchfix commented on May 7, 2026

What does this PR do?

Adds an objective-neutral GPT output postprocess hook so downstream RL callers can consume decoder hidden states and output-layer helpers without monkey-patching GPTModel.forward or the schedule-plan postprocess path.
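
To make the intended call pattern concrete, here is a minimal sketch of how a downstream caller might drive the hook. Only the keyword-only `output_processor` and `output_processor_context` arguments come from this PR; the processor's call signature (decoder hidden states plus an output-layer helper), the return-value behavior, and the surrounding names (`gpt_model`, `input_ids`, `token_ids`, `apply_output_layer`, ...) are illustrative assumptions.

```python
import torch

def selected_token_logprobs(hidden_states, apply_output_layer, context):
    # Assumed signature: decoder hidden states, a helper that applies the
    # model's output layer, and the threaded-through caller context.
    logits = apply_output_layer(hidden_states)           # assumed [s, b, vocab]
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    # `context` carries caller state, here the [s, b] token ids to score.
    ids = context["token_ids"].unsqueeze(-1)             # [s, b, 1]
    return torch.gather(logprobs, -1, ids).squeeze(-1)   # [s, b]

# Assumes forward returns the processor's result in place of default logits.
out = gpt_model.forward(
    input_ids,
    position_ids,
    attention_mask,
    output_processor=selected_token_logprobs,
    output_processor_context={"token_ids": token_ids},
)
```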

Issue tracking

Linked issue: #4590

Summary

  • Adds keyword-only output_processor and output_processor_context arguments to GPTModel.forward.
  • Runs the custom processor after decoder/MTP hidden-state preparation and before the default logits/loss path.
  • Threads the same processor/context through build_schedule_plan and the 1F1B PostProcessNode.
  • Adds unit coverage for a selected-token logprob processor that uses the output layer and matches the default logits path (sketched below), plus schedule-plan postprocess threading.
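
A sketch of that equivalence check, under the same assumed processor signature as above. The sequence-first logits layout inside the hook, and the `[b, s, vocab]` layout of the default path's return value, are assumptions about Megatron's conventions rather than facts stated in this PR:

```python
import torch

def test_hook_matches_default_logits_path(gpt_model, batch):
    # Default path: with labels=None, forward is assumed to return logits
    # in [b, s, vocab] layout; transpose to the sequence-first view.
    logits = gpt_model.forward(**batch)
    ref = torch.log_softmax(logits.float(), dim=-1).transpose(0, 1)

    seen = {}
    def processor(hidden_states, apply_output_layer, context):
        out = torch.log_softmax(apply_output_layer(hidden_states).float(), dim=-1)
        seen["logprobs"] = out
        return out

    gpt_model.forward(**batch, output_processor=processor,
                      output_processor_context=None)
    torch.testing.assert_close(seen["logprobs"], ref)
```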

This is intentionally a narrow first slice for #4590. It does not add RL-specific arguments to GPTModel.forward, and it leaves MTP-specific custom postprocess behavior as a follow-up unless maintainers prefer a broader API.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see the Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Testing

  • pre-commit run --files megatron/core/models/common/model_chunk_schedule_plan.py megatron/core/models/gpt/fine_grained_callables.py megatron/core/models/gpt/gpt_model.py tests/unit_tests/models/test_gpt_model.py
    • Black, pylint, and isort passed.
  • python -m py_compile megatron/core/models/gpt/gpt_model.py megatron/core/models/common/model_chunk_schedule_plan.py megatron/core/models/gpt/fine_grained_callables.py tests/unit_tests/models/test_gpt_model.py
  • PATH=/tmp/megatron-autoformat-stable/bin:$PATH CHECK_ONLY=true SKIP_DOCS=true ./tools/autoformat.sh
    • Black, isort, pylint, and ruff passed.
    • Mypy is advisory in tools/autoformat.sh (|| true); running it in this local environment reports existing project typing issues in the touched files' surrounding code.
  • Standalone CUDA smoke test using the local GPT layer spec:
    • direct GPTModel.forward(..., output_processor=...) selected-token logprobs match logprobs computed from the default logits path;
    • gradients flow through the custom processor to output_layer.weight;
    • schedule-plan PostProcessNode invokes the threaded processor/context.
  • Simple GPT training run using the hook as the objective boundary (a minimal sketch of this pattern follows this list):
    • 2-layer local GPT, batch size 4, sequence length 8, selected-token cross entropy computed entirely inside output_processor;
    • verified the hook loss matches loss computed from the default logits path before training;
    • ran 6 AdamW optimizer steps and confirmed the selected-token loss decreased: 4.2114 -> 2.4986.
  • Tiny Hugging Face text smoke run using Salesforce/wikitext, wikitext-2-raw-v1, train[:64]:
    • GPT-2 tokenizer, 2-layer local Megatron GPT, batch size 4, sequence length 16;
    • selected-token cross entropy computed entirely inside output_processor;
    • verified hook loss exactly matched loss computed from the default logits path before training: 10.833719 vs 10.833719;
    • ran 10 AdamW optimizer steps and confirmed selected-token loss decreased: 10.8337 -> 10.3881.
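
The training pattern referenced above, as a hedged sketch: the objective lives entirely inside the hook, and only `output_processor` / `output_processor_context` are this PR's API. The processor signature, tensor shapes, and harness names (`gpt_model`, `tokens`, `labels`, ...) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def selected_token_ce(hidden_states, apply_output_layer, context):
    # Selected-token cross entropy computed entirely inside the hook;
    # assumes sequence-first [s, b, vocab] logits and matching [s, b] labels.
    logits = apply_output_layer(hidden_states).float()
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        context["labels"].reshape(-1),
    )

optimizer = torch.optim.AdamW(gpt_model.parameters(), lr=1e-3)
for _ in range(6):
    loss = gpt_model.forward(
        tokens, position_ids, attention_mask,
        output_processor=selected_token_ce,
        output_processor_context={"labels": labels},
    )
    optimizer.zero_grad()
    loss.backward()  # gradients reach output_layer.weight through the hook
    optimizer.step()
```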

Full pytest collection was not run locally because this environment is missing omegaconf and transformer_engine.


copy-pr-bot (bot) commented on May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Glitchfix force-pushed the feat/4590-gpt-postprocess-hook branch from 1eda379 to ab3bdb5 on May 7, 2026 23:04
Add a keyword-only output_processor and context to GPTModel.forward so downstream RL callers can consume decoder hidden states and output-layer helpers without monkey-patching GPTModel postprocessing.

Thread the same hook through schedule-plan postprocess nodes so 1F1B schedule users can rely on the same extension point, and add coverage for direct forward invocation plus schedule-plan threading.

Addresses NVIDIA#4590.

Signed-off-by: Shivanjan Chakravorty <shivanjanc@nvidia.com>
Glitchfix force-pushed the feat/4590-gpt-postprocess-hook branch from ab3bdb5 to bbb5e60 on May 8, 2026 21:54
Glitchfix marked this pull request as ready for review on May 8, 2026 22:05
Glitchfix requested review from a team as code owners on May 8, 2026 22:05
svcnvidia-nemo-ci requested a review from a team on May 8, 2026 22:06
svcnvidia-nemo-ci added the "Final Review" label (PR is in the "final review" stage) on May 9, 2026
jon-barker (Contributor) left a comment


lgtm
