feat(gpt): add output postprocess hook#4686

Open
Glitchfix wants to merge 1 commit into NVIDIA:main from Glitchfix:feat/4590-gpt-postprocess-hook

Conversation

Glitchfix commented on May 7, 2026

What does this PR do?

Adds an objective-neutral GPT output postprocess hook so downstream RL callers can consume decoder hidden states and output-layer helpers without monkey-patching GPTModel.forward or the schedule-plan postprocess path.
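
To make the intended call pattern concrete, here is a minimal sketch of how a downstream caller might drive the hook. Only the keyword-only `output_processor` and `output_processor_context` arguments come from this PR; the processor's call signature (decoder hidden states plus an output-layer helper), the return-value behavior, and the surrounding names (`gpt_model`, `input_ids`, `token_ids`, `apply_output_layer`, ...) are illustrative assumptions.

```python
import torch

def selected_token_logprobs(hidden_states, apply_output_layer, context):
    # Assumed signature: decoder hidden states, a helper that applies the
    # model's output layer, and the threaded-through caller context.
    logits = apply_output_layer(hidden_states)           # assumed [s, b, vocab]
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    # `context` carries caller state, here the [s, b] token ids to score.
    ids = context["token_ids"].unsqueeze(-1)             # [s, b, 1]
    return torch.gather(logprobs, -1, ids).squeeze(-1)   # [s, b]

# Assumes forward returns the processor's result in place of default logits.
out = gpt_model.forward(
    input_ids,
    position_ids,
    attention_mask,
    output_processor=selected_token_logprobs,
    output_processor_context={"token_ids": token_ids},
)
```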

Issue tracking

Linked issue: #4590

Summary

  • Adds keyword-only output_processor and output_processor_context arguments to GPTModel.forward.
  • Runs the custom processor after decoder/MTP hidden-state preparation and before the default logits/loss path.
  • Threads the same processor/context through build_schedule_plan and the 1F1B PostProcessNode.
  • Adds unit coverage for a selected-token logprob processor that uses the output layer and matches the default logits path (sketched below), plus schedule-plan postprocess threading.
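
A sketch of that equivalence check, under the same assumed processor signature as above. The sequence-first logits layout inside the hook, and the `[b, s, vocab]` layout of the default path's return value, are assumptions about Megatron's conventions rather than facts stated in this PR:

```python
import torch

def test_hook_matches_default_logits_path(gpt_model, batch):
    # Default path: with labels=None, forward is assumed to return logits
    # in [b, s, vocab] layout; transpose to the sequence-first view.
    logits = gpt_model.forward(**batch)
    ref = torch.log_softmax(logits.float(), dim=-1).transpose(0, 1)

    seen = {}
    def processor(hidden_states, apply_output_layer, context):
        out = torch.log_softmax(apply_output_layer(hidden_states).float(), dim=-1)
        seen["logprobs"] = out
        return out

    gpt_model.forward(**batch, output_processor=processor,
                      output_processor_context=None)
    torch.testing.assert_close(seen["logprobs"], ref)
```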

This is intentionally a narrow first slice for #4590. It does not add RL-specific arguments to GPTModel.forward, and it leaves MTP-specific custom postprocess behavior as a follow-up unless maintainers prefer a broader API.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see the Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Testing

  • pre-commit run --files megatron/core/models/common/model_chunk_schedule_plan.py megatron/core/models/gpt/fine_grained_callables.py megatron/core/models/gpt/gpt_model.py tests/unit_tests/models/test_gpt_model.py
    • Black, pylint, and isort passed.
  • python -m py_compile megatron/core/models/gpt/gpt_model.py megatron/core/models/common/model_chunk_schedule_plan.py megatron/core/models/gpt/fine_grained_callables.py tests/unit_tests/models/test_gpt_model.py
  • PATH=/tmp/megatron-autoformat-stable/bin:$PATH CHECK_ONLY=true SKIP_DOCS=true ./tools/autoformat.sh
    • Black, isort, pylint, and ruff passed.
    • Mypy is advisory in tools/autoformat.sh (|| true); running it in this local environment reports existing project typing issues in the touched files' surrounding code.
  • Standalone CUDA smoke test using the local GPT layer spec:
    • direct GPTModel.forward(..., output_processor=...) selected-token logprobs match logprobs computed from the default logits path;
    • gradients flow through the custom processor to output_layer.weight;
    • schedule-plan PostProcessNode invokes the threaded processor/context.
  • Simple GPT training run using the hook as the objective boundary (a minimal sketch of this pattern follows this list):
    • 2-layer local GPT, batch size 4, sequence length 8, selected-token cross entropy computed entirely inside output_processor;
    • verified the hook loss matches loss computed from the default logits path before training;
    • ran 6 AdamW optimizer steps and confirmed the selected-token loss decreased: 4.2114 -> 2.4986.
  • Tiny Hugging Face text smoke run using Salesforce/wikitext, wikitext-2-raw-v1, train[:64]:
    • GPT-2 tokenizer, 2-layer local Megatron GPT, batch size 4, sequence length 16;
    • selected-token cross entropy computed entirely inside output_processor;
    • verified hook loss exactly matched loss computed from the default logits path before training: 10.833719 vs 10.833719;
    • ran 10 AdamW optimizer steps and confirmed selected-token loss decreased: 10.8337 -> 10.3881.
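
The training pattern referenced above, as a hedged sketch: the objective lives entirely inside the hook, and only `output_processor` / `output_processor_context` are this PR's API. The processor signature, tensor shapes, and harness names (`gpt_model`, `tokens`, `labels`, ...) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def selected_token_ce(hidden_states, apply_output_layer, context):
    # Selected-token cross entropy computed entirely inside the hook;
    # assumes sequence-first [s, b, vocab] logits and matching [s, b] labels.
    logits = apply_output_layer(hidden_states).float()
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        context["labels"].reshape(-1),
    )

optimizer = torch.optim.AdamW(gpt_model.parameters(), lr=1e-3)
for _ in range(6):
    loss = gpt_model.forward(
        tokens, position_ids, attention_mask,
        output_processor=selected_token_ce,
        output_processor_context={"labels": labels},
    )
    optimizer.zero_grad()
    loss.backward()  # gradients reach output_layer.weight through the hook
    optimizer.step()
```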

Full pytest collection was not run locally because this environment is missing omegaconf and transformer_engine.


copy-pr-bot (bot) commented on May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Glitchfix force-pushed the feat/4590-gpt-postprocess-hook branch from 1eda379 to ab3bdb5 on May 7, 2026 23:04
Add a keyword-only output_processor and context to GPTModel.forward so downstream RL callers can consume decoder hidden states and output-layer helpers without monkey-patching GPTModel postprocessing.

Thread the same hook through schedule-plan postprocess nodes so 1F1B schedule users can rely on the same extension point, and add coverage for direct forward invocation plus schedule-plan threading.

Addresses NVIDIA#4590.

Signed-off-by: Shivanjan Chakravorty <shivanjanc@nvidia.com>
Glitchfix force-pushed the feat/4590-gpt-postprocess-hook branch from ab3bdb5 to bbb5e60 on May 8, 2026 21:54
Glitchfix marked this pull request as ready for review on May 8, 2026 22:05
Glitchfix requested review from a team as code owners on May 8, 2026 22:05
svcnvidia-nemo-ci requested a review from a team on May 8, 2026 22:06
svcnvidia-nemo-ci added the "Final Review" label (PR is in the "final review" stage) on May 9, 2026
jon-barker (Contributor) left a comment


lgtm
