Deepseek v4 Support #1195

Draft
LiJunscs wants to merge 2 commits into flagos-ai:main from LiJunscs:deepseek_v4

LiJunscs (Collaborator) commented May 7, 2026

PR Category

[Train] Most of the code is copied from the Megatron-LM dev branch, which differs from both the main branch and the release versions.
Megatron-LM PRs:
DeepSeek-V4:
NVIDIA/Megatron-LM#4458
NVIDIA/Megatron-LM#4481
NVIDIA/Megatron-LM#4518
mHC:
NVIDIA/Megatron-LM#2943

PR Types

[New features]

PR Description

Add the DeepSeek V4 model to FlagScale and Megatron-FL.
Supported:

  1. CSA and HCA
  2. Hash Router
  3. mHC
  4. Engram (optional)
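
For reviewers unfamiliar with the idea behind a hash router, here is a minimal sketch. It assumes that "hash routing" means choosing experts from a fixed hash of the token ID instead of from a learned gate. The function name `hash_route`, its parameters, and the use of SHA-256 are hypothetical choices made for illustration and are not taken from this PR or from Megatron-LM.

```python
import hashlib


def hash_route(token_ids, num_experts, top_k=1, seed=0):
    """Map each token ID to `top_k` distinct experts with a fixed hash.

    Illustrative only: no learned parameters are involved, so the same
    token ID always lands on the same experts.
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    routes = []
    for tok in token_ids:
        # Hash the (seed, token) pair so the mapping is deterministic
        # across processes (Python's built-in hash() is salted per run).
        digest = hashlib.sha256(f"{seed}:{tok}".encode()).digest()
        start = int.from_bytes(digest[:8], "big") % num_experts
        # Take consecutive expert indices so the top_k choices are distinct.
        routes.append([(start + i) % num_experts for i in range(top_k)])
    return routes


routes = hash_route([5, 17, 5, 42], num_experts=8, top_k=2)
```

Because the mapping depends only on the token ID and the seed, it needs no gradient and no load-balancing loss. Whether the implementation in this PR follows the same scheme should be checked against the code.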

Unsupported:

  1. Sqrtsoftplus router score function.
  2. mHC recompute.
  3. `overlap_grad_reduce` and `overlap_param_gather` when using ZeRO-1.
  4. Any infrastructure optimizations.
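
As a reference point for the missing router score function, the sketch below assumes that its name means the square root of softplus, i.e. `sqrt(log(1 + exp(x)))` applied to each router logit. That reading is an assumption based on the name alone. The helper names are hypothetical and do not correspond to any Megatron-LM API.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sqrtsoftplus_scores(logits):
    """Assumed definition: score_i = sqrt(softplus(logit_i))."""
    return [math.sqrt(softplus(x)) for x in logits]


def top_k_experts(logits, k):
    """Return the indices of the k highest-scoring experts."""
    scores = sqrtsoftplus_scores(logits)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


scores = sqrtsoftplus_scores([-2.0, 0.0, 3.0])
chosen = top_k_experts([-2.0, 0.0, 3.0], k=2)
```

Under this definition the scores are non-negative and increase monotonically with the logit, so top-k selection picks the same experts as it would on the raw logits; only the weights assigned to them change.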

NOTE: This is only a draft PR. Please review it and share suggestions, for example on:

  1. File structure.
  2. Code implementation.

Next steps:

  1. Distributed training.
  2. Adapting the Muon optimizer to ZeRO-1.
  3. Low-precision training is out of scope for this PR because of limited resources.
  4. Possibly context parallelism for sparse attention.
  5. Further suggestions are welcome.

@LiJunscs LiJunscs changed the title Deepseek v4 Deepseek v4 Support May 7, 2026
@LiJunscs LiJunscs self-assigned this May 7, 2026