[Draft]feat(deepseek-v4): support MTP speculative decoding #123

Closed
dongjiyingdjy wants to merge 2 commits into main from pr-stack/v4-pr6-mtp

Conversation

@dongjiyingdjy
Contributor

Summary

  • add DeepSeek V4 MTP draft model support for multi-step speculative decoding
  • wire DeepSeek V4 target/draft attention metadata and cache layout for MTP
  • stabilize CUDA graph metadata refresh across multi-step draft decode
  • document the exec ts serve DeepSeek V4 MTP startup path
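
For readers new to the technique, the draft-then-verify loop behind multi-step speculative decoding can be sketched as below. This is a toy illustration under stated assumptions, not the code in this PR: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names, each "model" is just a function returning a greedy next token, and nothing here models the attention metadata, cache layout, or CUDA graph handling listed above.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], num_draft_steps: int) -> List[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # 1. The draft model proposes num_draft_steps tokens autoregressively.
    proposed: List[int] = []
    ctx = list(tokens)
    for _ in range(num_draft_steps):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position in order.
    #    (A real engine would score all positions in one batched forward pass.)
    accepted: List[int] = []
    ctx = list(tokens)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, num_draft_steps: int = 3) -> List[int]:
    """Greedy generation that accepts several tokens per round when the draft agrees."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(target, draft, tokens, num_draft_steps))
    return tokens[: len(prompt) + max_new_tokens]


def toy_target(ctx: List[int]) -> int:
    # Toy target: count upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy draft: agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

With greedy acceptance, the output is identical to decoding with the target alone; the draft only changes how many tokens each round yields.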

Dependency

  • This PR is stacked on pr-stack/v4-pr5-reapply and should be merged after the PR5 mixed prefill/decode PR.

Validation

  • pre-commit run --all-files
  • pytest test/runtime/test_deepseek_v4_config.py -q (89 passed, 2 skipped)
  • DeepSeek V4 MTP server smoke test via exec ts serve, including /v1/models and /v1/completions
  • GSM8K limit 50: flexible-extract exact_match 0.94, strict-match exact_match 0.94
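
A smoke test of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the script used for this PR: the helper names are hypothetical, the base URL is whatever the server was started with, and the response shapes (`data[].id` for `/v1/models`, `choices[0].text` for `/v1/completions`) assume an OpenAI-compatible API.

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_payload(model: str, prompt: str, max_tokens: int = 16) -> Dict[str, Any]:
    """Build a minimal deterministic request body for /v1/completions."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens, "temperature": 0}


def get_json(url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def post_json(url: str, payload: Dict[str, Any], timeout: float = 120.0) -> Dict[str, Any]:
    """POST a JSON payload and decode the JSON response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test(base_url: str, prompt: str = "1 + 1 =") -> str:
    """Check that the server lists a model and returns a non-empty completion."""
    models = get_json(f"{base_url}/v1/models")
    # Assumed OpenAI-style listing: {"data": [{"id": "<model name>"}, ...]}
    model_id = models["data"][0]["id"]

    completion = post_json(
        f"{base_url}/v1/completions",
        build_completion_payload(model_id, prompt),
    )
    # Assumed OpenAI-style completion: {"choices": [{"text": "..."}]}
    text = completion["choices"][0]["text"]
    if not text:
        raise RuntimeError("server returned an empty completion")
    return text
```

Once the server is up, `smoke_test("http://localhost:8000")` would exercise both endpoints; the host and port here are placeholders.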

dongjiyingdjy and others added 2 commits May 13, 2026 12:18
Signed-off-by: jiyingd <87510204+dongjiyingdjy@users.noreply.github.com>
Co-authored-by: jiyingd <87510204+dongjiyingdjy@users.noreply.github.com>

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: jiyingd <87510204+dongjiyingdjy@users.noreply.github.com>
@dongjiyingdjy dongjiyingdjy requested a review from a team as a code owner May 13, 2026 13:08
@dongjiyingdjy dongjiyingdjy changed the title feat(deepseek-v4): support MTP speculative decoding [Draft]feat(deepseek-v4): support MTP speculative decoding May 13, 2026
Base automatically changed from pr-stack/v4-pr5-reapply to main May 14, 2026 08:47
@lightseek-bot
Contributor

ref #207

@lightseek-bot lightseek-bot deleted the pr-stack/v4-pr6-mtp branch May 22, 2026 18:21
3 participants