Support new arguments for exllamav3 Qwen3.5/3.6 MTP #421

Draft
alexhunt7 wants to merge 2 commits into theroyallab:main from alexhunt7:mtp

alexhunt7 commented May 8, 2026

Is your pull request related to a problem? Please describe.
Adds the new arguments needed to use exllamav3's Qwen3.5/3.6 MTP support.

Also adds draft acceptance metrics to the logs (in a separate commit).

Why should this feature be added?
MTP speeds things up tremendously. This adds support for configuring MTP in conjunction with turboderp-org/exllamav3#206

Examples
~2x token generation speed in Qwen3.6 27B on my 5070ti + 3060ti.

Additional context
Keeping as a draft until turboderp-org/exllamav3#206 is merged, but this is otherwise ready to go.

alex-hunt-materialize and others added 2 commits May 8, 2026 15:33
Adds draft_arch_override and num_draft_tokens to DraftModelConfig so
Qwen3.5/3.6 BF16 directories can be loaded as MTP-only draft models
(arch_override="Qwen3_5MTPDraftModel"). Threads both options through to
Config.from_directory and AsyncGenerator.

If draft_arch_override is set but draft_model_name is omitted, treat the
main model_directory as the source for the draft model. This covers the
case where the same checkpoint contains both the trunk and the mtp.*
tensors — no need to point at a separate directory or extract the MTP
head into its own dir.
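
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `DraftModelConfig` dataclass here is a simplified stand-in, and `draft_model_dir` and `resolve_draft_directory` are hypothetical names introduced only for the example. The field names `draft_model_name`, `draft_arch_override`, and `num_draft_tokens` follow the commit message.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DraftModelConfig:
    """Simplified stand-in for the real config class (assumption)."""

    draft_model_dir: str = "models"  # hypothetical base directory for draft models
    draft_model_name: Optional[str] = None
    draft_arch_override: Optional[str] = None  # e.g. "Qwen3_5MTPDraftModel"
    num_draft_tokens: Optional[int] = None


def resolve_draft_directory(
    model_directory: Path, draft: DraftModelConfig
) -> Optional[Path]:
    """Pick the directory the draft model should be loaded from.

    Hypothetical helper illustrating the rule in the commit message:
    an explicit draft_model_name wins; otherwise, if an arch override is
    set, the main model directory doubles as the draft source; otherwise
    no draft model is used.
    """
    if draft.draft_model_name:
        return Path(draft.draft_model_dir) / draft.draft_model_name
    if draft.draft_arch_override:
        # Same checkpoint holds both the trunk and the mtp.* tensors.
        return model_directory
    return None


if __name__ == "__main__":
    main_dir = Path("models") / "Qwen3.6-27B"  # illustrative path only
    mtp_only = DraftModelConfig(
        draft_arch_override="Qwen3_5MTPDraftModel", num_draft_tokens=2
    )
    print(resolve_draft_directory(main_dir, mtp_only))
```

Under these assumptions, a config that sets only `draft_arch_override` resolves to the main model directory, so the MTP head does not need to be extracted into its own directory.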