[vLLM-ATOM] Fix GLM-4.7 MTP in vLLM plugin #805

Open

kliuae wants to merge 1 commit into ROCm:main from kliuae:kliuae/plugin_fix_glm4_mtp_merge

[vLLM-ATOM] Fix GLM-4.7 MTP in vLLM plugin#805
kliuae wants to merge 1 commit into
ROCm:mainfrom
kliuae:kliuae/plugin_fix_glm4_mtp_merge

Conversation

@kliuae (Contributor) commented on May 15, 2026

Motivation

In the current vLLM-ATOM upstream, GLM-4.7 crashes when MTP is enabled, with errors such as `lm_head` and `embed_tokens` not found and a conflicting compile cache key. This PR fixes MTP to address these issues.

Technical Details

  • Expose speculative-decode attributes (e.g. `lm_head`, `embed_tokens`); this is also safe for Qwen3-Next MTP, since its `lm_head` lives on `model` rather than on the inner module
  • Mask inputs for the GLM-4 MTP draft model
  • Separate the main-model and draft-model configs so the draft model doesn't load from the main model's compiled artifact (which caused the conflicting compile cache key)
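The config-separation fix can be illustrated with a minimal sketch. The class and field names below (`ModelConfig`, `CompilationConfig`, `cache_key`, `make_draft_config`) are hypothetical stand-ins, not the actual vLLM-ATOM classes; the point is simply that the draft (MTP) model must get its own deep-copied config so it does not reuse the main model's compile cache key:

```python
import copy
from dataclasses import dataclass, field


# Hypothetical stand-ins for the real vLLM-ATOM config objects;
# names here are illustrative only.
@dataclass
class CompilationConfig:
    cache_key: str = "main"


@dataclass
class ModelConfig:
    name: str = "zai-org/GLM-4.7-FP8"
    compilation: CompilationConfig = field(default_factory=CompilationConfig)


def make_draft_config(main_cfg: ModelConfig) -> ModelConfig:
    # Deep-copy so the draft model owns an independent CompilationConfig
    # instead of sharing (and clobbering) the main model's cache key,
    # which would make the draft load the main model's compiled artifact.
    draft_cfg = copy.deepcopy(main_cfg)
    draft_cfg.compilation.cache_key = "draft"
    return draft_cfg


main_cfg = ModelConfig()
draft_cfg = make_draft_config(main_cfg)

# The two models now compile and cache independently.
assert main_cfg.compilation is not draft_cfg.compilation
assert main_cfg.compilation.cache_key == "main"
assert draft_cfg.compilation.cache_key == "draft"
```

A shallow copy would not be enough here: the nested `CompilationConfig` would still be shared, so mutating the draft's cache key would silently change the main model's as well.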

Test Plan

lm_eval with gsm8k

Test Result

Server command

ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=1 \
AITER_QUICK_REDUCE_QUANTIZATION=INT4 \
SAFETENSORS_FAST_GPU=1 \
ATOM_DISABLE_VLLM_PLUGIN=0 \
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0 \
VLLM_LOGGING_LEVEL=INFO \
VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
vllm serve zai-org/GLM-4.7-FP8 \
  -tp 4 \
  --gpu-memory-utilization 0.9 \
  --no-enable-prefix-caching \
  --disable-uvicorn-access-log \
  --trust-remote-code \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --kv-cache-dtype fp8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1

Eval command

lm_eval --model local-completions \
  --model_args model=zai-org/GLM-4.7-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=64,tokenized_requests=False \
  --tasks gsm8k --num_fewshot 5

Eval score

| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr  |
|-------|---------|------------------|--------|-------------|--------|---------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.9462 | ±0.0062 |
|       |         | strict-match     | 5      | exact_match | 0.9454 | ±0.0063 |

Submission Checklist

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
