[vLLM-ATOM] Fix GLM-4.7 MTP in vLLM plugin #805

Open

kliuae wants to merge 1 commit into ROCm:main from kliuae:kliuae/plugin_fix_glm4_mtp_merge

[vLLM-ATOM] Fix GLM-4.7 MTP in vLLM plugin#805
kliuae wants to merge 1 commit into
ROCm:mainfrom
kliuae:kliuae/plugin_fix_glm4_mtp_merge

Conversation

@kliuae (Contributor) commented on May 15, 2026

Motivation

In the current vLLM-ATOM upstream, GLM-4.7 crashes when MTP is enabled, with errors such as `lm_head` and `embed_tokens` not found and a conflicting compile cache key. This PR fixes MTP to address these issues.

Technical Details

  • Expose speculative-decode attributes (e.g. `lm_head`, `embed_tokens`); this is also safe for Qwen3-Next MTP, since its `lm_head` lives on `model` rather than on the inner module
  • Mask inputs for the GLM-4 MTP draft model
  • Separate the main-model and draft-model configs so the draft model doesn't load from the main model's compiled artifact (which caused the conflicting compile cache key)
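The config-separation fix can be illustrated with a minimal sketch. The class and field names below (`ModelConfig`, `CompilationConfig`, `cache_key`, `make_draft_config`) are hypothetical stand-ins, not the actual vLLM-ATOM classes; the point is simply that the draft (MTP) model must get its own deep-copied config so it does not reuse the main model's compile cache key:

```python
import copy
from dataclasses import dataclass, field


# Hypothetical stand-ins for the real vLLM-ATOM config objects;
# names here are illustrative only.
@dataclass
class CompilationConfig:
    cache_key: str = "main"


@dataclass
class ModelConfig:
    name: str = "zai-org/GLM-4.7-FP8"
    compilation: CompilationConfig = field(default_factory=CompilationConfig)


def make_draft_config(main_cfg: ModelConfig) -> ModelConfig:
    # Deep-copy so the draft model owns an independent CompilationConfig
    # instead of sharing (and clobbering) the main model's cache key,
    # which would make the draft load the main model's compiled artifact.
    draft_cfg = copy.deepcopy(main_cfg)
    draft_cfg.compilation.cache_key = "draft"
    return draft_cfg


main_cfg = ModelConfig()
draft_cfg = make_draft_config(main_cfg)

# The two models now compile and cache independently.
assert main_cfg.compilation is not draft_cfg.compilation
assert main_cfg.compilation.cache_key == "main"
assert draft_cfg.compilation.cache_key == "draft"
```

A shallow copy would not be enough here: the nested `CompilationConfig` would still be shared, so mutating the draft's cache key would silently change the main model's as well.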

Test Plan

lm_eval with gsm8k

Test Result

Server command

ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=1 \
AITER_QUICK_REDUCE_QUANTIZATION=INT4 \
SAFETENSORS_FAST_GPU=1 \
ATOM_DISABLE_VLLM_PLUGIN=0 \
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0 \
VLLM_LOGGING_LEVEL=INFO \
VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
vllm serve zai-org/GLM-4.7-FP8 \
  -tp 4 \
  --gpu-memory-utilization 0.9 \
  --no-enable-prefix-caching \
  --disable-uvicorn-access-log \
  --trust-remote-code \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --kv-cache-dtype fp8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1

Eval command

lm_eval --model local-completions \
  --model_args model=zai-org/GLM-4.7-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=64,tokenized_requests=False \
  --tasks gsm8k --num_fewshot 5

Eval score

| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr  |
|-------|---------|------------------|--------|-------------|--------|---------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.9462 | ±0.0062 |
|       |         | strict-match     | 5      | exact_match | 0.9454 | ±0.0063 |

Submission Checklist

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
