Add GLM 4.7 SPD sidecar acceleration by i386 · Pull Request #866 · Mesh-LLM/mesh-llm

i386 · 2026-06-17T19:50:44Z

Goal

Add a GLM 4.7 SPD sidecar path and publish the first trained sidecar artifact so we can compare vanilla GLM decode against verified GLM+SPD-sidecar decode on SPEED-Bench prompts.

This PR is the canonical feature branch for GLM 4.7 SPD sidecar acceleration. It supersedes the compact donor work in #860 after review. #859 remains useful proof archaeology, but this PR deliberately does not transplant its broad live-serving/protocol changes.

Published Artifact

Model repo: https://huggingface.co/meshllm/glm-4.7-flash-spd-sidecar

Hub revision: 9aad350802f697c42d5001d1e05e6c7cc1c530e9

The repo contains the exported sidecar plus reproduction artifacts:

train/spd-head.safetensors
train/speculation_head_final.pt
train/skippy-spd-head.json
data/draft_vocab_top_16000.json
eval/summary/pipeline_eval__train__speculation_head_final__nt12__summary.json
repro/train.sh
repro/export.sh
repro/manifest-validation.log

Exported safetensors:

| field | value |
| --- | --- |
| format | `safetensors-spd-head-v1` |
| dtype | `F16` |
| tensors | 19 |
| size | 321,399,280 bytes |
| sha256 | `d9f9d47728d4e3093b272feeb739532452c6780e0fd45a60cc7d62e853c1cdd2` |
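A downloaded copy of the sidecar can be checked against the values listed above. The following is a minimal sketch using only the Python standard library. It relies on the public safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names `sha256_of`, `read_safetensors_header`, and `summarize` are illustrative and are not part of this PR.

```python
import hashlib
import json
import struct
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so a large artifact is never held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_safetensors_header(path: Path) -> dict:
    """Read the JSON header: 8-byte little-endian length, then that many JSON bytes."""
    with path.open("rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        return json.loads(fh.read(header_len))


def summarize(path: Path) -> dict:
    """Collect the fields that can be compared with the published values."""
    header = read_safetensors_header(path)
    # "__metadata__" is an optional free-form entry in the header, not a tensor.
    tensors = {name: info for name, info in header.items() if name != "__metadata__"}
    return {
        "size": path.stat().st_size,
        "sha256": sha256_of(path),
        "tensors": len(tensors),
        "dtypes": sorted({info["dtype"] for info in tensors.values()}),
    }
```

If the listed values are accurate, `summarize(Path("train/spd-head.safetensors"))` should report a size of 321399280, 19 tensors, a single dtype `F16`, and the sha256 shown above.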

Training Result

Overnight run on micstudio:

uv run --script evals/spd/hf_train_eval_qwen06.py \
  --work-dir /tmp/skippy-spd-glm47-overnight8k \
  --model-name /Volumes/models/huggingface/hub/models--zai-org--GLM-4.7-Flash/snapshots/7dd20894a642a0aa287e9827cb1a1f7f91386b67 \
  --dataset HuggingFaceH4/ultrachat_200k \
  --dataset-split train_sft \
  --train-rows 8192 \
  --eval-rows-per-set 4 \
  --num-stages 3 \
  --stage-layer-boundaries 15,31,47 \
  --num-spec-layers 1 \
  --epochs 1 \
  --max-length 512 \
  --max-new-tokens 64 \
  --batch-size 1 \
  --gradient-accumulation-steps 8 \
  --learning-rate 2e-5 \
  --warmup-steps 50 \
  --save-steps 128 \
  --log-interval 20 \
  --build-draft-vocab-size 16000 \
  --draft-vocab-json '' \
  --draft-top-k 1 \
  --attn-implementation sdpa \
  --device mps \
  --upload-repo none

The run loaded 8192 UltraChat rows, filtered to 7377 usable shifted-label rows, and completed 923 optimizer steps.
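The step count is consistent with the command-line flags. As a quick worked check, and assuming that a trailing partial gradient-accumulation window still triggers an optimizer step and that a checkpoint is written every `--save-steps` optimizer steps (neither is confirmed by this PR):

```python
import math

loaded_rows = 8192   # --train-rows
usable_rows = 7377   # rows left after filtering, from the run summary
batch_size = 1       # --batch-size
grad_accum = 8       # --gradient-accumulation-steps
epochs = 1           # --epochs
save_steps = 128     # --save-steps

micro_batches = math.ceil(usable_rows / batch_size) * epochs
# Assumption: the last, partially filled accumulation window still steps.
optimizer_steps = math.ceil(micro_batches / grad_accum)
dropped_rows = loaded_rows - usable_rows
# Assumption: one intermediate checkpoint per full save_steps interval.
intermediate_checkpoints = optimizer_steps // save_steps

print(optimizer_steps, dropped_rows, intermediate_checkpoints)  # 923 815 7
```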

Verified donor eval over 12 mini prompts:

| metric | value |
| --- | --- |
| generated tokens | 768 |
| decode loop steps | 2076 |
| accepted draft flags | 109 / 768 |
| acceptance rate | 0.3699 |
| equivalent accept length | 1.1098 |
| theoretical throughput gain | +11.09% |
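The PR does not define these metrics. As a sanity check only, the reported acceptance rate and equivalent accept length are numerically consistent with the readings below (tokens per decode loop step, and that value scaled by the three pipeline stages). This is an observation about the numbers, not a statement of how the evaluator computes them, and the formula behind the theoretical gain is not reconstructed here.

```python
generated_tokens = 768
decode_loop_steps = 2076
accepted_flags = 109
num_stages = 3     # --num-stages in the training command
num_prompts = 12

# Assumed reading: acceptance rate matches tokens emitted per decode loop step.
tokens_per_step = generated_tokens / decode_loop_steps
# Assumed reading: equivalent accept length matches that rate times the stage count.
tokens_per_round = tokens_per_step * num_stages
# The accepted-flag fraction is a different quantity from the acceptance rate.
flag_fraction = accepted_flags / generated_tokens

print(round(tokens_per_step, 4))        # 0.3699
print(round(tokens_per_round, 4))       # 1.1098
print(round(flag_fraction, 4))          # 0.1419
print(generated_tokens // num_prompts)  # 64, matching --max-new-tokens
```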

Summary artifact:

/tmp/skippy-spd-glm47-overnight8k/artifacts/20260618-192253/eval/summary/pipeline_eval__train__speculation_head_final__nt12__summary.json

SPEED-Bench Comparison

This is a bounded SPEED-Bench subset using one prompt from each of the 11 SPEED-Bench categories, max_new_tokens=32, greedy decode, draft_top_k=1, and target verification enabled for the SPD row.
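A one-prompt-per-category subset like this could be selected as in the sketch below. The record layout (a `category` field and a `prompt` field) and the helper name `one_per_category` are hypothetical; the actual SPEED-Bench data schema and the subset script are not shown in this PR.

```python
def one_per_category(records, category_key="category"):
    """Keep the first record seen for each category, preserving input order."""
    picked = {}
    for record in records:
        # setdefault stores a record only the first time a category appears.
        picked.setdefault(record[category_key], record)
    return list(picked.values())


# Decode settings named in the description above, gathered in one place.
decode_settings = {"max_new_tokens": 32, "temperature": 0.0, "draft_top_k": 1}

sample = [
    {"category": "coding", "prompt": "first coding prompt"},
    {"category": "math", "prompt": "first math prompt"},
    {"category": "coding", "prompt": "second coding prompt"},
]
subset = one_per_category(sample)
print([r["prompt"] for r in subset])  # ['first coding prompt', 'first math prompt']
```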

Important caveat: these results come from the Python donor verified evaluator run on SPEED-Bench prompts. They do not yet come from the production Rust/Skippy OpenAI server executing the safetensors sidecar live. The table separates verified acceptance from wall-clock speed because the Python donor pipeline still has sequential overhead.

Command:

PYTHONPATH=/private/tmp/skippy-spd-glm47-overnight8k/speculative_pipeline_decoding \
/Users/micn/.cache/uv/environments-v2/hf-train-eval-qwen06-51e39d356c3e90ad/bin/python3 \
  /tmp/skippy-spd-glm47-speedbench-subset/eval_speed_bench.py \
  --spec_head_ckpt /tmp/skippy-spd-glm47-overnight8k/artifacts/20260618-192253/train/speculation_head_final.pt \
  --base_model_path /Volumes/models/huggingface/hub/models--zai-org--GLM-4.7-Flash/snapshots/7dd20894a642a0aa287e9827cb1a1f7f91386b67 \
  --data_dir /tmp/skippy-spd-glm47-speedbench-subset/data \
  --output_dir /tmp/skippy-spd-glm47-speedbench-subset/eval \
  --gpus 0 \
  --max_new_tokens 32 \
  --temperature 0.0 \
  --draft_top_k 1 \
  --baseline \
  --baseline_cache_dir /tmp/skippy-spd-glm47-speedbench-subset/baseline

Result summary:

| row | prompts | tokens | decode tok/s | speedup vs vanilla | accepted flags | acceptance | equiv. accept length | theoretical gain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vanilla GLM 4.7 Flash HF generate | 11 | 352 | 6.99 | 1.000x | n/a | n/a | n/a | n/a |
| GLM 4.7 Flash + verified SPD sidecar | 11 | 352 | 3.29 wall / 6.53 ideal | 0.471x wall / 0.935x ideal | 59 / 352 | 0.3789 | 1.1367 | +14.10% |
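The speedup column is the SPD throughput divided by the vanilla throughput. A small worked check using the rounded tok/s values from the table:

```python
vanilla_tps = 6.99    # vanilla GLM 4.7 Flash, decode tok/s
spd_wall_tps = 3.29   # SPD sidecar, measured wall-clock tok/s
spd_ideal_tps = 6.53  # SPD sidecar, "ideal" tok/s as reported


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Throughput ratio; values below 1.0 mean slower than the baseline."""
    return candidate_tps / baseline_tps


print(f"{speedup(spd_wall_tps, vanilla_tps):.3f}x")   # 0.471x
print(f"{speedup(spd_ideal_tps, vanilla_tps):.3f}x")  # 0.934x
print(352 // 11)                                      # 32 tokens per prompt
```

The rounded inputs give 0.934x for the ideal row, where the table reports 0.935x. The table value was presumably computed from unrounded throughputs.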

Summary artifact:

/tmp/skippy-spd-glm47-speedbench-subset/eval/summary/pipeline_eval__train__speculation_head_final__nt11__summary.json

Conclusion: the sidecar now has meaningful verified acceptance (EAL=1.1367 on SPEED-Bench subset; EAL=1.1098 on the 12-prompt mini eval), but live wall-clock acceleration still needs Rust/Skippy sidecar execution rather than the donor Python pipeline.

Validation

Python compile:

python3 -m py_compile \
  evals/spd/glm47_frontload.py \
  evals/spd/hf_train_eval_qwen06.py \
  evals/spd/export_spd_head.py \
  evals/spd/simulate_latency.py

Skippy SPD manifest validation:

SKIPPY_SPD_MANIFEST=/tmp/glm-4.7-flash-spd-sidecar-hub/train/skippy-spd-head.json \
  cargo test -p skippy-runtime --features dynamic-native-runtime \
  validates_external_manifest_when_skippy_spd_manifest_is_set

Result:

running 1 test
test spd::tests::validates_external_manifest_when_skippy_spd_manifest_is_set ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 64 filtered out; finished in 0.70s

Earlier local validation on this branch:

cargo test -p skippy-runtime --lib spd
cargo fmt --all -- --check
cargo check -p skippy-runtime
cargo clippy -p skippy-runtime --all-targets -- -D warnings
git diff --check

cargo test -p skippy-runtime --lib spd: 11 passed, 0 failed.

What changed

Adds SPD_SKIPPY_PROJECT.md rewritten around the GLM 4.7 SPD sidecar hypothesis.
Adds docs/design/GLM47_SPD_EXECUTION_PLAN.md with the GLM sidecar training/export/eval plan.
Adds evals/spd/ scripts for GLM checkpoint inspection, reference SPD training/eval, safetensors export, and latency simulation.
Adds skippy-runtime::spd manifest/checkpoint/safetensors validation.
Fixes the reduced-vocab zero-loss pathology in the donor trainer so GLM SPD training uses usable assistant-labeled shifted targets.
Exposes trainer schedule controls needed for short smoke runs and overnight checkpointed training.
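The reduced-vocab fix and the row filtering listed above can be pictured with the following sketch. It is a guess at the shape of the logic, not the code in this PR: the row fields (`input_ids`, `assistant_mask`), the `full_to_draft` mapping, and all function names are hypothetical. The only outside convention used is `-100`, the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch cross-entropy


def shifted_draft_labels(input_ids, assistant_mask, full_to_draft):
    """Next-token labels in draft-vocab ids; IGNORE_INDEX where no target is usable."""
    labels = []
    for pos in range(len(input_ids) - 1):
        next_token = input_ids[pos + 1]
        # Supervise only assistant tokens that exist in the reduced draft vocabulary.
        if assistant_mask[pos + 1] and next_token in full_to_draft:
            labels.append(full_to_draft[next_token])
        else:
            labels.append(IGNORE_INDEX)
    return labels


def filter_rows(rows, full_to_draft):
    """Drop rows whose shifted labels are all ignored, since they carry no signal."""
    kept = []
    for row in rows:
        labels = shifted_draft_labels(
            row["input_ids"], row["assistant_mask"], full_to_draft
        )
        if any(label != IGNORE_INDEX for label in labels):
            kept.append({**row, "labels": labels})
    return kept


# Toy data: full-vocab ids 10 and 11 map into a two-entry draft vocabulary.
full_to_draft = {10: 0, 11: 1}
rows = [
    {"input_ids": [5, 10, 11], "assistant_mask": [0, 1, 1]},  # usable targets
    {"input_ids": [5, 99, 98], "assistant_mask": [0, 1, 1]},  # targets outside draft vocab
    {"input_ids": [5, 10, 11], "assistant_mask": [0, 0, 0]},  # no assistant tokens
]
kept = filter_rows(rows, full_to_draft)
print(len(kept), kept[0]["labels"])  # 1 [0, 1]
```

Under this reading, filtering of this kind would account for the run keeping 7377 of the 8192 loaded rows.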

Deliberately not included

No changes to codex/skippy-spd-proof.
No stale llama patch-queue edits from #860 (Add GLM 4.7 SPD sidecar acceleration).
No broad Skippy serving/protocol/tap-transport code from #859 (WIP: Skippy SPD proof handoff).
No claim yet that Rust/Skippy live serving executes the safetensors sidecar.

Next Work

Wire the exported safetensors sidecar into live Skippy/Rust serving, then rerun the same SPEED-Bench comparison through the production OpenAI-compatible path instead of the Python donor evaluator.

coderabbitai · 2026-06-17T19:50:53Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1cae6749-7220-4e4e-b6b1-2aa782060299

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


i386 added 30 commits June 5, 2026 17:40

Add Skippy cross-request token decode batching

2c85ee4

Batch Skippy split decode frames across requests

5980d56

Avoid fixed decode batch rendezvous waits

bb3af8f

Document micstudio-first lab startup

59d6b25

Keep lab startup note in skill only

ff12eed

Carry split GGUF tensor spooling patch

abd7e71

Document native MTP Skippy architecture

b71f8ae

Add native MTP n1 verification scaffold

876a9f6

Add GLM DSA native MTP graph patch

250ddd5

Add native MTP n1 decode ABI

ac37d00

Wire native MTP n1 sidecar drafts

c5ca8b9

Teach correctness harness native MTP sideband

c411e9b

Gate correctness on native MTP drafts

2e1d0e9

Document GLM 5.1 native MTP proof gate

3e3d88d

Pivot native MTP proof gate to GLM 4.7

3068cde

Record GLM 4.7 MTP artifact gap

967e34c

Point GLM 4.7 MTP gate at meshllm artifact

c044ab8

Preflight native MTP correctness artifacts

b15f44f

Preserve fused native MTP drafts

1f3e958

Verify native MTP n1 correctness

2c78ee8

Cover native MTP direct return replies

caec7cd

Support DeepSeek2 GLM MTP n1 split drafts

59e9ef4

Add native MTP batched verification path

2d8f702

Fix GLM native MTP batched verification

1b73777

Gate native MTP batched verification

57d8cb0

Add native MTP OpenAI A/B correctness check

b8fd760

Compare native MTP against greedy baseline

9aea3ca

Prepare native MTP correctness for lab endpoints

8945b14

Add remote stage1 native MTP correctness launch

d68a705

Trace lab split stage readiness

bfc0802

i386 added 12 commits June 17, 2026 19:39

Add split VerifySpan timing diagnostics

99c1176

Expose VerifySpan overhead breakdown

8bb94eb

Avoid copying VerifySpan activation inputs

6539d1c

Expose batched MTP proposal timing

ebbf648

Add VerifySpan native timing diagnostics

956663d

Break down VerifySpan MTP sync timing

b1b72c3

Split VerifySpan MTP sync setup timing

aa748d3

Add Skippy greedy sampling fast path

a8a906c

Merge remote-tracking branch 'origin/main' into feat/jianyang-glm-llama-patches

7c3e3cc

Trim losing MTP verifier diagnostics

b94cc76

Trim native MTP review path

e5704a3

Add GLM SPD-on-MTP experiment path

fb4b6e8

i386 mentioned this pull request Jun 17, 2026

Add GLM 4.7 SPD sidecar acceleration #860

Closed

i386 added 10 commits June 18, 2026 06:21

Fix GLM SPD training assistant masks

3bb71cc

Add numpy to SPD export script deps

9fe4f97

Add generic SPD topology planning

0ba1f5d

Add generic layer-tap SPD sidecar

96d258e

Harden generic GLM SPD quality gate

d6d6625

Add reusable SPD layer-tap example cache

f7b3876

Reuse generic SPD caches across draft widths

7ecf25c

Add SPD top-k quality diagnostics

806dafe

Add fixed layer tap SPD controls

fd19f3d

Batch GLM SPD hidden state extraction

e168ced

Base automatically changed from feat/jianyang-glm-llama-patches to main June 18, 2026 05:40

i386 changed the title ~~Add GLM 4.7 SPD-on-MTP experiment path~~ Add GLM 4.7 SPD sidecar acceleration Jun 18, 2026

i386 added 3 commits June 18, 2026 18:54

Fix GLM SPD reduced-vocab training loss

7ec4be5

Filter GLM SPD rows without next-token labels

af94b35

Expose GLM SPD trainer schedule controls

6e1ba0d

i386 mentioned this pull request Jun 20, 2026

Add native skippy-quantize conversion and quantization CLI #886

Open

17 tasks

i386 wants to merge 107 commits into main from jd/jianyang-spd-on-mtp
