fix(smoke): template the --probe-smoke seed; fixes false BrokenPunctLoop on Gemma4 unified 12B 4-bit (#121) by Pushkinist · Pull Request #124 · Pushkinist/rMLX

Pushkinist · 2026-06-17T11:03:16Z

Closes #121.

Finding: the model was never broken — the smoke probe was

Issue #121 reported Gemma4 unified 12B emitting degenerate output
(BrokenPunctLoop repeating '1', id 236770) under 4-bit weight quant, while
the mxfp8 build of the same model passed. The "4-bit dequant mis-handles a
unified weight" premise was falsified by an mlx-lm oracle bisect:

- On the identical bare seed prompt ("What is the capital of France?", no
  chat turn markers), the mlx-lm reference loader produces the same step-0
  top token (id 236770 '1', logit 26.38) as rMLX. So rMLX's 4-bit forward is
  numerically faithful; there is no divergence.
- The mxfp8 build also degenerates on that bare prompt; it just lands on
  '.'/'_' filler that slips past the BrokenPunctLoop classifier, whereas the
  4-bit variants land on '1', which trips it. Pure quant-noise token selection.
- Served via the chat template, all variants (4bit / mxfp4 / mxfp8) answer
  "The capital of France is Paris." correctly, and always did.

Root cause: --probe-smoke seeded an out-of-distribution bare instruction
(no turn structure) into an instruction-tuned chat model. The verdict, not the
model, was wrong.
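
The oracle comparison behind this finding can be sketched as follows. This is a minimal illustration, not the bisect tooling itself: `top_token`, `step0_agrees`, and the tolerance parameter are hypothetical names introduced here, and the logits are assumed to be finite.

```rust
/// Return the (token id, logit) pair with the largest logit, or `None`
/// for an empty slice. Ties keep the lowest id. Assumes finite values.
fn top_token(logits: &[f32]) -> Option<(usize, f32)> {
    logits
        .iter()
        .copied()
        .enumerate()
        .fold(None, |best, (id, logit)| match best {
            Some((_, b)) if b >= logit => best,
            _ => Some((id, logit)),
        })
}

/// Hypothetical agreement check: the two step-0 distributions agree when
/// they pick the same top token and the top logits differ by at most `tol`.
fn step0_agrees(reference: &[f32], candidate: &[f32], tol: f32) -> bool {
    match (top_token(reference), top_token(candidate)) {
        (Some((rid, rl)), Some((cid, cl))) => rid == cid && (rl - cl).abs() <= tol,
        _ => false,
    }
}
```

If the reference loader and the engine under test agree at step 0 on the same prompt, the degenerate token comes from the prompt, not from a divergent forward pass, which is the conclusion drawn above.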

Fix (general)

Render the smoke seed through the snapshot's own chat_template.jinja when
present (production-shaped, turn-structured input — byte-identical to the real
chat path: add_generation_prompt: true, same BOS source, add_special_tokens=false),
falling back to the bare seed for base/non-chat snapshots. The classifier is
unchanged, so genuinely broken models are still rejected.
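
The template-first, bare-seed-fallback selection might look roughly like this. It is a sketch under stated assumptions: `ProbePrompt` and `select_probe_prompt` are illustrative names that do not appear in the PR, and the bare path is assumed to be BOS followed by the encoded seed.

```rust
/// Which path produced the probe prompt (illustrative type, not from rMLX).
#[derive(Debug, PartialEq)]
enum ProbePrompt {
    Templated(Vec<u32>),
    BareSeed(Vec<u32>),
}

/// Prefer the templated ids when the caller supplies a non-empty set;
/// otherwise build the bare prompt as BOS + seed ids (assumed layout).
fn select_probe_prompt(
    templated: Option<Vec<u32>>,
    bos_id: u32,
    bare_seed_ids: &[u32],
) -> ProbePrompt {
    match templated {
        Some(ids) if !ids.is_empty() => ProbePrompt::Templated(ids),
        _ => {
            let mut ids = Vec::with_capacity(bare_seed_ids.len() + 1);
            ids.push(bos_id);
            ids.extend_from_slice(bare_seed_ids);
            ProbePrompt::BareSeed(ids)
        }
    }
}
```

Because only the prompt changes, whatever classifier consumes the generated tokens is untouched in either branch.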

- New rmlx_server::chat_template::smoke_prompt_ids → Option<Vec<u32>>
  (templated path only; None when no usable template).
- run_smoke_probe gained an optional caller-supplied prompt-ids override so
  rmlx-models stays free of the template-engine dep.
- Both probe entry points (CLI info, server --require-smoke-probe) feed the
  templated prompt; on None each uses its own canonical BOS resolver
  (bos_token → <bos> → <|im_start|> → eos_token → <|endoftext|>), with no
  hardcoded <bos>/magic id (review fix).
- The previously silent template-render fallback now emits a tracing::debug!
  with the reason (traceability hard rule).
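
The BOS resolution order listed above can be sketched as a first-match lookup. This is an assumption-laden illustration: the real resolvers live in rMLX and their signatures are not shown in this PR, so the function below, its parameters, and the `HashMap` vocabulary are stand-ins. `bos_token` and `eos_token` are assumed to be the configured token strings, when present.

```rust
use std::collections::HashMap;

/// Hypothetical resolver: try each candidate in the documented order and
/// return the id of the first one present in the vocabulary. Returns `None`
/// instead of inventing an id when nothing matches.
fn resolve_bos_id(
    vocab: &HashMap<String, u32>,
    bos_token: Option<&str>,
    eos_token: Option<&str>,
) -> Option<u32> {
    let candidates = [
        bos_token,
        Some("<bos>"),
        Some("<|im_start|>"),
        eos_token,
        Some("<|endoftext|>"),
    ];
    candidates
        .iter()
        .flatten()
        .find_map(|tok| vocab.get(*tok).copied())
}
```

Returning `Option<u32>` keeps the "no magic id" property visible in the type: a caller cannot silently end up with a default such as 2.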

Proof (real models, single-MLX)

rmlx info --probe-smoke:

| snapshot | arch | quant | before | after |
| --- | --- | --- | --- | --- |
| `gemma-4-12B-it-qat-4bit` | unified | affine-4bit | `BrokenPunctLoop{1}` | Ok → "…Paris." |
| `gemma-4-12B-it-qat-mxfp4` | unified | mxfp4 | `BrokenPunctLoop{1}` | Ok → "…Paris." |
| `gemma-4-12B-it-mxfp8` | unified | mxfp8 | Ok (benign token) | Ok → "…Paris" |
| `gemma-4-E4B-it-qat-4bit` | gemma4 | affine-4bit | Ok | Ok → "…Paris" |
| `gemma-4-26B-A4B-it-qat-4bit` | gemma4 MoE | affine-4bit | Ok | Ok → "…Paris" |

All controls now produce coherent Paris-class output (stronger than the prior
benign-token-by-luck pass). No regression. make lint (-D warnings) and
make test green; reviewed by the rust-reviewer agent (BOS + traceability
findings fixed in the second commit).

🤖 Generated with Claude Code

… on gemma4-unified 4-bit

The --probe-smoke heuristic fed the model a bare instruction prompt with no turn markers. Instruction-tuned snapshots can degenerate into a repeated filler token on such input; the mlx-lm reference loader reproduces this identically, and the mxfp8 gemma-4-12B build degenerates the same way (to '.'/'_'). The QAT 4-bit unified 12B snapshots happen to land on '1', which the classifier flags as BrokenPunctLoop, while the served chat-template path generates correctly ("The capital of France is Paris.").

This was a false positive in the probe, not a 4-bit dequant bug: rMLX's step-0 logits match the oracle to within quant noise (top id 236770 '1', |max|~28.4 vs 28.5). The per-tensor mixed 4/8-bit quant overrides are resolved correctly.

Fix generally: render the fixed smoke seed through the snapshot's real chat_template.jinja when present (production-shaped, turn-structured input), falling back to the bare seed for base/non-chat snapshots. New chat_template::smoke_prompt_ids owns this; both the rmlx info probe and the server --require-smoke-probe gate use it. run_smoke_probe gains an optional caller-supplied prompt-ids override so rmlx-models stays free of the template engine dep.

- crates/rmlx-server/src/chat_template.rs: smoke_prompt_ids + templated builder
- crates/rmlx-models/src/arch/loader.rs: run_smoke_probe prompt-ids override
- crates/rmlx-server/src/openai/state.rs: render templated probe prompt at gate
- crates/rmlx-cli/src/commands/info.rs: use templated smoke_prompt_ids
- chat_template_tests.rs: template-path + bare-seed fallback unit tests
- docs/CLI.md, docs/MODELS.md: document the templated probe + 4-bit text status

…fallback

The server --require-smoke-probe bare-seed fallback computed BOS as token_to_id("<bos>").unwrap_or(2), a hardcoded string + magic id that seeded the probe wrong for chat-template-less vocabs lacking <bos> (and, passed as Some(ids), suppressed run_smoke_probe's own resolve_bos_id).

smoke_prompt_ids now returns Option<Vec<u32>> from the chat-template render path only; when no usable template exists it returns None so each entry point uses its own canonical BOS resolver: run_smoke_probe::resolve_bos_id (server) and load_bos_id + arch::smoke_prompt_ids (CLI info). No token id is invented in chat_template.rs.

Per the traceability rule, the templated->bare-seed fallback now emits a debug! event carrying the failure reason (load/compile/render/encode/empty), so a run's .jsonl shows whether the probe ran templated or fell back and why.

Also fix a stale doc reference (ChatTemplate::from_template_string -> ChatTemplate::new) and update the None-path unit test to assert no magic id 2 leaks.
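
The reason-carrying fallback described in this commit could be modeled along these lines. This is a sketch, not the code in chat_template.rs: `FallbackReason` and `ids_or_fallback` are hypothetical names, and a plain `Vec<String>` stands in for the `debug!` event sink so the example needs no external crate.

```rust
/// The five failure reasons named in the commit message (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum FallbackReason {
    Load,
    Compile,
    Render,
    Encode,
    Empty,
}

impl FallbackReason {
    fn as_str(self) -> &'static str {
        match self {
            FallbackReason::Load => "load",
            FallbackReason::Compile => "compile",
            FallbackReason::Render => "render",
            FallbackReason::Encode => "encode",
            FallbackReason::Empty => "empty",
        }
    }
}

/// Turn a render outcome into `Option<Vec<u32>>`, recording why the
/// templated path was abandoned. `log` is a stand-in for a debug event.
fn ids_or_fallback(
    rendered: Result<Vec<u32>, FallbackReason>,
    log: &mut Vec<String>,
) -> Option<Vec<u32>> {
    // An empty encoding is treated as a failure with its own reason.
    let outcome = rendered.and_then(|ids| {
        if ids.is_empty() {
            Err(FallbackReason::Empty)
        } else {
            Ok(ids)
        }
    });
    match outcome {
        Ok(ids) => Some(ids),
        Err(reason) => {
            log.push(format!(
                "smoke probe: falling back to bare seed (reason={})",
                reason.as_str()
            ));
            None
        }
    }
}
```

Collapsing to `None` only after the reason has been recorded keeps the caller's interface simple while leaving a trace of which step failed.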

Pushkinist added 2 commits June 17, 2026 17:40

Pushkinist merged commit e1bf8a3 into main Jun 17, 2026
2 checks passed

Pushkinist deleted the fix/121-gemma4-unified-4bit-degenerate branch June 17, 2026 11:09
