feat: enable MiniCPM5-1B (explicit head_dim + special-token decode + tojson kwarg + tool parser) by dusterbloom · Pull Request #176 · panbanda/higgs

dusterbloom · 2026-06-02T12:43:40Z

Summary

Takes openbmb/MiniCPM5-1B (MLX 4-bit) from loads-but-emits-garbage to fully working chat and tool-calling on higgs. Four self-contained commits — three are general fixes that help any model, one adds the MiniCPM tool-call format.

Stacked on #164 (streaming tool-call deltas + Qwen XML). The first 5 commits here belong to #164 — please review those there. This PR adds the 4 commits listed below; once #164 merges, only those remain in this diff.

The 4 new commits

feat(models): honor explicit head_dim — ModelArgs::head_dim() always computed hidden_size / num_attention_heads, ignoring an explicit head_dim in config.json. MiniCPM5 sets head_dim=128 while 1536/16=96, so it loaded but produced garbage (wrong RoPE dimension + 1/√head_dim scale). Now honors the config value. Regression-safe: every other Llama-family config omits head_dim → None → behaviour unchanged. General fix for any explicit-head_dim checkpoint.
fix(engine): preserve content special tokens on decode — decode_tokens used skip_special_tokens=true, deleting content-bearing special tokens. MiniCPM5 encodes its tool-call markup (<function>/<param>, token ids 18–21) as special tokens, so calls silently vanished into empty content. Now decodes with specials kept and strips only a precomputed control set (EOS ids + <|…|> delimiters + <s>/</s>/…). Plain text has no special tokens and decodes identically, so normal responses are unaffected. Also fixes latent control-token leakage for other models.
fix(chat_template): accept ensure_ascii kwarg in tojson — HF templates call tojson(ensure_ascii=false) (MiniCPM5 at chat:6); the custom filter rejected kwargs and 500'd any tools-bearing request. Now accepts (and ignores) them — serde_json::to_string already emits UTF-8, i.e. ensure_ascii=false. Templates calling tojson with no args are unaffected.
feat(tool_parser): MiniCPM5 tool format — a third tool-call format (attribute-named, optional <![CDATA[…]]> values, no <tool_call> wrapper), dispatched on content shape. CDATA-aware end detection (a literal </function> inside a value can't close the block early); the streaming tracker's flag becomes an Inside { None, ToolCall, Function } enum (the existing arms unchanged). Reuses the ToolSchema coercion from feat(chat): streaming tool-call deltas + Qwen-friendly arg normalisation #164. 6 new unit tests.

Verification (release builds)

cargo test --release — higgs-engine 258, higgs-models 457, higgs 98; 0 failed. cargo clippy/cargo fmt clean across touched crates.
Live e2e on mlx-community/MiniCPM5-1B-4bit:
- plain chat coherent ("Paris is the capital of France." — was "I cannot answer: 1." before the head_dim fix)
- tool call → {"name":"get_weather","arguments":{"city":"London"}}, finish_reason: tool_calls, no markup leak — non-streaming and streaming.
Regression on Qwen3.5-4B-MLX-4bit: tool calls still parse to structured tool_calls; the decode change also removes a stray <|im_end|> that could leak into content.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Expanded tool-call parsing to accept additional tool formats and emit structured tool-call events during streaming
- Tool-call values are now normalized for templates so rendered tool metadata is consistent
- Thinking-mode detection improved to enable model "thinking" behavior from templates or model config
Bug Fixes
- Token decoding preserves visible tool/markup tokens while removing system markers and delimiters
- Streaming parsing now recovers from unclosed/oversized tool blocks instead of hanging
Improvements
- Added explicit head_dim configuration for model parameters

The streaming `/v1/chat/completions` route previously stripped tools from the prompt (`prepare_chat_prompt_with_thinking(..., None, ...)`) and warned that "tool_calls deltas are unsupported". The model, deprived of tool context, would hallucinate fake tool calls as plain text — and the client's TTS pipeline would happily speak `<nanobot>read_skill newsreader</nanobot>` out loud. This commit completes the streaming tool-call story that was documented but never finished (see closed upstream PR panbanda#63). Three pieces, ~580 net new lines: 1. `tool_parser::StreamingToolCallTracker` New `pub struct` next to the existing `parse_tool_calls`. A small state machine that buffers streamed text chunks and extracts complete `<tool_call>{json}</tool_call>` blocks on the fly. When `active=false` (no tools in the request) it collapses to a single-allocation passthrough with zero parsing cost. Invariants verified by 8 new unit tests: - inactive=false → pure passthrough - single complete tag in one chunk - tag split across N chunks → reassembled - text before AND after tags → both visible - invalid JSON inside tag → preserved as visible (no silent loss) - unclosed tag at flush → buffered prefix emitted as visible - UTF-8 char-boundary safety on tail flush - multiple calls with text between → indices tracked correctly 2. `chat::chat_completions_stream` wiring - Pass `req.tools.as_deref()` (not None) to `prepare_chat_prompt_with_thinking` so the chat template renders the tool spec the model recognises. - Construct a `StreamingToolCallTracker` keyed off `stream_includes_tools`. - On every chunk: route the reasoning-tracker's visible text through the tool tracker; emit `ToolCallDelta` SSE events for each completed call; emit content delta for the surviving visible text. - Defer `finish_reason` until after the tool tracker has drained, so we report `"tool_calls"` when the response actually contained any. - Final flush drains both trackers so no tokens vanish silently. 3. `chat_template::normalize_tool_call_for_template` New `pub fn` that walks a tool-call JSON value and (a) hoists `function.{name,arguments}` to the top level — Qwen's `chat_template.jinja` references `tool_call.name` and `tool_call.arguments` directly, not the OpenAI-nested shape; (b) parses string-encoded `arguments` to a JSON value, fixing the `cannot convert value into pairs (in chat:120)` minijinja error that crashed multi-turn conversations carrying assistant tool_calls in their history. Verified by 4 new unit tests covering OpenAI shape, Qwen-flat shape (no-op), unparseable-string arguments (kept as string), and non-object inputs (no-op). `convert_messages` in the chat route calls the normaliser per tool call so both the streaming and non-streaming paths get the fix. Test impact: - higgs-models: 242 passed (no change) - higgs-engine: 338 passed (+27 tracker tests, +4 normalize tests, 25 ignored) - higgs: 457 passed (no regression on existing convert_messages tests) - cargo clippy -Dwarnings: clean across all three crates

…bind Qwen's chat_template.jinja:107-108 rebinds the loop variable: {%- for tool_call in message.tool_calls %} {%- if tool_call.function is defined %} {%- set tool_call = tool_call.function %} {%- endif %} After this rebind, line 120's `tool_call.arguments|items` walks into the ORIGINAL `function.arguments` JSON-encoded string — bypassing the top-level `arguments` we just normalised. minijinja raises "cannot convert value into pairs" and the request 500s. Two-part fix: 1. `normalize_tool_call_for_template` now normalises BOTH the hoisted top-level `arguments` AND the still-nested `function.arguments`. Pulled the parse-and-coerce logic into a private helper `normalize_arguments_value` so it's applied identically at both sites (DRY). 2. The coercion now defends against every non-mapping shape — null, bool, number, array, string-that-fails-to-parse — by replacing the value with an empty `{}`. A warn is logged so pathological shapes stay visible. Previously only string-parsing was attempted and any other shape leaked through to the template. New invariant test (`normalize_handles_qwen_rebind_to_function`) pins both `tool_call.arguments` AND `tool_call.function.arguments` to be mappings after normalize, so future edits can't regress the rebind path silently. Five additional coercion tests cover null / array / number / unparseable-string / array-as-string cases. Verified by manual smoke: `nanobot agent -l -m "Use your tools to list files in /tmp"` previously 500'd with the template error; now returns a real directory listing streamed back through `StreamingToolCallTracker`. Test count: chat_template 36 (was 31, +5 net for the new shapes), all green. Clippy `-Dwarnings` clean across higgs / higgs-engine / higgs-models. Also: cargo fmt sweep on the new code that landed unformatted in the prior commit; pure layout, no behaviour change.

`req.tools.as_deref()` returns `Some(&[])` when the request carries an empty `tools` array. The chat-template renderer treats that as `tools` being *defined* in the Jinja context (just empty), versus `None` which omits the key entirely. Templates branching on `{% if tools is defined %}` vs `{% if tools %}` see different state. Convert empty slices to `None` before handing them to `prepare_chat_prompt_with_thinking`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Addresses CRITICAL finding from the closed upstream PR panbanda#63 review: a model that emits `<tool_call>` and never produces a matching `</tool_call>` would grow `StreamingToolCallTracker::buffer` until OOM — a model-controlled DoS vector. On overflow we abandon the parse, surface `<tool_call>` plus the buffered bytes as visible content (preserving the "never silently drop tokens" invariant), and reset state so a subsequent well-formed tool call in the same stream still parses. New test `streaming_unbounded_buffer_capped_and_recovers` pushes MAX_INSIDE_TOOL_CALL_BYTES + 1 inside an unclosed tag followed by a valid call, and asserts both that the overflow is flushed verbatim and that the post-overflow call still parses (completed_count == 1). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… coercion Qwen3.5/3.6 models' chat_template.jinja instructs them to emit `<tool_call><function=NAME><parameter=KEY>value</parameter></function></tool_call>`, but the existing parser only decodes the older JSON-in-`<tool_call>` shape, so tool calls leaked into `content` with `finish_reason: stop` and the OpenAI streaming/non-streaming clients saw nothing. Add a second parse path that dispatches on the block's first token (`<function=` → XML, else → JSON), reusing the existing `StreamingToolCallTracker` boundary detection unchanged. XML values are raw strings, so coerce them through a `ToolSchema` built from the request's `tools` array (integer/number/boolean/object/array → typed JSON, string-typed `"123"` stays a string), falling back to best-effort JSON parsing when no schema is declared. - `parse_tool_calls` and `StreamingToolCallTracker::new` now take an `Option<&ToolSchema>` / `Option<ToolSchema>`; the streaming path captures it from `req.tools` before the `async_stream!` block, the non-streaming path threads it at the call site - Backward-compatible: pre-existing 28 JSON tests still pass; one new test exercises a mixed JSON+XML stream Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`ModelArgs::head_dim()` always computed `hidden_size / num_attention_heads`, ignoring an explicit `head_dim` in config.json. Models that set a head_dim differing from that ratio (e.g. MiniCPM5-1B: head_dim=128 while 1536/16=96) then loaded but produced garbage — RoPE, the 1/sqrt(head_dim) scale, and head reshaping all used the wrong dimension. Add `head_dim_override: Option<i32>` (serde `head_dim`); `head_dim()` and `checked_head_dim()` return it when present, else the existing ratio. Regression-safe: every other Llama-family config omits `head_dim`, so the field is None and behavior is unchanged. Verified live — MiniCPM5 went from "I cannot answer: 1." to "Paris is the capital of France." Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

decode_tokens used `skip_special_tokens=true`, which strips ALL special tokens — including content-bearing ones. Models that encode tool-call markup as special tokens (e.g. MiniCPM5: `<function>`/`<param>` are token ids 18-21) had the structure deleted before the tool parser could see it, so tool calls silently vanished into empty content. Decode with `skip_special_tokens=false` and instead strip a precomputed set of control tokens: the EOS ids plus the `<|…|>` chat-control delimiters and the classic sentinels (`<s>`, `</s>`, `<unk>`, …). Content tokens like `<think>`, `<tool_call>`, `<function>` are preserved. Plain text has no special tokens and decodes identically, so normal responses are unaffected. Verified: MiniCPM5 tool calls now surface as structured tool_calls; Qwen3.5 regression-clean (its `<|im_end|>` no longer leaks into content). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The custom `tojson` filter took only the piped value, so HF templates that call `tojson(ensure_ascii=false)` (e.g. MiniCPM5 at chat:6) failed prompt rendering with "too many arguments" and 500'd any tools-bearing request. Accept (and ignore) trailing kwargs via `minijinja::value::Kwargs`. `serde_json::to_string` already emits UTF-8, matching ensure_ascii=false, so the value is a no-op. Templates calling `tojson` with no args are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MiniCPM5 emits tool calls as `<function name="NAME"><param name="KEY">VALUE </param></function>` — no `<tool_call>` wrapper, attribute-named, with optional `<![CDATA[…]]>` values. Neither the JSON nor the Qwen `<function=` XML parser handled it, so calls leaked into content. Add a third parse path, dispatched on content shape (a top-level `<function ` opener vs the existing `<tool_call>` wrapper): - `parse_minicpm_function` extracts the name attr and `<param>` entries, with CDATA-aware value extraction; `minicpm_function_end` finds the closing `</function>` while skipping CDATA spans (so a literal `</function>` inside a value can't end the block early). - The streaming tracker's `inside_tool_call: bool` becomes `enum Inside { None, ToolCall, Function }`; the ToolCall arm is unchanged. - Reuses ToolSchema/coerce_param_value for typed argument coercion. 6 new tests (single, multi, CDATA-with-literal-</function>, schema coercion, no-param, streaming split mid-opener+mid-CDATA); all existing JSON/Qwen tests still pass. Verified live e2e on MiniCPM5-1B (non-streaming + streaming). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-02T12:43:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b007317-a333-4900-b24f-fbfa2edee76d

📥 Commits

Reviewing files that changed from the base of the PR and between 348f49a and 9d90be4.

📒 Files selected for processing (4)

crates/higgs-engine/src/chat_template.rs
crates/higgs-engine/src/simple.rs
crates/higgs-engine/src/tool_parser.rs
crates/higgs/src/routes/chat.rs

🚧 Files skipped from review as they are similar to previous changes (2)

crates/higgs-engine/src/chat_template.rs
crates/higgs-engine/src/tool_parser.rs

📝 Walkthrough

Walkthrough

This PR adds multi-shape tool-call parsing with schema-driven coercion and streaming safety, normalizes tool-call JSON for templates, preserves content-bearing markup while filtering control/sentinel tokens during decoding, adds template-aware tojson kwargs, expands thinking-detection heuristics, adds an explicit head_dim override, and integrates these across streaming and non-streaming chat handlers.

Changes

Tool-call ecosystem and chat integration

Layer / File(s)	Summary
Tool-call parsing core and schema `crates/higgs-engine/src/tool_parser.rs`, `tests/tool_parser/*`	Module docs, parse dispatch, XML parsing/coercion, ParamType/ToolSchema and helpers; parse_tool_calls updated to accept optional schema and dispatch by format.
MiniCPM parsing & streaming-safe extraction `crates/higgs-engine/src/tool_parser.rs`, `tests/tool_parser/*`	MiniCPM bare-function parser with CDATA support; StreamingToolCallTracker extended to handle multiple shapes, MAX_INSIDE_TOOL_CALL_BYTES overflow handling, and streaming reassembly tests.
Template normalization & tojson kwargs `crates/higgs-engine/src/chat_template.rs`, `tests/chat_template/*`	Adds `normalize_tool_call_for_template` to hoist `function.{name,arguments}` and parse/coerce string-encoded arguments to objects (fallback `{}`); `tojson` filter accepts Minijinja `Kwargs` (e.g., `ensure_ascii=false`) while retaining serde_json serialization.
Selective token decoding with control-token filtering `crates/higgs-engine/src/simple.rs`	Adds `decode_skip_ids` built from EOS and tokenizer control/sentinel tokens; decodes without token-skip mode and filters skip IDs so content-bearing markup remains visible, while control/sentinel tokens are removed.
Thinking detection and template-marker helper `crates/higgs-engine/src/simple.rs`, `tests/*`	`detect_thinking_support` now considers `enable_thinking` markers found in chat templates or tokenizer_config.json via `chat_template_mentions_enable_thinking`; tests added for marker-based, model-type, and negative detection.
Model configuration with explicit head dimension `crates/higgs-models/src/transformer.rs`, `tests/transformer/*`	Adds `head_dim_override` (`head_dim` in config) to `ModelArgs`; `head_dim()` and `checked_head_dim()` honor the override and validate positivity; tests verify override and fallback behavior.
Chat API streaming and completion integration `crates/higgs/src/routes/chat.rs`	Builds `ToolSchema` from request tools for non-streaming/streaming, passes tools into prompt rendering, initializes StreamingToolCallTracker for SSE, routes visible tokens through tracker to emit `ToolCallDelta` and `content` deltas, drains tracker at stream end to flush buffered/partial tool outputs and override `finish_reason` when tool calls produced, and normalizes tool-call JSON before templating.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ChatHandler
  participant PromptBuilder
  participant ToolTracker
  participant SSE
  Client->>ChatHandler: request with tools
  ChatHandler->>ToolTracker: build schema from tools
  PromptBuilder->>PromptBuilder: render prompt with tool definitions
  ChatHandler->>ChatHandler: start model streaming
  loop for each chunk
    ChatHandler->>ToolTracker: feed chunk visible text
    ToolTracker->>ToolTracker: extract tool-call blocks
    ToolTracker->>SSE: emit ToolCallDelta for each tool call
    ToolTracker->>SSE: emit ContentDelta for remaining visible text
  end
  ChatHandler->>ToolTracker: drain at end
  ToolTracker->>SSE: flush buffered tool-call deltas
  ToolTracker->>ChatHandler: report tool_calls_present
  ChatHandler->>SSE: override finish_reason to tool_calls

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

MiniCPM5-1B loads but is unusable on higgs — incoherent output + dropped tool calls (4 root causes) #178 — Changes touch head_dim handling, tojson kwargs, decode_token behavior, and tool_parser extensions that align with the issue's objectives.

Suggested labels

risk: high

Poem

🐰 I nibbled XML, JSON, MiniCPM threads,
Hoisted function fields and parsed oddheads.
I kept the markup that makes tools sing,
Filtered the control tokens, let content spring.
A happy little rabbit, streaming joy ahead!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main changes: enabling MiniCPM5-1B support through explicit head_dim handling, special-token decode logic, tojson kwarg support, and tool parser improvements.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

⚔️ Resolve merge conflicts

Resolve merge conflict in branch dusterbloom/minicpm5-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

dusterbloom · 2026-06-02T12:47:23Z

📋 Motivation & root-cause writeup (why each of the 4 changes was needed, with evidence): #178

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/higgs/src/routes/chat.rs (1)

270-279: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize empty tools the same way in both response modes.

At Line 319, non-streaming treats tools: [] as tool-enabled (is_some()), while streaming only enables tool parsing for non-empty lists. The same request can therefore yield parsed tool_calls / finish_reason = "tool_calls" in non-streaming but plain text in streaming.

Suggested fix

-    let tools = req.tools.as_deref();
+    let tools = req.tools.as_deref().filter(|t| !t.is_empty());
@@
-    let has_tools = req.tools.is_some();
+    let has_tools = tools.is_some();

-    let tool_schema = higgs_engine::tool_parser::ToolSchema::from_tools(req.tools.as_deref());
+    let prompt_tools = req.tools.as_deref().filter(|t| !t.is_empty());
+    let tool_schema = higgs_engine::tool_parser::ToolSchema::from_tools(prompt_tools);
@@
-    let prompt_tools = req
-        .tools
-        .as_deref()
-        .and_then(|t| if t.is_empty() { None } else { Some(t) });
     let mut prompt_tokens = engine
         .prepare_chat_prompt_with_thinking(&messages, prompt_tools, thinking_enabled_stream)

Also applies to: 319-352, 421-425, 461-466

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/higgs/src/routes/chat.rs` around lines 270 - 279, The code treats an
empty tools list differently between streaming and non-streaming paths because
tools is taken as Some(empty) in one path but not the other; normalize by
converting req.tools.as_deref() into None for empty slices before any downstream
checks. Replace uses of let tools = req.tools.as_deref() with a normalized
option (e.g., filter out empty slices) so downstream logic like
prepare_chat_prompt_with_thinking(..., tools, ...) and any is_some() checks that
enable tool parsing behave identically in both streaming and non-streaming
branches; apply the same normalization in all places where req.tools.as_deref()
or tool existence is checked (including the streaming parsing/finish_reason
logic and non-streaming response handling).

🧹 Nitpick comments (1)

crates/higgs-engine/src/tool_parser.rs (1)

497-500: ⚡ Quick win

Document the new public fields.

StreamingToolOutput is newly public, but visible and new_tool_calls do not have field-level docs.

Suggested fix

 pub struct StreamingToolOutput {
+    /// Text that should be forwarded to the client as normal assistant content.
     pub visible: String,
+    /// Tool calls that became complete while processing the current chunk.
     pub new_tool_calls: Vec<ParsedToolCall>,
 }

As per coding guidelines, "Add doc comments on public structs/fields in Rust when changing user-facing behavior".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/higgs-engine/src/tool_parser.rs` around lines 497 - 500,
StreamingToolOutput is now public but its public fields lack documentation; add
doc comments for the struct and for each public field—describe the purpose of
StreamingToolOutput and then annotate visible (String) to explain what the
string contains/represents (e.g., aggregated or incremental visible output to
the user) and new_tool_calls (Vec<ParsedToolCall>) to explain it contains parsed
tool calls discovered during streaming and their semantics/order; update the doc
comments above the pub struct StreamingToolOutput and above pub visible and pub
new_tool_calls accordingly so consumers understand expected contents and
behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/higgs-engine/src/chat_template.rs`:
- Around line 186-192: The doc comment describing argument parsing is out of
date: normalize_arguments_value() now coerces any parsed non-object (including
strings that parse to non-object JSON or unparseable strings) to an empty object
`{}`; update the public-facing comments that currently claim unparseable strings
are left untouched to instead state that after normalization non-object results
are converted to `{}`. Edit the doc block around the earlier description (and
the similar block later in the file) to mention normalize_arguments_value()'s
coercion behavior and the new contract so callers do not expect the original
string to survive normalization.

In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 409-420: parse_minicpm_function currently searches for NAME_ATTR
across the whole block which can match attributes in nested tags; restrict the
search to the opening <function ...> tag only by first locating the opener's end
('>') from the start of block and then searching for NAME_ATTR only within that
slice. In other words, find the index of the first '>' after the opening
'<function', take the substring representing the opener, then locate NAME_ATTR,
extract the quoted name from that opener substring (validating quotes and
non-empty), and fall back to returning None if not present or malformed; update
references in parse_minicpm_function to use this opener-limited search instead
of block.find(NAME_ATTR).
- Around line 249-252: The match arm conflates ParamType::Integer and
ParamType::Number by using parsed_if(Value::is_number), allowing floats for
integers; change the arms so ParamType::Integer uses a stricter check (e.g.,
replace parsed_if(Value::is_number) in the Integer arm with parsed_if(|v|
v.is_i64() || v.is_u64() || (v.is_f64() && v.as_f64().map(|f| f.fract() ==
0.0).unwrap_or(false)))) while keeping ParamType::Number using
parsed_if(Value::is_number); update the match cases around ParamType::Integer
and ParamType::Number in the code near function parsed_if / as_string to enforce
integer-only values.

---

Outside diff comments:
In `@crates/higgs/src/routes/chat.rs`:
- Around line 270-279: The code treats an empty tools list differently between
streaming and non-streaming paths because tools is taken as Some(empty) in one
path but not the other; normalize by converting req.tools.as_deref() into None
for empty slices before any downstream checks. Replace uses of let tools =
req.tools.as_deref() with a normalized option (e.g., filter out empty slices) so
downstream logic like prepare_chat_prompt_with_thinking(..., tools, ...) and any
is_some() checks that enable tool parsing behave identically in both streaming
and non-streaming branches; apply the same normalization in all places where
req.tools.as_deref() or tool existence is checked (including the streaming
parsing/finish_reason logic and non-streaming response handling).

---

Nitpick comments:
In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 497-500: StreamingToolOutput is now public but its public fields
lack documentation; add doc comments for the struct and for each public
field—describe the purpose of StreamingToolOutput and then annotate visible
(String) to explain what the string contains/represents (e.g., aggregated or
incremental visible output to the user) and new_tool_calls (Vec<ParsedToolCall>)
to explain it contains parsed tool calls discovered during streaming and their
semantics/order; update the doc comments above the pub struct
StreamingToolOutput and above pub visible and pub new_tool_calls accordingly so
consumers understand expected contents and behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff6757fb-ce21-4ad3-9203-4c19bd4f81c0

📥 Commits

Reviewing files that changed from the base of the PR and between f6e3c2f and 348f49a.

📒 Files selected for processing (5)

crates/higgs-engine/src/chat_template.rs
crates/higgs-engine/src/simple.rs
crates/higgs-engine/src/tool_parser.rs
crates/higgs-models/src/transformer.rs
crates/higgs/src/routes/chat.rs

`detect_thinking_support` only recognised `model_type` qwen3_5/qwen3_5_moe, so hybrid reasoning models with a different model_type ran with thinking OFF — e.g. MiniCPM5-1B (model_type=llama) answered without its reasoning channel: fast but inaccurate (bat-and-ball at temp 0.9 → $1.05/$1.60; with thinking on → correct $0.05 with self-check). Generalise the capability check to also honour the model author's own signal — the chat template referencing `enable_thinking` (covers Qwen3.5/3.6, MiniCPM5, future reasoning models) — keeping the qwen3_5* model_type as a fallback. The caller still requires a single-token `</think>`, so a stray mention can't enable a non-reasoning model. This is capability only; the per-request default (e.g. Qwen3.6 reasons off unless asked) stays in `model_defaults_to_non_thinking` and is unaffected — panbanda#150 behaviour preserved (reasoning tests green). MiniCPM5 isn't qwen3.6, so it defaults thinking on. Verified live: MiniCPM5 auto-enables thinking (no env var), emits reasoning, answers correctly, and tool calls still work alongside it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- coerce_param_value: split `integer` from `number` — `is_number` accepts floats, so an `integer` param wrongly coerced `3.14`. `integer` now only accepts i64/u64 values, falling back to the raw string otherwise. - chat (non-streaming): treat an empty `tools: []` as absent, mirroring the streaming path — it no longer defines `tools` in the template context or triggers tool parsing. - docs: refresh `normalize_tool_call_for_template` (non-object arguments are coerced to `{}`, not left untouched; dropped a stray leading line) and document the public `StreamingToolOutput` fields. Adds a test pinning the integer/fractional coercion contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

parse_minicpm_function scanned the whole block for `name="…"`, so a malformed payload like `<function ><param name="city">…` was parsed as a tool call named `city` instead of being preserved verbatim — routing bad model output into the tool-execution path. Read the attribute only from the opening `<function …>` tag (before its first `>`); a missing name now yields `None` (preserved). Adds a regression test, and backticks a few identifiers in test doc comments to satisfy `clippy::doc_markdown`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dusterbloom and others added 9 commits May 21, 2026 16:11

dusterbloom mentioned this pull request Jun 2, 2026

MiniCPM5-1B loads but is unusable on higgs — incoherent output + dropped tool calls (4 root causes) #178

Open

coderabbitai Bot requested changes Jun 2, 2026

View reviewed changes

Comment thread crates/higgs-engine/src/chat_template.rs Outdated

Comment thread crates/higgs-engine/src/tool_parser.rs

Comment thread crates/higgs-engine/src/tool_parser.rs Outdated

dusterbloom and others added 3 commits June 2, 2026 15:09

coderabbitai Bot approved these changes Jun 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: enable MiniCPM5-1B (explicit head_dim + special-token decode + tojson kwarg + tool parser)#176

feat: enable MiniCPM5-1B (explicit head_dim + special-token decode + tojson kwarg + tool parser)#176
dusterbloom wants to merge 12 commits into
panbanda:mainfrom
dusterbloom:dusterbloom/minicpm5-support

dusterbloom commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Suggested labels

Poem

Uh oh!

dusterbloom commented Jun 2, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dusterbloom commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The 4 new commits

Verification (release builds)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Suggested labels

Poem

Uh oh!

dusterbloom commented Jun 2, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dusterbloom commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading