Skip to content

feat: enable MiniCPM5-1B (explicit head_dim + special-token decode + tojson kwarg + tool parser)#176

Open
dusterbloom wants to merge 12 commits into
panbanda:mainfrom
dusterbloom:dusterbloom/minicpm5-support
Open

feat: enable MiniCPM5-1B (explicit head_dim + special-token decode + tojson kwarg + tool parser)#176
dusterbloom wants to merge 12 commits into
panbanda:mainfrom
dusterbloom:dusterbloom/minicpm5-support

Conversation

@dusterbloom

@dusterbloom dusterbloom commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Takes openbmb/MiniCPM5-1B (MLX 4-bit) from loads-but-emits-garbage to fully working chat and tool-calling on higgs. Four self-contained commits — three are general fixes that help any model, one adds the MiniCPM tool-call format.

Stacked on #164 (streaming tool-call deltas + Qwen XML). The first 5 commits here belong to #164 — please review those there. This PR adds the 4 commits listed below; once #164 merges, only those remain in this diff.

The 4 new commits

  1. feat(models): honor explicit head_dimModelArgs::head_dim() always computed hidden_size / num_attention_heads, ignoring an explicit head_dim in config.json. MiniCPM5 sets head_dim=128 while 1536/16=96, so it loaded but produced garbage (wrong RoPE dimension + 1/√head_dim scale). Now honors the config value. Regression-safe: every other Llama-family config omits head_dimNone → behaviour unchanged. General fix for any explicit-head_dim checkpoint.

  2. fix(engine): preserve content special tokens on decodedecode_tokens used skip_special_tokens=true, deleting content-bearing special tokens. MiniCPM5 encodes its tool-call markup (<function>/<param>, token ids 18–21) as special tokens, so calls silently vanished into empty content. Now decodes with specials kept and strips only a precomputed control set (EOS ids + <|…|> delimiters + <s>/</s>/…). Plain text has no special tokens and decodes identically, so normal responses are unaffected. Also fixes latent control-token leakage for other models.

  3. fix(chat_template): accept ensure_ascii kwarg in tojson — HF templates call tojson(ensure_ascii=false) (MiniCPM5 at chat:6); the custom filter rejected kwargs and 500'd any tools-bearing request. Now accepts (and ignores) them — serde_json::to_string already emits UTF-8, i.e. ensure_ascii=false. Templates calling tojson with no args are unaffected.

  4. feat(tool_parser): MiniCPM5 tool format — a third tool-call format (attribute-named, optional <![CDATA[…]]> values, no <tool_call> wrapper), dispatched on content shape. CDATA-aware end detection (a literal </function> inside a value can't close the block early); the streaming tracker's flag becomes an Inside { None, ToolCall, Function } enum (the existing arms unchanged). Reuses the ToolSchema coercion from feat(chat): streaming tool-call deltas + Qwen-friendly arg normalisation #164. 6 new unit tests.

Verification (release builds)

  • cargo test --release — higgs-engine 258, higgs-models 457, higgs 98; 0 failed. cargo clippy/cargo fmt clean across touched crates.
  • Live e2e on mlx-community/MiniCPM5-1B-4bit:
    • plain chat coherent ("Paris is the capital of France." — was "I cannot answer: 1." before the head_dim fix)
    • tool call → {"name":"get_weather","arguments":{"city":"London"}}, finish_reason: tool_calls, no markup leak — non-streaming and streaming.
  • Regression on Qwen3.5-4B-MLX-4bit: tool calls still parse to structured tool_calls; the decode change also removes a stray <|im_end|> that could leak into content.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Expanded tool-call parsing to accept additional tool formats and emit structured tool-call events during streaming
    • Tool-call values are now normalized for templates so rendered tool metadata is consistent
    • Thinking-mode detection improved to enable model "thinking" behavior from templates or model config
  • Bug Fixes

    • Token decoding preserves visible tool/markup tokens while removing system markers and delimiters
    • Streaming parsing now recovers from unclosed/oversized tool blocks instead of hanging
  • Improvements

    • Added explicit head_dim configuration for model parameters

dusterbloom and others added 9 commits May 21, 2026 16:11
The streaming `/v1/chat/completions` route previously stripped tools from
the prompt (`prepare_chat_prompt_with_thinking(..., None, ...)`) and warned
that "tool_calls deltas are unsupported". The model, deprived of tool
context, would hallucinate fake tool calls as plain text — and the client's
TTS pipeline would happily speak `<nanobot>read_skill newsreader</nanobot>`
out loud. This commit completes the streaming tool-call story that was
documented but never finished (see closed upstream PR panbanda#63).

Three pieces, ~580 net new lines:

1. `tool_parser::StreamingToolCallTracker`
   New `pub struct` next to the existing `parse_tool_calls`. A small state
   machine that buffers streamed text chunks and extracts complete
   `<tool_call>{json}</tool_call>` blocks on the fly. When `active=false`
   (no tools in the request) it collapses to a single-allocation
   passthrough with zero parsing cost.

   Invariants verified by 8 new unit tests:
   - inactive=false → pure passthrough
   - single complete tag in one chunk
   - tag split across N chunks → reassembled
   - text before AND after tags → both visible
   - invalid JSON inside tag → preserved as visible (no silent loss)
   - unclosed tag at flush → buffered prefix emitted as visible
   - UTF-8 char-boundary safety on tail flush
   - multiple calls with text between → indices tracked correctly

2. `chat::chat_completions_stream` wiring
   - Pass `req.tools.as_deref()` (not None) to
     `prepare_chat_prompt_with_thinking` so the chat template renders the
     tool spec the model recognises.
   - Construct a `StreamingToolCallTracker` keyed off `stream_includes_tools`.
   - On every chunk: route the reasoning-tracker's visible text through
     the tool tracker; emit `ToolCallDelta` SSE events for each completed
     call; emit content delta for the surviving visible text.
   - Defer `finish_reason` until after the tool tracker has drained, so
     we report `"tool_calls"` when the response actually contained any.
   - Final flush drains both trackers so no tokens vanish silently.

3. `chat_template::normalize_tool_call_for_template`
   New `pub fn` that walks a tool-call JSON value and (a) hoists
   `function.{name,arguments}` to the top level — Qwen's
   `chat_template.jinja` references `tool_call.name` and
   `tool_call.arguments` directly, not the OpenAI-nested shape; (b) parses
   string-encoded `arguments` to a JSON value, fixing the
   `cannot convert value into pairs (in chat:120)` minijinja error that
   crashed multi-turn conversations carrying assistant tool_calls in
   their history.

   Verified by 4 new unit tests covering OpenAI shape, Qwen-flat shape
   (no-op), unparseable-string arguments (kept as string), and non-object
   inputs (no-op).

`convert_messages` in the chat route calls the normaliser per tool call so
both the streaming and non-streaming paths get the fix.

Test impact:
- higgs-models: 242 passed (no change)
- higgs-engine: 338 passed (+27 tracker tests, +4 normalize tests, 25 ignored)
- higgs:        457 passed (no regression on existing convert_messages tests)
- cargo clippy -Dwarnings: clean across all three crates
…bind

Qwen's chat_template.jinja:107-108 rebinds the loop variable:

    {%- for tool_call in message.tool_calls %}
        {%- if tool_call.function is defined %}
            {%- set tool_call = tool_call.function %}
        {%- endif %}

After this rebind, line 120's `tool_call.arguments|items` walks into
the ORIGINAL `function.arguments` JSON-encoded string — bypassing the
top-level `arguments` we just normalised. minijinja raises "cannot
convert value into pairs" and the request 500s.

Two-part fix:

1. `normalize_tool_call_for_template` now normalises BOTH the hoisted
   top-level `arguments` AND the still-nested `function.arguments`.
   Pulled the parse-and-coerce logic into a private helper
   `normalize_arguments_value` so it's applied identically at both
   sites (DRY).

2. The coercion now defends against every non-mapping shape — null,
   bool, number, array, string-that-fails-to-parse — by replacing the
   value with an empty `{}`. A warn is logged so pathological shapes
   stay visible. Previously only string-parsing was attempted and any
   other shape leaked through to the template.

New invariant test (`normalize_handles_qwen_rebind_to_function`) pins
both `tool_call.arguments` AND `tool_call.function.arguments` to be
mappings after normalize, so future edits can't regress the rebind
path silently. Five additional coercion tests cover null / array /
number / unparseable-string / array-as-string cases.

Verified by manual smoke: `nanobot agent -l -m "Use your tools to list
files in /tmp"` previously 500'd with the template error; now returns
a real directory listing streamed back through `StreamingToolCallTracker`.

Test count: chat_template 36 (was 31, +5 net for the new shapes), all
green. Clippy `-Dwarnings` clean across higgs / higgs-engine / higgs-models.

Also: cargo fmt sweep on the new code that landed unformatted in the
prior commit; pure layout, no behaviour change.
`req.tools.as_deref()` returns `Some(&[])` when the request carries an
empty `tools` array. The chat-template renderer treats that as `tools`
being *defined* in the Jinja context (just empty), versus `None` which
omits the key entirely. Templates branching on `{% if tools is defined %}`
vs `{% if tools %}` see different state.

Convert empty slices to `None` before handing them to
`prepare_chat_prompt_with_thinking`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses CRITICAL finding from the closed upstream PR panbanda#63 review: a
model that emits `<tool_call>` and never produces a matching
`</tool_call>` would grow `StreamingToolCallTracker::buffer` until OOM
— a model-controlled DoS vector.

On overflow we abandon the parse, surface `<tool_call>` plus the
buffered bytes as visible content (preserving the "never silently drop
tokens" invariant), and reset state so a subsequent well-formed tool
call in the same stream still parses.

New test `streaming_unbounded_buffer_capped_and_recovers` pushes
MAX_INSIDE_TOOL_CALL_BYTES + 1 inside an unclosed tag followed by a
valid call, and asserts both that the overflow is flushed verbatim and
that the post-overflow call still parses (completed_count == 1).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… coercion

Qwen3.5/3.6 models' chat_template.jinja instructs them to emit
`<tool_call><function=NAME><parameter=KEY>value</parameter></function></tool_call>`,
but the existing parser only decodes the older JSON-in-`<tool_call>` shape, so
tool calls leaked into `content` with `finish_reason: stop` and the OpenAI
streaming/non-streaming clients saw nothing.

Add a second parse path that dispatches on the block's first token
(`<function=` → XML, else → JSON), reusing the existing
`StreamingToolCallTracker` boundary detection unchanged. XML values are raw
strings, so coerce them through a `ToolSchema` built from the request's
`tools` array (integer/number/boolean/object/array → typed JSON,
string-typed `"123"` stays a string), falling back to best-effort JSON
parsing when no schema is declared.

- `parse_tool_calls` and `StreamingToolCallTracker::new` now take an
  `Option<&ToolSchema>` / `Option<ToolSchema>`; the streaming path captures
  it from `req.tools` before the `async_stream!` block, the non-streaming
  path threads it at the call site
- Backward-compatible: pre-existing 28 JSON tests still pass; one new test
  exercises a mixed JSON+XML stream

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`ModelArgs::head_dim()` always computed `hidden_size / num_attention_heads`,
ignoring an explicit `head_dim` in config.json. Models that set a head_dim
differing from that ratio (e.g. MiniCPM5-1B: head_dim=128 while 1536/16=96)
then loaded but produced garbage — RoPE, the 1/sqrt(head_dim) scale, and head
reshaping all used the wrong dimension.

Add `head_dim_override: Option<i32>` (serde `head_dim`); `head_dim()` and
`checked_head_dim()` return it when present, else the existing ratio.
Regression-safe: every other Llama-family config omits `head_dim`, so the
field is None and behavior is unchanged. Verified live — MiniCPM5 went from
"I cannot answer: 1." to "Paris is the capital of France."

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
decode_tokens used `skip_special_tokens=true`, which strips ALL special
tokens — including content-bearing ones. Models that encode tool-call markup
as special tokens (e.g. MiniCPM5: `<function>`/`<param>` are token ids 18-21)
had the structure deleted before the tool parser could see it, so tool calls
silently vanished into empty content.

Decode with `skip_special_tokens=false` and instead strip a precomputed set of
control tokens: the EOS ids plus the `<|…|>` chat-control delimiters and the
classic sentinels (`<s>`, `</s>`, `<unk>`, …). Content tokens like `<think>`,
`<tool_call>`, `<function>` are preserved. Plain text has no special tokens and
decodes identically, so normal responses are unaffected.

Verified: MiniCPM5 tool calls now surface as structured tool_calls; Qwen3.5
regression-clean (its `<|im_end|>` no longer leaks into content).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The custom `tojson` filter took only the piped value, so HF templates that
call `tojson(ensure_ascii=false)` (e.g. MiniCPM5 at chat:6) failed prompt
rendering with "too many arguments" and 500'd any tools-bearing request.

Accept (and ignore) trailing kwargs via `minijinja::value::Kwargs`.
`serde_json::to_string` already emits UTF-8, matching ensure_ascii=false, so
the value is a no-op. Templates calling `tojson` with no args are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MiniCPM5 emits tool calls as `<function name="NAME"><param name="KEY">VALUE
</param></function>` — no `<tool_call>` wrapper, attribute-named, with optional
`<![CDATA[…]]>` values. Neither the JSON nor the Qwen `<function=` XML parser
handled it, so calls leaked into content.

Add a third parse path, dispatched on content shape (a top-level `<function `
opener vs the existing `<tool_call>` wrapper):
- `parse_minicpm_function` extracts the name attr and `<param>` entries, with
  CDATA-aware value extraction; `minicpm_function_end` finds the closing
  `</function>` while skipping CDATA spans (so a literal `</function>` inside a
  value can't end the block early).
- The streaming tracker's `inside_tool_call: bool` becomes
  `enum Inside { None, ToolCall, Function }`; the ToolCall arm is unchanged.
- Reuses ToolSchema/coerce_param_value for typed argument coercion.

6 new tests (single, multi, CDATA-with-literal-</function>, schema coercion,
no-param, streaming split mid-opener+mid-CDATA); all existing JSON/Qwen tests
still pass. Verified live e2e on MiniCPM5-1B (non-streaming + streaming).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b007317-a333-4900-b24f-fbfa2edee76d

📥 Commits

Reviewing files that changed from the base of the PR and between 348f49a and 9d90be4.

📒 Files selected for processing (4)
  • crates/higgs-engine/src/chat_template.rs
  • crates/higgs-engine/src/simple.rs
  • crates/higgs-engine/src/tool_parser.rs
  • crates/higgs/src/routes/chat.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/higgs-engine/src/chat_template.rs
  • crates/higgs-engine/src/tool_parser.rs

📝 Walkthrough

Walkthrough

This PR adds multi-shape tool-call parsing with schema-driven coercion and streaming safety, normalizes tool-call JSON for templates, preserves content-bearing markup while filtering control/sentinel tokens during decoding, adds template-aware tojson kwargs, expands thinking-detection heuristics, adds an explicit head_dim override, and integrates these across streaming and non-streaming chat handlers.

Changes

Tool-call ecosystem and chat integration

Layer / File(s) Summary
Tool-call parsing core and schema
crates/higgs-engine/src/tool_parser.rs, tests/tool_parser/*
Module docs, parse dispatch, XML parsing/coercion, ParamType/ToolSchema and helpers; parse_tool_calls updated to accept optional schema and dispatch by format.
MiniCPM parsing & streaming-safe extraction
crates/higgs-engine/src/tool_parser.rs, tests/tool_parser/*
MiniCPM bare-function parser with CDATA support; StreamingToolCallTracker extended to handle multiple shapes, MAX_INSIDE_TOOL_CALL_BYTES overflow handling, and streaming reassembly tests.
Template normalization & tojson kwargs
crates/higgs-engine/src/chat_template.rs, tests/chat_template/*
Adds normalize_tool_call_for_template to hoist function.{name,arguments} and parse/coerce string-encoded arguments to objects (fallback {}); tojson filter accepts Minijinja Kwargs (e.g., ensure_ascii=false) while retaining serde_json serialization.
Selective token decoding with control-token filtering
crates/higgs-engine/src/simple.rs
Adds decode_skip_ids built from EOS and tokenizer control/sentinel tokens; decodes without token-skip mode and filters skip IDs so content-bearing markup remains visible, while control/sentinel tokens are removed.
Thinking detection and template-marker helper
crates/higgs-engine/src/simple.rs, tests/*
detect_thinking_support now considers enable_thinking markers found in chat templates or tokenizer_config.json via chat_template_mentions_enable_thinking; tests added for marker-based, model-type, and negative detection.
Model configuration with explicit head dimension
crates/higgs-models/src/transformer.rs, tests/transformer/*
Adds head_dim_override (head_dim in config) to ModelArgs; head_dim() and checked_head_dim() honor the override and validate positivity; tests verify override and fallback behavior.
Chat API streaming and completion integration
crates/higgs/src/routes/chat.rs
Builds ToolSchema from request tools for non-streaming/streaming, passes tools into prompt rendering, initializes StreamingToolCallTracker for SSE, routes visible tokens through tracker to emit ToolCallDelta and content deltas, drains tracker at stream end to flush buffered/partial tool outputs and override finish_reason when tool calls produced, and normalizes tool-call JSON before templating.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ChatHandler
  participant PromptBuilder
  participant ToolTracker
  participant SSE
  Client->>ChatHandler: request with tools
  ChatHandler->>ToolTracker: build schema from tools
  PromptBuilder->>PromptBuilder: render prompt with tool definitions
  ChatHandler->>ChatHandler: start model streaming
  loop for each chunk
    ChatHandler->>ToolTracker: feed chunk visible text
    ToolTracker->>ToolTracker: extract tool-call blocks
    ToolTracker->>SSE: emit ToolCallDelta for each tool call
    ToolTracker->>SSE: emit ContentDelta for remaining visible text
  end
  ChatHandler->>ToolTracker: drain at end
  ToolTracker->>SSE: flush buffered tool-call deltas
  ToolTracker->>ChatHandler: report tool_calls_present
  ChatHandler->>SSE: override finish_reason to tool_calls
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Suggested labels

risk: high

Poem

🐰 I nibbled XML, JSON, MiniCPM threads,
Hoisted function fields and parsed oddheads.
I kept the markup that makes tools sing,
Filtered the control tokens, let content spring.
A happy little rabbit, streaming joy ahead!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main changes: enabling MiniCPM5-1B support through explicit head_dim handling, special-token decode logic, tojson kwarg support, and tool parser improvements.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch dusterbloom/minicpm5-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dusterbloom

Copy link
Copy Markdown
Contributor Author

📋 Motivation & root-cause writeup (why each of the 4 changes was needed, with evidence): #178

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/higgs/src/routes/chat.rs (1)

270-279: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize empty tools the same way in both response modes.

At Line 319, non-streaming treats tools: [] as tool-enabled (is_some()), while streaming only enables tool parsing for non-empty lists. The same request can therefore yield parsed tool_calls / finish_reason = "tool_calls" in non-streaming but plain text in streaming.

Suggested fix
-    let tools = req.tools.as_deref();
+    let tools = req.tools.as_deref().filter(|t| !t.is_empty());
@@
-    let has_tools = req.tools.is_some();
+    let has_tools = tools.is_some();
-    let tool_schema = higgs_engine::tool_parser::ToolSchema::from_tools(req.tools.as_deref());
+    let prompt_tools = req.tools.as_deref().filter(|t| !t.is_empty());
+    let tool_schema = higgs_engine::tool_parser::ToolSchema::from_tools(prompt_tools);
@@
-    let prompt_tools = req
-        .tools
-        .as_deref()
-        .and_then(|t| if t.is_empty() { None } else { Some(t) });
     let mut prompt_tokens = engine
         .prepare_chat_prompt_with_thinking(&messages, prompt_tools, thinking_enabled_stream)

Also applies to: 319-352, 421-425, 461-466

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/higgs/src/routes/chat.rs` around lines 270 - 279, The code treats an
empty tools list differently between streaming and non-streaming paths because
tools is taken as Some(empty) in one path but not the other; normalize by
converting req.tools.as_deref() into None for empty slices before any downstream
checks. Replace uses of let tools = req.tools.as_deref() with a normalized
option (e.g., filter out empty slices) so downstream logic like
prepare_chat_prompt_with_thinking(..., tools, ...) and any is_some() checks that
enable tool parsing behave identically in both streaming and non-streaming
branches; apply the same normalization in all places where req.tools.as_deref()
or tool existence is checked (including the streaming parsing/finish_reason
logic and non-streaming response handling).
🧹 Nitpick comments (1)
crates/higgs-engine/src/tool_parser.rs (1)

497-500: ⚡ Quick win

Document the new public fields.

StreamingToolOutput is newly public, but visible and new_tool_calls do not have field-level docs.

Suggested fix
 pub struct StreamingToolOutput {
+    /// Text that should be forwarded to the client as normal assistant content.
     pub visible: String,
+    /// Tool calls that became complete while processing the current chunk.
     pub new_tool_calls: Vec<ParsedToolCall>,
 }

As per coding guidelines, "Add doc comments on public structs/fields in Rust when changing user-facing behavior".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/higgs-engine/src/tool_parser.rs` around lines 497 - 500,
StreamingToolOutput is now public but its public fields lack documentation; add
doc comments for the struct and for each public field—describe the purpose of
StreamingToolOutput and then annotate visible (String) to explain what the
string contains/represents (e.g., aggregated or incremental visible output to
the user) and new_tool_calls (Vec<ParsedToolCall>) to explain it contains parsed
tool calls discovered during streaming and their semantics/order; update the doc
comments above the pub struct StreamingToolOutput and above pub visible and pub
new_tool_calls accordingly so consumers understand expected contents and
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/higgs-engine/src/chat_template.rs`:
- Around line 186-192: The doc comment describing argument parsing is out of
date: normalize_arguments_value() now coerces any parsed non-object (including
strings that parse to non-object JSON or unparseable strings) to an empty object
`{}`; update the public-facing comments that currently claim unparseable strings
are left untouched to instead state that after normalization non-object results
are converted to `{}`. Edit the doc block around the earlier description (and
the similar block later in the file) to mention normalize_arguments_value()'s
coercion behavior and the new contract so callers do not expect the original
string to survive normalization.

In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 409-420: parse_minicpm_function currently searches for NAME_ATTR
across the whole block which can match attributes in nested tags; restrict the
search to the opening <function ...> tag only by first locating the opener's end
('>') from the start of block and then searching for NAME_ATTR only within that
slice. In other words, find the index of the first '>' after the opening
'<function', take the substring representing the opener, then locate NAME_ATTR,
extract the quoted name from that opener substring (validating quotes and
non-empty), and fall back to returning None if not present or malformed; update
references in parse_minicpm_function to use this opener-limited search instead
of block.find(NAME_ATTR).
- Around line 249-252: The match arm conflates ParamType::Integer and
ParamType::Number by using parsed_if(Value::is_number), allowing floats for
integers; change the arms so ParamType::Integer uses a stricter check (e.g.,
replace parsed_if(Value::is_number) in the Integer arm with parsed_if(|v|
v.is_i64() || v.is_u64() || (v.is_f64() && v.as_f64().map(|f| f.fract() ==
0.0).unwrap_or(false)))) while keeping ParamType::Number using
parsed_if(Value::is_number); update the match cases around ParamType::Integer
and ParamType::Number in the code near function parsed_if / as_string to enforce
integer-only values.

---

Outside diff comments:
In `@crates/higgs/src/routes/chat.rs`:
- Around line 270-279: The code treats an empty tools list differently between
streaming and non-streaming paths because tools is taken as Some(empty) in one
path but not the other; normalize by converting req.tools.as_deref() into None
for empty slices before any downstream checks. Replace uses of let tools =
req.tools.as_deref() with a normalized option (e.g., filter out empty slices) so
downstream logic like prepare_chat_prompt_with_thinking(..., tools, ...) and any
is_some() checks that enable tool parsing behave identically in both streaming
and non-streaming branches; apply the same normalization in all places where
req.tools.as_deref() or tool existence is checked (including the streaming
parsing/finish_reason logic and non-streaming response handling).

---

Nitpick comments:
In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 497-500: StreamingToolOutput is now public but its public fields
lack documentation; add doc comments for the struct and for each public
field—describe the purpose of StreamingToolOutput and then annotate visible
(String) to explain what the string contains/represents (e.g., aggregated or
incremental visible output to the user) and new_tool_calls (Vec<ParsedToolCall>)
to explain it contains parsed tool calls discovered during streaming and their
semantics/order; update the doc comments above the pub struct
StreamingToolOutput and above pub visible and pub new_tool_calls accordingly so
consumers understand expected contents and behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff6757fb-ce21-4ad3-9203-4c19bd4f81c0

📥 Commits

Reviewing files that changed from the base of the PR and between f6e3c2f and 348f49a.

📒 Files selected for processing (5)
  • crates/higgs-engine/src/chat_template.rs
  • crates/higgs-engine/src/simple.rs
  • crates/higgs-engine/src/tool_parser.rs
  • crates/higgs-models/src/transformer.rs
  • crates/higgs/src/routes/chat.rs

Comment thread crates/higgs-engine/src/chat_template.rs Outdated
Comment thread crates/higgs-engine/src/tool_parser.rs
Comment thread crates/higgs-engine/src/tool_parser.rs Outdated
dusterbloom and others added 3 commits June 2, 2026 15:09
`detect_thinking_support` only recognised `model_type` qwen3_5/qwen3_5_moe, so
hybrid reasoning models with a different model_type ran with thinking OFF —
e.g. MiniCPM5-1B (model_type=llama) answered without its reasoning channel:
fast but inaccurate (bat-and-ball at temp 0.9 → $1.05/$1.60; with thinking on
→ correct $0.05 with self-check).

Generalise the capability check to also honour the model author's own signal —
the chat template referencing `enable_thinking` (covers Qwen3.5/3.6, MiniCPM5,
future reasoning models) — keeping the qwen3_5* model_type as a fallback. The
caller still requires a single-token `</think>`, so a stray mention can't
enable a non-reasoning model.

This is capability only; the per-request default (e.g. Qwen3.6 reasons off
unless asked) stays in `model_defaults_to_non_thinking` and is unaffected —
panbanda#150 behaviour preserved (reasoning tests green). MiniCPM5 isn't qwen3.6, so it
defaults thinking on.

Verified live: MiniCPM5 auto-enables thinking (no env var), emits reasoning,
answers correctly, and tool calls still work alongside it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- coerce_param_value: split `integer` from `number` — `is_number` accepts
  floats, so an `integer` param wrongly coerced `3.14`. `integer` now only
  accepts i64/u64 values, falling back to the raw string otherwise.
- chat (non-streaming): treat an empty `tools: []` as absent, mirroring the
  streaming path — it no longer defines `tools` in the template context or
  triggers tool parsing.
- docs: refresh `normalize_tool_call_for_template` (non-object arguments are
  coerced to `{}`, not left untouched; dropped a stray leading line) and
  document the public `StreamingToolOutput` fields.

Adds a test pinning the integer/fractional coercion contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parse_minicpm_function scanned the whole block for `name="…"`, so a malformed
payload like `<function ><param name="city">…` was parsed as a tool call named
`city` instead of being preserved verbatim — routing bad model output into the
tool-execution path. Read the attribute only from the opening `<function …>`
tag (before its first `>`); a missing name now yields `None` (preserved).

Adds a regression test, and backticks a few identifiers in test doc comments
to satisfy `clippy::doc_markdown`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant