feat(chat): streaming tool-call deltas + Qwen-friendly arg normalisation#164
feat(chat): streaming tool-call deltas + Qwen-friendly arg normalisation#164dusterbloom wants to merge 7 commits into
Conversation
The streaming `/v1/chat/completions` route previously stripped tools from the prompt (`prepare_chat_prompt_with_thinking(..., None, ...)`) and warned that "tool_calls deltas are unsupported". The model, deprived of tool context, would hallucinate fake tool calls as plain text — and the client's TTS pipeline would happily speak `<nanobot>read_skill newsreader</nanobot>` out loud. This commit completes the streaming tool-call story that was documented but never finished (see closed upstream PR panbanda#63). Three pieces, ~580 net new lines: 1. `tool_parser::StreamingToolCallTracker` New `pub struct` next to the existing `parse_tool_calls`. A small state machine that buffers streamed text chunks and extracts complete `<tool_call>{json}</tool_call>` blocks on the fly. When `active=false` (no tools in the request) it collapses to a single-allocation passthrough with zero parsing cost. Invariants verified by 8 new unit tests: - inactive=false → pure passthrough - single complete tag in one chunk - tag split across N chunks → reassembled - text before AND after tags → both visible - invalid JSON inside tag → preserved as visible (no silent loss) - unclosed tag at flush → buffered prefix emitted as visible - UTF-8 char-boundary safety on tail flush - multiple calls with text between → indices tracked correctly 2. `chat::chat_completions_stream` wiring - Pass `req.tools.as_deref()` (not None) to `prepare_chat_prompt_with_thinking` so the chat template renders the tool spec the model recognises. - Construct a `StreamingToolCallTracker` keyed off `stream_includes_tools`. - On every chunk: route the reasoning-tracker's visible text through the tool tracker; emit `ToolCallDelta` SSE events for each completed call; emit content delta for the surviving visible text. - Defer `finish_reason` until after the tool tracker has drained, so we report `"tool_calls"` when the response actually contained any. - Final flush drains both trackers so no tokens vanish silently. 3. `chat_template::normalize_tool_call_for_template` New `pub fn` that walks a tool-call JSON value and (a) hoists `function.{name,arguments}` to the top level — Qwen's `chat_template.jinja` references `tool_call.name` and `tool_call.arguments` directly, not the OpenAI-nested shape; (b) parses string-encoded `arguments` to a JSON value, fixing the `cannot convert value into pairs (in chat:120)` minijinja error that crashed multi-turn conversations carrying assistant tool_calls in their history. Verified by 4 new unit tests covering OpenAI shape, Qwen-flat shape (no-op), unparseable-string arguments (kept as string), and non-object inputs (no-op). `convert_messages` in the chat route calls the normaliser per tool call so both the streaming and non-streaming paths get the fix. Test impact: - higgs-models: 242 passed (no change) - higgs-engine: 338 passed (+27 tracker tests, +4 normalize tests, 25 ignored) - higgs: 457 passed (no regression on existing convert_messages tests) - cargo clippy -Dwarnings: clean across all three crates
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds Hermes XML and JSON tool-call parsing, a StreamingToolCallTracker to buffer and extract <tool_call> blocks across chunks, template normalization for tool-call JSON, and integration into the chat streaming pipeline to emit structured ToolCallDelta SSE events and defer finish_reason when tool calls occur. ChangesStreaming Tool-Call Support
Sequence DiagramsequenceDiagram
participant ChatAPI
participant StreamGenerator
participant ReasoningTracker
participant ToolTracker
participant SSEOutput
ChatAPI->>StreamGenerator: start streaming with tools
StreamGenerator->>ReasoningTracker: produce chunk (reasoning, visible)
ReasoningTracker->>ToolTracker: visible text
ToolTracker->>ToolTracker: extract <tool_call> blocks across chunks
ToolTracker->>SSEOutput: emit ToolCallDelta for each completed call
ToolTracker->>SSEOutput: emit content with remaining visible text
StreamGenerator->>ToolTracker: flush() on stream end
ToolTracker->>SSEOutput: emit buffered unclosed content and final finish_reason
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/higgs-engine/src/tool_parser.rs (1)
101-104: ⚡ Quick winDocument
StreamingToolOutputpublic fields.
StreamingToolOutput.visibleandStreamingToolOutput.new_tool_callsare public API fields and should have field-level rustdoc comments for discoverability.As per coding guidelines,
**/*.rs: Add doc comments on public structs/fields in Rust when changing user-facing behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/higgs-engine/src/tool_parser.rs` around lines 101 - 104, Add rustdoc comments for the public struct StreamingToolOutput and each of its public fields: document what StreamingToolOutput represents, and add brief field-level comments for visible (what the string contains and visibility/format expectations) and new_tool_calls (what ParsedToolCall entries represent and when they are populated), referencing StreamingToolOutput, visible, and new_tool_calls so consumers can discover and understand the API.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/higgs/src/routes/chat.rs`:
- Around line 456-459: The code passes Some(&[]) for an empty tools array which
still marks `tools` as defined in templates; change how `prompt_tools` is
derived so empty slices become None: replace `let prompt_tools =
req.tools.as_deref();` with logic that converts empty slices to None (e.g. `let
prompt_tools = req.tools.as_deref().and_then(|t| if t.is_empty() { None } else {
Some(t) });`) so that the call to
`engine.prepare_chat_prompt_with_thinking(&messages, prompt_tools,
thinking_enabled_stream)` receives None for absent/empty tools.
---
Nitpick comments:
In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 101-104: Add rustdoc comments for the public struct
StreamingToolOutput and each of its public fields: document what
StreamingToolOutput represents, and add brief field-level comments for visible
(what the string contains and visibility/format expectations) and new_tool_calls
(what ParsedToolCall entries represent and when they are populated), referencing
StreamingToolOutput, visible, and new_tool_calls so consumers can discover and
understand the API.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7a10d527-c5d2-481a-9a66-5870a5f99e83
📒 Files selected for processing (3)
crates/higgs-engine/src/chat_template.rscrates/higgs-engine/src/tool_parser.rscrates/higgs/src/routes/chat.rs
…bind
Qwen's chat_template.jinja:107-108 rebinds the loop variable:
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
After this rebind, line 120's `tool_call.arguments|items` walks into
the ORIGINAL `function.arguments` JSON-encoded string — bypassing the
top-level `arguments` we just normalised. minijinja raises "cannot
convert value into pairs" and the request 500s.
Two-part fix:
1. `normalize_tool_call_for_template` now normalises BOTH the hoisted
top-level `arguments` AND the still-nested `function.arguments`.
Pulled the parse-and-coerce logic into a private helper
`normalize_arguments_value` so it's applied identically at both
sites (DRY).
2. The coercion now defends against every non-mapping shape — null,
bool, number, array, string-that-fails-to-parse — by replacing the
value with an empty `{}`. A warn is logged so pathological shapes
stay visible. Previously only string-parsing was attempted and any
other shape leaked through to the template.
New invariant test (`normalize_handles_qwen_rebind_to_function`) pins
both `tool_call.arguments` AND `tool_call.function.arguments` to be
mappings after normalize, so future edits can't regress the rebind
path silently. Five additional coercion tests cover null / array /
number / unparseable-string / array-as-string cases.
Verified by manual smoke: `nanobot agent -l -m "Use your tools to list
files in /tmp"` previously 500'd with the template error; now returns
a real directory listing streamed back through `StreamingToolCallTracker`.
Test count: chat_template 36 (was 31, +5 net for the new shapes), all
green. Clippy `-Dwarnings` clean across higgs / higgs-engine / higgs-models.
Also: cargo fmt sweep on the new code that landed unformatted in the
prior commit; pure layout, no behaviour change.
`req.tools.as_deref()` returns `Some(&[])` when the request carries an
empty `tools` array. The chat-template renderer treats that as `tools`
being *defined* in the Jinja context (just empty), versus `None` which
omits the key entirely. Templates branching on `{% if tools is defined %}`
vs `{% if tools %}` see different state.
Convert empty slices to `None` before handing them to
`prepare_chat_prompt_with_thinking`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses CRITICAL finding from the closed upstream PR panbanda#63 review: a model that emits `<tool_call>` and never produces a matching `</tool_call>` would grow `StreamingToolCallTracker::buffer` until OOM — a model-controlled DoS vector. On overflow we abandon the parse, surface `<tool_call>` plus the buffered bytes as visible content (preserving the "never silently drop tokens" invariant), and reset state so a subsequent well-formed tool call in the same stream still parses. New test `streaming_unbounded_buffer_capped_and_recovers` pushes MAX_INSIDE_TOOL_CALL_BYTES + 1 inside an unclosed tag followed by a valid call, and asserts both that the overflow is flushed verbatim and that the post-overflow call still parses (completed_count == 1). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
This PR fixes the exact issue where streaming API drops tool definitions and warns 'tool-calls are unsupported'. Running Higgs 1.3.0 with Qwen3-Coder-Next and OpenCode — every streaming request gets tools stripped, forcing a text-based fallback that's unreliable for complex tool calls. The non-streaming path does parse the model's XML output but doesn't convert it to OpenAI's structured All 6 CI checks are green, the implementation is surgical (3 files, +710 -17), and this unblocks local MLX models for use with agentic tools like OpenCode, Claude Code, and Aider. Would love to see this merged for the next release. 🙏 |
… coercion Qwen3.5/3.6 models' chat_template.jinja instructs them to emit `<tool_call><function=NAME><parameter=KEY>value</parameter></function></tool_call>`, but the existing parser only decodes the older JSON-in-`<tool_call>` shape, so tool calls leaked into `content` with `finish_reason: stop` and the OpenAI streaming/non-streaming clients saw nothing. Add a second parse path that dispatches on the block's first token (`<function=` → XML, else → JSON), reusing the existing `StreamingToolCallTracker` boundary detection unchanged. XML values are raw strings, so coerce them through a `ToolSchema` built from the request's `tools` array (integer/number/boolean/object/array → typed JSON, string-typed `"123"` stays a string), falling back to best-effort JSON parsing when no schema is declared. - `parse_tool_calls` and `StreamingToolCallTracker::new` now take an `Option<&ToolSchema>` / `Option<ToolSchema>`; the streaming path captures it from `req.tools` before the `async_stream!` block, the non-streaming path threads it at the call site - Backward-compatible: pre-existing 28 JSON tests still pass; one new test exercises a mixed JSON+XML stream Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/higgs-engine/src/tool_parser.rs (1)
340-344:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument the public
StreamingToolOutputfields.This struct is new public API, but
visibleandnew_tool_callsstill rely on the struct-level summary for their contract. Add field-level rustdoc so the generated API docs stay explicit.📚 Suggested doc shape
pub struct StreamingToolOutput { + /// Text that should be forwarded to the client as a normal content delta. pub visible: String, + /// Tool calls that became complete while processing this chunk. pub new_tool_calls: Vec<ParsedToolCall>, }As per coding guidelines,
**/*.rs: Add doc comments on public structs/fields in Rust when changing user-facing behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/higgs-engine/src/tool_parser.rs` around lines 340 - 344, Add field-level rustdoc comments for the public struct StreamingToolOutput: document the visible field to explain it contains the partial/accumulated human-visible output text (what callers should render or append) and document new_tool_calls to state it holds ParsedToolCall items produced since the last update (e.g., newly detected tool invocations for the caller to process). Update the struct definition around StreamingToolOutput to include these doc comments for both visible and new_tool_calls so the generated API docs explicitly describe their contracts.
♻️ Duplicate comments (1)
crates/higgs/src/routes/chat.rs (1)
269-279:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat empty tool arrays as absent in the non-streaming path too.
The streaming path now normalizes
[]toNone, but the non-streaming path still treatsSome(&[])as tool-enabled. That means templates can still seetoolsas defined here, andhas_toolscan still driveparse_tool_calls/"tool_calls"responses even though the caller provided no usable tools.💡 Suggested fix
- let tools = req.tools.as_deref(); + let tools = req.tools.as_deref().filter(|tools| !tools.is_empty()); @@ - let has_tools = req.tools.is_some(); + let has_tools = tools.is_some();Also applies to: 318-320
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/higgs/src/routes/chat.rs` around lines 269 - 279, The non-streaming path still passes Some(&[]) into template generation which treats empty tool arrays as present; update the code around convert_messages / let tools = req.tools.as_deref() and the analogous block near prepare_chat_prompt_with_thinking (and the other occurrence around lines 318-320) to normalize empty slices to None (e.g., set tools = req.tools.as_deref().filter(|s| !s.is_empty()) or equivalent) before calling engine.prepare_chat_prompt_with_thinking so templates and has_tools logic see absent tools when the caller provided an empty array.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/higgs-engine/src/chat_template.rs`:
- Around line 217-230: The public rustdoc for normalize_tool_call_for_template
is out of date: the implementation now normalizes nested function.arguments (via
normalize_arguments_value) and coerces any non-object or unparseable string into
an empty object {} rather than leaving failed parses untouched; update the
rustdoc text for normalize_tool_call_for_template (and any doc comments on
normalize_arguments_value if present) to state that both top-level arguments and
nested function.arguments are normalized and that non-object results are
replaced with {}.
In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 231-240: coerce_param_value currently treats ParamType::Integer
like Number (using parsed_if(Value::is_number)), allowing fractional inputs like
42.5; update the ParamType::Integer branch in coerce_param_value to only accept
true integers (e.g., use parsed_if with a predicate that returns true for
Value::is_i64 or Value::is_u64 or for Value::is_f64 only when the fractional
part is zero) so fractional numbers are rejected, and keep ParamType::Number
using Value::is_number; also add Rustdoc comments (/// ...) to the public fields
visible and new_tool_calls on the StreamingToolOutput struct to document their
purpose.
---
Outside diff comments:
In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 340-344: Add field-level rustdoc comments for the public struct
StreamingToolOutput: document the visible field to explain it contains the
partial/accumulated human-visible output text (what callers should render or
append) and document new_tool_calls to state it holds ParsedToolCall items
produced since the last update (e.g., newly detected tool invocations for the
caller to process). Update the struct definition around StreamingToolOutput to
include these doc comments for both visible and new_tool_calls so the generated
API docs explicitly describe their contracts.
---
Duplicate comments:
In `@crates/higgs/src/routes/chat.rs`:
- Around line 269-279: The non-streaming path still passes Some(&[]) into
template generation which treats empty tool arrays as present; update the code
around convert_messages / let tools = req.tools.as_deref() and the analogous
block near prepare_chat_prompt_with_thinking (and the other occurrence around
lines 318-320) to normalize empty slices to None (e.g., set tools =
req.tools.as_deref().filter(|s| !s.is_empty()) or equivalent) before calling
engine.prepare_chat_prompt_with_thinking so templates and has_tools logic see
absent tools when the caller provided an empty array.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a8dba805-542c-4233-9792-d4f16d930066
📒 Files selected for processing (3)
crates/higgs-engine/src/chat_template.rscrates/higgs-engine/src/tool_parser.rscrates/higgs/src/routes/chat.rs
|
📋 Motivation & root-cause writeup (why Qwen tool calls were silently dropped): #177 |
- coerce_param_value: split `integer` from `number` — `is_number` accepts
floats, so an `integer` param wrongly coerced `3.14`. `integer` now only
accepts i64/u64 values, falling back to the raw string otherwise.
- chat (non-streaming): treat an empty `tools: []` as absent, mirroring the
streaming path — it no longer defines `tools` in the template context or
triggers tool parsing.
- docs: refresh `normalize_tool_call_for_template` (non-object arguments are
coerced to `{}`, not left untouched; dropped a stray leading line) and
document the public `StreamingToolOutput` fields.
Adds a test pinning the integer/fractional coercion contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Qwen3.6 reasons first: in thinking mode the chat template opens `<think>`, so generation starts inside the think block and the tool call is emitted AFTER `</think>`. The chat route prepends `<think>`, splits reasoning via `parse_reasoning`, then runs `parse_tool_calls` on the remainder. A parser that scanned the whole output (or only the reasoning) would silently drop the call — the most common thinking+tools failure mode, and the one dimension the existing suite did not cover. Adds `xml_tool_call_after_think_block_is_extracted`, replicating that composition with the Qwen3.6 XML tool-call format. The 35+ existing parser/streaming tests already cover XML parse + schema coercion, streaming chunk-splits, and unbounded-buffer recovery. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Completes the streaming tool-call story that was scoped in closed PR #63
but never landed. Previously the streaming
/v1/chat/completionsroutestripped tools from the prompt and warned "tool_calls deltas are
unsupported"; the model — deprived of tool context — would hallucinate
fake tool calls as plain text, and voice-agent clients would speak them
out loud. After this PR the streaming route extracts
<tool_call>… </tool_call>blocks from the model output on the fly and emits properToolCallDeltaSSE events.Three pieces (~580 LOC)
tool_parser::StreamingToolCallTrackerNew state machine next to the existing
parse_tool_calls. Buffersstreamed chunks, extracts complete
<tool_call>…</tool_call>blockson the fly. When
active=false(no tools in request) collapses to asingle-allocation passthrough with zero parsing cost.
8 new unit tests verify the invariants:
chat::chat_completions_streamwiringreq.tools.as_deref()toprepare_chat_prompt_with_thinkingsothe chat template renders the tool spec the model recognises (was
always
Nonebefore).StreamingToolCallTracker::processand emit aToolCallDeltaSSEevent per completed call.
finish_reasonuntil after the tracker has drained, so"tool_calls"is reported when the response carried any tool calls.chat_template::normalize_tool_call_for_templateNew helper that walks a tool-call JSON value and:
function.{name,arguments}to the top level — Qwen'schat_template.jinjareferencestool_call.nameandtool_call.argumentsdirectly, not the OpenAI-nested shape.argumentsto a JSON value — fixes thecannot convert value into pairs (in chat:120)minijinja crash thatkilled multi-turn conversations carrying assistant tool_calls in
their history.
Called from
convert_messagesso both streaming and non-streaming pathsget the fix. 4 new unit tests cover the OpenAI shape, Qwen-flat shape
(no-op), unparseable-string arguments (kept as string), and non-object
inputs (no-op).
Test plan
cargo test -p higgs-models --lib— 242 passedcargo test -p higgs-engine --lib— 338 passed (+27 tracker tests, +4 normalize tests, 25 ignored)cargo test -p higgs --release --lib— 457 passed (no regression on existingconvert_messagestests)cargo clippy -p higgs -p higgs-engine -p higgs-models --tests --all-features -- -Dwarnings— cleanStreaming with tool-calls enabled; will emit tool_calls deltas via StreamingToolCallTracker)Closes/replaces
Supersedes closed #63 ("feat: streaming tool call support + Hermes XML
parser") with a surgical port targeting current
main— the originalPR's diff has shrunk from 40 files / 2K+ lines to 3 files / 580 lines
now that the Qwen3.5/3.6 + TurboQuant + paged-cache base stack has
landed.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes / Reliability
Tests