
feat(chat): streaming tool-call deltas + Qwen-friendly arg normalisation#164

Open
dusterbloom wants to merge 7 commits into
panbanda:mainfrom
dusterbloom:dusterbloom/streaming-tools-port

Conversation

@dusterbloom

@dusterbloom dusterbloom commented May 21, 2026

Contributor

Summary

Completes the streaming tool-call story that was scoped in closed PR #63
but never landed. Previously the streaming /v1/chat/completions route
stripped tools from the prompt and warned "tool_calls deltas are
unsupported"; the model — deprived of tool context — would hallucinate
fake tool calls as plain text, and voice-agent clients would speak them
out loud. After this PR the streaming route extracts
<tool_call>…</tool_call> blocks from the model output on the fly and
emits proper ToolCallDelta SSE events.

Three pieces (~580 LOC)

tool_parser::StreamingToolCallTracker

New state machine next to the existing parse_tool_calls. Buffers
streamed chunks, extracts complete <tool_call>…</tool_call> blocks
on the fly. When active=false (no tools in request) collapses to a
single-allocation passthrough with zero parsing cost.

8 new unit tests verify the invariants:

  • active=false → pure passthrough
  • complete tag in one chunk → tool call emitted, no visible text
  • tag split across N chunks → tracker reassembles
  • text before AND after → both visible, tool extracted
  • invalid JSON inside tag → preserved as visible (no silent loss)
  • unclosed tag at flush → buffered prefix emitted as visible
  • UTF-8 char-boundary safety on tail flush
  • multiple calls with text between → indices tracked correctly
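
The invariants above can be illustrated with a minimal, std-only sketch of such a tracker. This is not the PR's implementation: the names `Tracker` and `Output` are hypothetical, payloads are returned as raw strings rather than parsed JSON, and the invalid-JSON and index-tracking invariants are left out for brevity.

```rust
const OPEN: &str = "<tool_call>";
const CLOSE: &str = "</tool_call>";

/// Result of one `process` call (hypothetical shape, not the PR's type).
#[derive(Debug, Default, PartialEq)]
pub struct Output {
    /// Text that is safe to forward as a normal content delta.
    pub visible: String,
    /// Raw payloads of the tool-call blocks completed by this chunk.
    pub calls: Vec<String>,
}

pub struct Tracker {
    active: bool,
    inside: bool,
    buf: String,
}

impl Tracker {
    pub fn new(active: bool) -> Self {
        Self { active, inside: false, buf: String::new() }
    }

    pub fn process(&mut self, chunk: &str) -> Output {
        let mut out = Output::default();
        if !self.active {
            // No tools in the request: copy the chunk through untouched.
            out.visible.push_str(chunk);
            return out;
        }
        self.buf.push_str(chunk);
        loop {
            if self.inside {
                // Wait until the closing tag has fully arrived.
                let Some(end) = self.buf.find(CLOSE) else { break };
                out.calls.push(self.buf[..end].trim().to_string());
                self.buf.drain(..end + CLOSE.len());
                self.inside = false;
            } else if let Some(start) = self.buf.find(OPEN) {
                out.visible.push_str(&self.buf[..start]);
                self.buf.drain(..start + OPEN.len());
                self.inside = true;
            } else {
                // Hold back a trailing partial "<tool_call>" so a tag split
                // across chunks is reassembled instead of leaking as text.
                let cut = self.buf.len() - held_back(&self.buf);
                out.visible.push_str(&self.buf[..cut]);
                self.buf.drain(..cut);
                break;
            }
        }
        out
    }

    /// At end of stream, hand back whatever is still buffered as visible text.
    pub fn flush(&mut self) -> String {
        let mut rest = String::new();
        if self.inside {
            rest.push_str(OPEN);
        }
        rest.push_str(&self.buf);
        self.buf.clear();
        self.inside = false;
        rest
    }
}

/// Length of the longest suffix of `buf` that is a proper prefix of `OPEN`.
/// The cut point is always a char boundary because the suffix is pure ASCII.
fn held_back(buf: &str) -> usize {
    (1..OPEN.len())
        .rev()
        .find(|&n| buf.ends_with(&OPEN[..n]))
        .unwrap_or(0)
}
```

Feeding `"hi <tool_"`, then `"call>{...}</tool"`, then `"_call> bye"` yields visible text `"hi "` and `" bye"` plus one completed payload, and an unclosed tag comes back from `flush` with its `<tool_call>` prefix restored.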

chat::chat_completions_stream wiring

  • Pass req.tools.as_deref() to prepare_chat_prompt_with_thinking so
    the chat template renders the tool spec the model recognises (was
    always None before).
  • Wrap the reasoning-tracker's visible text through
    StreamingToolCallTracker::process and emit a ToolCallDelta SSE
    event per completed call.
  • Defer finish_reason until after the tracker has drained, so
    "tool_calls" is reported when the response carried any tool calls.
  • Final flush drains both trackers — no tokens vanish silently.
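
The deferred `finish_reason` rule can be sketched as a single decision made after both trackers have drained. The function name and the running counter are assumptions for illustration; the PR's route code is not shown here.

```rust
/// Decide the final `finish_reason` once the stream has been fully drained.
/// `model_reason` is whatever the generator reported (for example "stop");
/// `tool_calls_emitted` counts the ToolCallDelta events sent on this stream.
pub fn final_finish_reason(model_reason: &str, tool_calls_emitted: usize) -> &str {
    if tool_calls_emitted > 0 {
        "tool_calls"
    } else {
        model_reason
    }
}
```

Computing this only after the final flush matters because a tool call can complete in the last chunk, after the generator has already reported its own reason.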

chat_template::normalize_tool_call_for_template

New helper that walks a tool-call JSON value and:

  1. Hoists function.{name,arguments} to the top level — Qwen's
    chat_template.jinja references tool_call.name and
    tool_call.arguments directly, not the OpenAI-nested shape.
  2. Parses string-encoded arguments to a JSON value — fixes the
    cannot convert value into pairs (in chat:120) minijinja crash that
    killed multi-turn conversations carrying assistant tool_calls in
    their history.

Called from convert_messages so both streaming and non-streaming paths
get the fix. 4 new unit tests cover the OpenAI shape, Qwen-flat shape
(no-op), unparseable-string arguments (kept as string), and non-object
inputs (no-op).

Test plan

  • cargo test -p higgs-models --lib — 242 passed
  • cargo test -p higgs-engine --lib — 338 passed (+27 tracker tests, +4 normalize tests, 25 ignored)
  • cargo test -p higgs --release --lib — 457 passed (no regression on existing convert_messages tests)
  • cargo clippy -p higgs -p higgs-engine -p higgs-models --tests --all-features -- -Dwarnings — clean
  • Release binary built and the new log strings shipped (Streaming with tool-calls enabled; will emit tool_calls deltas via StreamingToolCallTracker)
  • CI live integration smoke

Closes/replaces

Supersedes closed #63 ("feat: streaming tool call support + Hermes XML
parser") with a surgical port targeting current main — the original
PR's diff has shrunk from 40 files / 2K+ lines to 3 files / 580 lines
now that the Qwen3.5/3.6 + TurboQuant + paged-cache base stack has
landed.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Streaming chat now emits structured tool-call events (SSE) instead of raw tags and defers final reason when tool-calls occurred.
    • Parsing accepts both JSON and XML tool-call forms and applies per-request schema-based coercion.
    • Tool-call data is normalized for template compatibility (hoists nested fields, parses string-encoded arguments).
  • Bug Fixes / Reliability

    • Robust streaming extraction across chunk splits, unclosed-tag flushes, UTF‑8 boundary safety, and capped buffering to avoid unbounded growth.
  • Tests

    • Expanded test coverage for parsing, streaming, normalization, and edge cases.

The streaming `/v1/chat/completions` route previously stripped tools from
the prompt (`prepare_chat_prompt_with_thinking(..., None, ...)`) and warned
that "tool_calls deltas are unsupported". The model, deprived of tool
context, would hallucinate fake tool calls as plain text — and the client's
TTS pipeline would happily speak `<nanobot>read_skill newsreader</nanobot>`
out loud. This commit completes the streaming tool-call story that was
documented but never finished (see closed upstream PR panbanda#63).

Three pieces, ~580 net new lines:

1. `tool_parser::StreamingToolCallTracker`
   New `pub struct` next to the existing `parse_tool_calls`. A small state
   machine that buffers streamed text chunks and extracts complete
   `<tool_call>{json}</tool_call>` blocks on the fly. When `active=false`
   (no tools in the request) it collapses to a single-allocation
   passthrough with zero parsing cost.

   Invariants verified by 8 new unit tests:
   - active=false → pure passthrough
   - single complete tag in one chunk
   - tag split across N chunks → reassembled
   - text before AND after tags → both visible
   - invalid JSON inside tag → preserved as visible (no silent loss)
   - unclosed tag at flush → buffered prefix emitted as visible
   - UTF-8 char-boundary safety on tail flush
   - multiple calls with text between → indices tracked correctly

2. `chat::chat_completions_stream` wiring
   - Pass `req.tools.as_deref()` (not None) to
     `prepare_chat_prompt_with_thinking` so the chat template renders the
     tool spec the model recognises.
   - Construct a `StreamingToolCallTracker` keyed off `stream_includes_tools`.
   - On every chunk: route the reasoning-tracker's visible text through
     the tool tracker; emit `ToolCallDelta` SSE events for each completed
     call; emit content delta for the surviving visible text.
   - Defer `finish_reason` until after the tool tracker has drained, so
     we report `"tool_calls"` when the response actually contained any.
   - Final flush drains both trackers so no tokens vanish silently.

3. `chat_template::normalize_tool_call_for_template`
   New `pub fn` that walks a tool-call JSON value and (a) hoists
   `function.{name,arguments}` to the top level — Qwen's
   `chat_template.jinja` references `tool_call.name` and
   `tool_call.arguments` directly, not the OpenAI-nested shape; (b) parses
   string-encoded `arguments` to a JSON value, fixing the
   `cannot convert value into pairs (in chat:120)` minijinja error that
   crashed multi-turn conversations carrying assistant tool_calls in
   their history.

   Verified by 4 new unit tests covering OpenAI shape, Qwen-flat shape
   (no-op), unparseable-string arguments (kept as string), and non-object
   inputs (no-op).

`convert_messages` in the chat route calls the normaliser per tool call so
both the streaming and non-streaming paths get the fix.

Test impact:
- higgs-models: 242 passed (no change)
- higgs-engine: 338 passed (+27 tracker tests, +4 normalize tests, 25 ignored)
- higgs:        457 passed (no regression on existing convert_messages tests)
- cargo clippy -Dwarnings: clean across all three crates
@coderabbitai

coderabbitai Bot commented May 21, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5dd580ee-7b4d-44db-8618-1ce261069969

📥 Commits

Reviewing files that changed from the base of the PR and between ab8e8fe and c793a89.

📒 Files selected for processing (3)
  • crates/higgs-engine/src/chat_template.rs
  • crates/higgs-engine/src/tool_parser.rs
  • crates/higgs/src/routes/chat.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/higgs-engine/src/chat_template.rs
  • crates/higgs/src/routes/chat.rs
  • crates/higgs-engine/src/tool_parser.rs

📝 Walkthrough

Walkthrough

Adds Hermes XML and JSON tool-call parsing, a StreamingToolCallTracker to buffer and extract <tool_call> blocks across chunks, template normalization for tool-call JSON, and integration into the chat streaming pipeline to emit structured ToolCallDelta SSE events and defer finish_reason when tool calls occur.

Changes

Streaming Tool-Call Support

| Layer / File(s) | Summary |
| --- | --- |
| Tool-call template normalization: crates/higgs-engine/src/chat_template.rs, crates/higgs/src/routes/chat.rs | normalize_tool_call_for_template flattens OpenAI nested function.{name,arguments} to top-level {name,arguments}, parses string-encoded JSON arguments into structured values, coerces non-mapping arguments to {}, and adds unit tests and lint adjustments. |
| Tool parser XML/JSON and schema coercion: crates/higgs-engine/src/tool_parser.rs | Add Hermes XML parsing and ToolSchema to coerce XML <parameter> raw strings into typed JSON (int/bool/object/array) when schema provided; update parse_tool_calls(text, schema) signature and module documentation. |
| StreamingToolCallTracker and streaming parsing: crates/higgs-engine/src/tool_parser.rs | Implement StreamingToolOutput and StreamingToolCallTracker to buffer chunked text, detect/extract <tool_call> boundaries, preserve invalid blocks as visible text, cap unclosed-tag buffering, provide UTF-8-safe flush, and add comprehensive tests for chunking and edge cases. |
| Chat completions streaming integration: crates/higgs/src/routes/chat.rs | Include tools in prompt rendering, construct tracker with optional ToolSchema, route visible output through the tracker, emit tool_calls SSE deltas and content deltas per chunk, drain/flush tracker at stream end, and defer/override finish_reason to "tool_calls" when calls were detected. Logging updated to debug. |

Sequence Diagram

sequenceDiagram
  participant ChatAPI
  participant StreamGenerator
  participant ReasoningTracker
  participant ToolTracker
  participant SSEOutput
  ChatAPI->>StreamGenerator: start streaming with tools
  StreamGenerator->>ReasoningTracker: produce chunk (reasoning, visible)
  ReasoningTracker->>ToolTracker: visible text
  ToolTracker->>ToolTracker: extract <tool_call> blocks across chunks
  ToolTracker->>SSEOutput: emit ToolCallDelta for each completed call
  ToolTracker->>SSEOutput: emit content with remaining visible text
  StreamGenerator->>ToolTracker: flush() on stream end
  ToolTracker->>SSEOutput: emit buffered unclosed content and final finish_reason

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
I hop through bytes and buffered streams,
I chase the tags in broken dreams,
I stitch the JSON, XML too,
Emit the calls so outputs chew —
A rabbit’s patch to parse for you. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: streaming tool-call deltas and tool-call argument normalization for Qwen compatibility, which are the core objectives of the PR. |
| Linked Issues check | ✅ Passed | The PR fulfills all coding objectives from issue #63: implements StreamingToolCallTracker for extracting tool calls from streams, adds normalize_tool_call_for_template for template compatibility, supports Qwen XML parsing, and enables end-to-end streaming tool-call support. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the linked objectives: three modified files (chat_template.rs, tool_parser.rs, chat.rs) implement streaming tool calls, Qwen XML parsing, normalization, and template support without introducing unrelated functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/higgs-engine/src/tool_parser.rs (1)

101-104: ⚡ Quick win

Document StreamingToolOutput public fields.

StreamingToolOutput.visible and StreamingToolOutput.new_tool_calls are public API fields and should have field-level rustdoc comments for discoverability.

As per coding guidelines, **/*.rs: Add doc comments on public structs/fields in Rust when changing user-facing behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/higgs-engine/src/tool_parser.rs` around lines 101 - 104, Add rustdoc
comments for the public struct StreamingToolOutput and each of its public
fields: document what StreamingToolOutput represents, and add brief field-level
comments for visible (what the string contains and visibility/format
expectations) and new_tool_calls (what ParsedToolCall entries represent and when
they are populated), referencing StreamingToolOutput, visible, and
new_tool_calls so consumers can discover and understand the API.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/higgs/src/routes/chat.rs`:
- Around line 456-459: The code passes Some(&[]) for an empty tools array which
still marks `tools` as defined in templates; change how `prompt_tools` is
derived so empty slices become None: replace `let prompt_tools =
req.tools.as_deref();` with logic that converts empty slices to None (e.g. `let
prompt_tools = req.tools.as_deref().and_then(|t| if t.is_empty() { None } else {
Some(t) });`) so that the call to
`engine.prepare_chat_prompt_with_thinking(&messages, prompt_tools,
thinking_enabled_stream)` receives None for absent/empty tools.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7a10d527-c5d2-481a-9a66-5870a5f99e83

📥 Commits

Reviewing files that changed from the base of the PR and between ae6f8b0 and 5aea6d0.

📒 Files selected for processing (3)
  • crates/higgs-engine/src/chat_template.rs
  • crates/higgs-engine/src/tool_parser.rs
  • crates/higgs/src/routes/chat.rs

Comment thread crates/higgs/src/routes/chat.rs Outdated
dusterbloom and others added 3 commits May 21, 2026 17:08
…bind

Qwen's chat_template.jinja:107-108 rebinds the loop variable:

    {%- for tool_call in message.tool_calls %}
        {%- if tool_call.function is defined %}
            {%- set tool_call = tool_call.function %}
        {%- endif %}

After this rebind, line 120's `tool_call.arguments|items` walks into
the ORIGINAL `function.arguments` JSON-encoded string — bypassing the
top-level `arguments` we just normalised. minijinja raises "cannot
convert value into pairs" and the request 500s.

Two-part fix:

1. `normalize_tool_call_for_template` now normalises BOTH the hoisted
   top-level `arguments` AND the still-nested `function.arguments`.
   Pulled the parse-and-coerce logic into a private helper
   `normalize_arguments_value` so it's applied identically at both
   sites (DRY).

2. The coercion now defends against every non-mapping shape — null,
   bool, number, array, string-that-fails-to-parse — by replacing the
   value with an empty `{}`. A warn is logged so pathological shapes
   stay visible. Previously only string-parsing was attempted and any
   other shape leaked through to the template.

New invariant test (`normalize_handles_qwen_rebind_to_function`) pins
both `tool_call.arguments` AND `tool_call.function.arguments` to be
mappings after normalize, so future edits can't regress the rebind
path silently. Five additional coercion tests cover null / array /
number / unparseable-string / array-as-string cases.
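
The two-site normalisation described above can be sketched without external crates. Everything here is an illustrative assumption: `Value` is a stand-in for a JSON value type, `parse` stands in for a real JSON parser, and the function names are hypothetical rather than the PR's code. The warn log mentioned above is omitted.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical; a real build would use a JSON crate).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Coerce `arguments` into a mapping: parse strings, then replace anything
/// that is still not an object with an empty `{}`.
fn normalize_arguments(args: Value, parse: fn(&str) -> Option<Value>) -> Value {
    let parsed = match args {
        Value::Str(s) => parse(&s).unwrap_or(Value::Null),
        other => other,
    };
    if matches!(parsed, Value::Object(_)) {
        parsed
    } else {
        Value::Object(BTreeMap::new())
    }
}

/// Normalise both the nested `function.arguments` and the top-level
/// `arguments`, hoisting `function.{name,arguments}` when the top level lacks them.
pub fn normalize_tool_call(call: &mut Value, parse: fn(&str) -> Option<Value>) {
    let Value::Object(top) = call else { return };
    let mut hoisted = Vec::new();
    if let Some(Value::Object(func)) = top.get_mut("function") {
        // Fix the nested copy first: a template may rebind to `function`.
        if let Some(args) = func.remove("arguments") {
            func.insert("arguments".to_string(), normalize_arguments(args, parse));
        }
        for key in ["name", "arguments"] {
            if let Some(v) = func.get(key) {
                hoisted.push((key, v.clone()));
            }
        }
    }
    for (key, v) in hoisted {
        // Do not clobber fields that already exist at the top level.
        top.entry(key.to_string()).or_insert(v);
    }
    if let Some(args) = top.remove("arguments") {
        top.insert("arguments".to_string(), normalize_arguments(args, parse));
    }
}
```

After the call, both `tool_call.arguments` and `tool_call.function.arguments` are mappings, so a template that iterates over either one with an `items`-style filter sees key/value pairs regardless of which binding it ends up using.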

Verified by manual smoke: `nanobot agent -l -m "Use your tools to list
files in /tmp"` previously 500'd with the template error; now returns
a real directory listing streamed back through `StreamingToolCallTracker`.

Test count: chat_template 36 (was 31, +5 net for the new shapes), all
green. Clippy `-Dwarnings` clean across higgs / higgs-engine / higgs-models.

Also: cargo fmt sweep on the new code that landed unformatted in the
prior commit; pure layout, no behaviour change.
`req.tools.as_deref()` returns `Some(&[])` when the request carries an
empty `tools` array. The chat-template renderer treats that as `tools`
being *defined* in the Jinja context (just empty), versus `None` which
omits the key entirely. Templates branching on `{% if tools is defined %}`
vs `{% if tools %}` see different state.

Convert empty slices to `None` before handing them to
`prepare_chat_prompt_with_thinking`.
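
A minimal sketch of that conversion, using a hypothetical generic helper so it compiles on its own (the real code operates on the request's tool type):

```rust
/// Treat an empty `tools` array the same as a missing one, so the template
/// context never sees `tools` as defined-but-empty.
pub fn prompt_tools<T>(tools: &Option<Vec<T>>) -> Option<&[T]> {
    tools.as_deref().filter(|t| !t.is_empty())
}
```

`Option::as_deref` turns `&Option<Vec<T>>` into `Option<&[T]>`, and `Option::filter` drops the `Some(&[])` case.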

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses CRITICAL finding from the closed upstream PR panbanda#63 review: a
model that emits `<tool_call>` and never produces a matching
`</tool_call>` would grow `StreamingToolCallTracker::buffer` until OOM
— a model-controlled DoS vector.

On overflow we abandon the parse, surface `<tool_call>` plus the
buffered bytes as visible content (preserving the "never silently drop
tokens" invariant), and reset state so a subsequent well-formed tool
call in the same stream still parses.

New test `streaming_unbounded_buffer_capped_and_recovers` pushes
MAX_INSIDE_TOOL_CALL_BYTES + 1 inside an unclosed tag followed by a
valid call, and asserts both that the overflow is flushed verbatim and
that the post-overflow call still parses (completed_count == 1).
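
The overflow behaviour can be sketched as one step function for the "inside an open tag" state. The cap value, the `Step` enum, and `feed_inside` are assumptions for illustration; the actual value of `MAX_INSIDE_TOOL_CALL_BYTES` is not stated in this PR text.

```rust
const OPEN: &str = "<tool_call>";
const CLOSE: &str = "</tool_call>";
/// Illustrative cap only, deliberately tiny so the behaviour is easy to see.
const MAX_INSIDE: usize = 32;

#[derive(Debug, PartialEq)]
pub enum Step {
    /// Still waiting for the closing tag.
    Pending,
    /// A complete payload was extracted; `rest` is the text after the close tag.
    Call { payload: String, rest: String },
    /// The cap was exceeded: the tag and buffered bytes come back as visible text.
    Overflow(String),
}

/// Feed one chunk while inside an open tag. `buf` holds the bytes seen since
/// `<tool_call>` and is empty again after `Call` or `Overflow`.
pub fn feed_inside(buf: &mut String, chunk: &str) -> Step {
    buf.push_str(chunk);
    if let Some(end) = buf.find(CLOSE) {
        let payload = buf[..end].to_string();
        let rest = buf[end + CLOSE.len()..].to_string();
        buf.clear();
        return Step::Call { payload, rest };
    }
    if buf.len() > MAX_INSIDE {
        // Abandon the parse but keep every byte: nothing is silently dropped.
        let visible = format!("{}{}", OPEN, buf);
        buf.clear();
        return Step::Overflow(visible);
    }
    Step::Pending
}
```

Because the buffer is cleared on overflow, a later well-formed block in the same stream starts from a clean state and still parses.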

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@funkymonkeymonk


This PR fixes the exact issue where the streaming API drops tool definitions and warns 'tool-calls are unsupported'. I'm running Higgs 1.3.0 with Qwen3-Coder-Next and OpenCode, and every streaming request gets its tools stripped, forcing a text-based fallback that's unreliable for complex tool calls. The non-streaming path does parse the model's XML output, but it doesn't convert it to OpenAI's structured tool_calls format either.

All 6 CI checks are green, the implementation is surgical (3 files, +710 -17), and this unblocks local MLX models for use with agentic tools like OpenCode, Claude Code, and Aider. Would love to see this merged for the next release. 🙏

… coercion

Qwen3.5/3.6 models' chat_template.jinja instructs them to emit
`<tool_call><function=NAME><parameter=KEY>value</parameter></function></tool_call>`,
but the existing parser only decodes the older JSON-in-`<tool_call>` shape, so
tool calls leaked into `content` with `finish_reason: stop` and the OpenAI
streaming/non-streaming clients saw nothing.

Add a second parse path that dispatches on the block's first token
(`<function=` → XML, else → JSON), reusing the existing
`StreamingToolCallTracker` boundary detection unchanged. XML values are raw
strings, so coerce them through a `ToolSchema` built from the request's
`tools` array (integer/number/boolean/object/array → typed JSON,
string-typed `"123"` stays a string), falling back to best-effort JSON
parsing when no schema is declared.

- `parse_tool_calls` and `StreamingToolCallTracker::new` now take an
  `Option<&ToolSchema>` / `Option<ToolSchema>`; the streaming path captures
  it from `req.tools` before the `async_stream!` block, the non-streaming
  path threads it at the call site
- Backward-compatible: pre-existing 28 JSON tests still pass; one new test
  exercises a mixed JSON+XML stream
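
A std-only sketch of the dispatch and schema coercion described above. The types `Arg`, `ParamType`, and `Shape` and both functions are hypothetical simplifications: only integer, boolean, and string types are modelled, and parameters without a schema entry stay as text here instead of going through best-effort JSON parsing.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Int(i64),
    Bool(bool),
    Text(String),
}

#[derive(Debug, Clone, Copy)]
pub enum ParamType {
    Integer,
    Boolean,
    String,
}

#[derive(Debug, PartialEq)]
pub enum Shape {
    Xml,
    Json,
}

/// Dispatch on the first token of the block between the `<tool_call>` tags.
pub fn detect(block: &str) -> Shape {
    if block.trim_start().starts_with("<function=") {
        Shape::Xml
    } else {
        Shape::Json
    }
}

/// Parse `<function=NAME><parameter=KEY>value</parameter>...</function>`.
pub fn parse_xml_call(
    block: &str,
    schema: &HashMap<String, ParamType>,
) -> Option<(String, Vec<(String, Arg)>)> {
    let body = block.trim().strip_prefix("<function=")?;
    let (name, mut rest) = body.split_once('>')?;
    let mut args = Vec::new();
    while let Some(start) = rest.find("<parameter=") {
        let after = &rest[start + "<parameter=".len()..];
        let (key, tail) = after.split_once('>')?;
        let (raw, next) = tail.split_once("</parameter>")?;
        args.push((key.to_string(), coerce(raw.trim(), schema.get(key).copied())));
        rest = next;
    }
    Some((name.to_string(), args))
}

/// Coerce a raw XML string using the declared type; fall back to text.
fn coerce(raw: &str, ty: Option<ParamType>) -> Arg {
    let text = || Arg::Text(raw.to_string());
    match ty {
        Some(ParamType::Integer) => raw.parse().map(Arg::Int).unwrap_or_else(|_| text()),
        Some(ParamType::Boolean) => raw.parse().map(Arg::Bool).unwrap_or_else(|_| text()),
        Some(ParamType::String) | None => text(),
    }
}
```

With `limit` declared as an integer and `id` as a string, `<parameter=limit>5</parameter>` becomes `Arg::Int(5)` while `<parameter=id>123</parameter>` stays `Arg::Text("123")`, matching the rule that a string-typed `"123"` is not turned into a number.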

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/higgs-engine/src/tool_parser.rs (1)

340-344: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the public StreamingToolOutput fields.

This struct is new public API, but visible and new_tool_calls still rely on the struct-level summary for their contract. Add field-level rustdoc so the generated API docs stay explicit.

📚 Suggested doc shape
 pub struct StreamingToolOutput {
+    /// Text that should be forwarded to the client as a normal content delta.
     pub visible: String,
+    /// Tool calls that became complete while processing this chunk.
     pub new_tool_calls: Vec<ParsedToolCall>,
 }

As per coding guidelines, **/*.rs: Add doc comments on public structs/fields in Rust when changing user-facing behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/higgs-engine/src/tool_parser.rs` around lines 340 - 344, Add
field-level rustdoc comments for the public struct StreamingToolOutput: document
the visible field to explain it contains the partial/accumulated human-visible
output text (what callers should render or append) and document new_tool_calls
to state it holds ParsedToolCall items produced since the last update (e.g.,
newly detected tool invocations for the caller to process). Update the struct
definition around StreamingToolOutput to include these doc comments for both
visible and new_tool_calls so the generated API docs explicitly describe their
contracts.
♻️ Duplicate comments (1)
crates/higgs/src/routes/chat.rs (1)

269-279: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat empty tool arrays as absent in the non-streaming path too.

The streaming path now normalizes [] to None, but the non-streaming path still treats Some(&[]) as tool-enabled. That means templates can still see tools as defined here, and has_tools can still drive parse_tool_calls / "tool_calls" responses even though the caller provided no usable tools.

💡 Suggested fix
-    let tools = req.tools.as_deref();
+    let tools = req.tools.as_deref().filter(|tools| !tools.is_empty());
@@
-    let has_tools = req.tools.is_some();
+    let has_tools = tools.is_some();

Also applies to: 318-320

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/higgs/src/routes/chat.rs` around lines 269 - 279, The non-streaming
path still passes Some(&[]) into template generation which treats empty tool
arrays as present; update the code around convert_messages / let tools =
req.tools.as_deref() and the analogous block near
prepare_chat_prompt_with_thinking (and the other occurrence around lines
318-320) to normalize empty slices to None (e.g., set tools =
req.tools.as_deref().filter(|s| !s.is_empty()) or equivalent) before calling
engine.prepare_chat_prompt_with_thinking so templates and has_tools logic see
absent tools when the caller provided an empty array.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/higgs-engine/src/chat_template.rs`:
- Around line 217-230: The public rustdoc for normalize_tool_call_for_template
is out of date: the implementation now normalizes nested function.arguments (via
normalize_arguments_value) and coerces any non-object or unparseable string into
an empty object {} rather than leaving failed parses untouched; update the
rustdoc text for normalize_tool_call_for_template (and any doc comments on
normalize_arguments_value if present) to state that both top-level arguments and
nested function.arguments are normalized and that non-object results are
replaced with {}.

In `@crates/higgs-engine/src/tool_parser.rs`:
- Around line 231-240: coerce_param_value currently treats ParamType::Integer
like Number (using parsed_if(Value::is_number)), allowing fractional inputs like
42.5; update the ParamType::Integer branch in coerce_param_value to only accept
true integers (e.g., use parsed_if with a predicate that returns true for
Value::is_i64 or Value::is_u64 or for Value::is_f64 only when the fractional
part is zero) so fractional numbers are rejected, and keep ParamType::Number
using Value::is_number; also add Rustdoc comments (/// ...) to the public fields
visible and new_tool_calls on the StreamingToolOutput struct to document their
purpose.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a8dba805-542c-4233-9792-d4f16d930066

📥 Commits

Reviewing files that changed from the base of the PR and between 5aea6d0 and ab8e8fe.

📒 Files selected for processing (3)
  • crates/higgs-engine/src/chat_template.rs
  • crates/higgs-engine/src/tool_parser.rs
  • crates/higgs/src/routes/chat.rs

Comment thread crates/higgs-engine/src/chat_template.rs
Comment thread crates/higgs-engine/src/tool_parser.rs
@dusterbloom

Contributor Author

📋 Motivation & root-cause writeup (why Qwen tool calls were silently dropped): #177

dusterbloom and others added 2 commits June 2, 2026 15:37
- coerce_param_value: split `integer` from `number` — `is_number` accepts
  floats, so an `integer` param wrongly coerced `3.14`. `integer` now only
  accepts i64/u64 values, falling back to the raw string otherwise.
- chat (non-streaming): treat an empty `tools: []` as absent, mirroring the
  streaming path — it no longer defines `tools` in the template context or
  triggers tool parsing.
- docs: refresh `normalize_tool_call_for_template` (non-object arguments are
  coerced to `{}`, not left untouched; dropped a stray leading line) and
  document the public `StreamingToolOutput` fields.

Adds a test pinning the integer/fractional coercion contract.
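
The integer/number split can be sketched with plain string parsing. `Coerced` and both functions are hypothetical, and the sketch uses Rust's numeric grammar rather than JSON's (for example, Rust accepts a leading `+`), so it approximates the contract instead of reproducing the PR's `coerce_param_value`.

```rust
#[derive(Debug, PartialEq)]
pub enum Coerced {
    Int(i64),
    UInt(u64),
    Float(f64),
    /// Coercion failed: the raw string is kept as-is.
    Raw(String),
}

/// `integer` parameters accept only values that fit i64 or u64.
pub fn coerce_integer(raw: &str) -> Coerced {
    if let Ok(i) = raw.parse::<i64>() {
        Coerced::Int(i)
    } else if let Ok(u) = raw.parse::<u64>() {
        Coerced::UInt(u)
    } else {
        Coerced::Raw(raw.to_string())
    }
}

/// `number` parameters additionally accept finite fractional values.
pub fn coerce_number(raw: &str) -> Coerced {
    match coerce_integer(raw) {
        Coerced::Raw(_) => raw
            .parse::<f64>()
            .ok()
            .filter(|f| f.is_finite())
            .map(Coerced::Float)
            .unwrap_or_else(|| Coerced::Raw(raw.to_string())),
        int => int,
    }
}
```

Under this sketch `"3.14"` stays a raw string for an `integer` parameter but becomes a float for a `number` parameter.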

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Qwen3.6 reasons first: in thinking mode the chat template opens `<think>`, so
generation starts inside the think block and the tool call is emitted AFTER
`</think>`. The chat route prepends `<think>`, splits reasoning via
`parse_reasoning`, then runs `parse_tool_calls` on the remainder. A parser that
scanned the whole output (or only the reasoning) would silently drop the call —
the most common thinking+tools failure mode, and the one dimension the existing
suite did not cover.

Adds `xml_tool_call_after_think_block_is_extracted`, replicating that
composition with the Qwen3.6 XML tool-call format. The 35+ existing
parser/streaming tests already cover XML parse + schema coercion, streaming
chunk-splits, and unbounded-buffer recovery.
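
The composition that test pins can be sketched as two small steps: split the reasoning off first, then look for tool calls only in the remainder. `split_reasoning` and `tool_call_payloads` are hypothetical stand-ins for the `parse_reasoning` and `parse_tool_calls` functions named above, not their actual signatures.

```rust
/// Split a leading `<think>...</think>` block from the rest of the output.
/// Returns `(reasoning, remainder)`; an unclosed block is treated as all reasoning.
pub fn split_reasoning(output: &str) -> (Option<&str>, &str) {
    let Some(body) = output.strip_prefix("<think>") else {
        return (None, output);
    };
    match body.split_once("</think>") {
        Some((reasoning, rest)) => (Some(reasoning.trim()), rest),
        None => (Some(body.trim()), ""),
    }
}

/// Collect the raw payload of every complete `<tool_call>...</tool_call>` block.
pub fn tool_call_payloads(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = text;
    while let Some((_, after)) = rest.split_once("<tool_call>") {
        let Some((payload, tail)) = after.split_once("</tool_call>") else {
            break;
        };
        out.push(payload.trim());
        rest = tail;
    }
    out
}
```

Since generation starts inside the think block, the caller prepends `<think>` to the model output before calling `split_reasoning`, then passes only the remainder to the tool-call step.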

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2 participants