fix: streaming tool-call argument loss in NativeToolCallParser (#695) by awschmeder · Pull Request #700 · Zoo-Code-Org/Zoo-Code

awschmeder · 2026-06-23T16:56:24Z

Related GitHub Issue

Closes: #695

Description

When a provider streams a tool call whose first delta(s) arrive before the tool-call id is known, those leading argument bytes are silently discarded by NativeToolCallParser.processRawChunk. This causes downstream "missing required parameter" and other spurious provider-dependent errors even when the model supplied the correct tool use syntax.

This PR fixes the issue by centralizing the tracking of streaming tool calls in NativeToolCallParser. The rawChunkTracker is now initialized the first time a stream index is seen, whether or not an id is present. All argument deltas are buffered until both id and name are known, so no data is lost during streaming reassembly.
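The buffering behavior described above can be illustrated with a minimal sketch. It is not the actual NativeToolCallParser code: the `BufferingParser` class, the `RawChunk` and `ToolCallEvent` shapes, and the `tool_call_delta` event name are assumptions made for illustration only.

```typescript
// Hypothetical, simplified shapes; the real parser's types are richer.
type RawChunk = { index: number; id?: string; name?: string; arguments?: string }
type ToolCallEvent =
	| { type: "tool_call_start"; id: string; name: string }
	| { type: "tool_call_delta"; id: string; delta: string }

interface Tracker {
	id?: string
	name?: string
	started: boolean
	pending: string[] // argument deltas received before the call could start
}

class BufferingParser {
	private trackers = new Map<number, Tracker>()

	processRawChunk(chunk: RawChunk): ToolCallEvent[] {
		const events: ToolCallEvent[] = []
		// Create the tracker on first sight of the index, even when no id has arrived yet.
		let tracker = this.trackers.get(chunk.index)
		if (!tracker) {
			tracker = { started: false, pending: [] }
			this.trackers.set(chunk.index, tracker)
		}
		if (chunk.id) tracker.id = chunk.id
		if (chunk.name) tracker.name = chunk.name

		const { id, name } = tracker
		// Start only once both id and name are known, then replay buffered deltas in order.
		if (!tracker.started && id && name) {
			tracker.started = true
			events.push({ type: "tool_call_start", id, name })
			tracker.pending.forEach((delta) => events.push({ type: "tool_call_delta", id, delta }))
			tracker.pending = []
		}
		if (chunk.arguments) {
			if (tracker.started && id) events.push({ type: "tool_call_delta", id, delta: chunk.arguments })
			else tracker.pending.push(chunk.arguments)
		}
		return events
	}
}
```

In this sketch, feeding `{ index: 0, arguments: '{"pa' }` followed by a chunk that carries the id, the name, and the rest of the arguments produces one start event and two deltas, so the leading bytes are replayed rather than dropped.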

Scope note: accompanying `Task.ts` change

The fix spans two layers because the parser and its streaming consumer must agree on when a tool call is finalized:

- NativeToolCallParser (the core fix): buffers pre-id argument deltas and only emits tool_call_end for started trackers with a non-empty id, preventing both data loss and phantom end events.
- Task.ts (required consumer change): providers emit a tool_call_end stream chunk on finish_reason: "tool_calls" (either directly, or via processFinishReason() for openrouter/lm-studio/qwen-code). Task.ts's stream switch had no tool_call_end case, so those chunks were silently dropped and tool calls only finalized at stream end via finalizeRawChunks(). Without this change, the parser's correctly-buffered arguments would still not be presented during streaming. Adding the new case would have created a third copy of the finalize/present logic (the codebase already had two: the per-chunk event loop and the finalizeRawChunks() loop), so all three sites are consolidated into a single idempotent helper, finalizeStreamingToolCallById(id). Re-finalizing an already-cleared id is a safe no-op, so the new streaming finalization and the end-of-stream pass cannot double-present.
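The idempotency property of the consolidated helper can be sketched as follows. This is only an illustration under assumptions: `StreamingTaskSketch`, its `track` method, the `presented` array, and the choice to present nothing on malformed JSON are hypothetical stand-ins, not the real `Task` implementation.

```typescript
// Hypothetical stand-ins; the real Task and parser APIs are richer.
type ToolUse = { id: string; name: string; input: unknown }

class StreamingTaskSketch {
	// Raw JSON argument text keyed by tool-call id while the call is still streaming.
	private streaming = new Map<string, { name: string; rawArgs: string }>()
	// Stand-in for "presenting" a finalized tool call to the rest of the system.
	readonly presented: ToolUse[] = []

	track(id: string, name: string, rawArgs: string): void {
		this.streaming.set(id, { name, rawArgs })
	}

	// Safe to call from every finalization site: unknown or already-cleared ids are no-ops.
	finalizeStreamingToolCallById(id: string): boolean {
		const entry = this.streaming.get(id)
		if (!entry) return false
		// Clear the streaming state first so a repeat call cannot double-present.
		this.streaming.delete(id)
		try {
			this.presented.push({ id, name: entry.name, input: JSON.parse(entry.rawArgs) })
			return true
		} catch {
			// Assumption for this sketch: malformed JSON presents nothing.
			return false
		}
	}
}
```

Because the id is removed from the map before anything is presented, calling the helper from both a streaming `tool_call_end` case and an end-of-stream pass presents each tool call at most once.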

Test Procedure

- Ran the newly added unit test in src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts which verifies that leading argument bytes arriving before the id are correctly preserved and finalized.
- Verified that existing provider tests in the same test file pass.
- Added src/core/task/__tests__/finalizeStreamingToolCallById.spec.ts, which exercises the real Task.finalizeStreamingToolCallById helper across the success, malformed-JSON, untracked-id, and idempotent re-finalize paths (closes the patch-coverage gap on the consumer change).

Pre-Submission Checklist

Issue Linked: This PR is linked to an approved GitHub Issue.
Scope: My changes are focused on the linked issue (one major feature/fix per PR).
Self-Review: I have performed a thorough self-review of my code.
Testing: New and/or updated tests have been added to cover my changes.
Documentation Impact: I have considered if my changes require documentation updates.
Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

N/A

Documentation Updates

No documentation updates are required.

Additional Notes

N/A

Get in Touch

@awschmeder

Summary by CodeRabbit

Bug Fixes
- Improved streamed tool-call reassembly so tool calls can start when id/name arrive after argument chunks.
- Buffered and replayed early argument data once the tool call is identified.
- Prevented phantom or duplicate tool-call end events for incomplete/malformed or double-finished streams.
- Ensured parallel tool calls remain isolated by stream index.
Tests
- Added streaming reassembly and event-ordering coverage, including late/split/out-of-order chunks and regression checks for end-event behavior.

coderabbitai · 2026-06-23T16:56:45Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

NativeToolCallParser now starts tracking on first index sighting, preserves early argument chunks until id and name arrive, and avoids emitting end events for incomplete streams. Task now uses one helper for native tool-call completion, and the spec adds reassembly regressions.

Changes

NativeToolCallParser streaming reassembly fix

| Layer / File(s) | Summary |
|---|---|
| Tracker initialization and buffered start `src/core/assistant-message/NativeToolCallParser.ts` | `processRawChunk` now creates a tracker on first index observation, records `id` and `name` separately, and flushes buffered argument deltas after `tool_call_start`. |
| End-event guards `src/core/assistant-message/NativeToolCallParser.ts`, `src/core/task/Task.ts` | `processFinishReason` and `finalizeRawChunks` now emit `tool_call_end` only for started trackers with a non-empty `id`, and `Task` routes both native streaming completion paths through `finalizeStreamingToolCallById`. |
| Streaming reassembly tests `src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts`, `src/core/task/__tests__/finalizeStreamingToolCallById.spec.ts` | The specs add reassembly regressions plus helper coverage for successful finalization, malformed finalization, no-op ids, and idempotent reuse. |

Sequence Diagram(s)

sequenceDiagram
  participant NativeToolCallParser
  participant Task
  participant assistantMessageContent

  NativeToolCallParser->>Task: tool_call_end event
  Task->>NativeToolCallParser: finalizeStreamingToolCall(id)
  NativeToolCallParser-->>Task: finalized tool-use data or null
  Task->>assistantMessageContent: update tool-use block and clear streaming state
  Task->>Task: presentAssistantMessage(this)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

taltas
hannesrudolph
navedmerchant
JamesRobert20

Poem

🐰 I hopped through chunks from left to right,
and tucked the bytes away just right.
With id and name at last in sight,
the starts and ends now land upright,
flop ears, big grin—streaming feels bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes check | ⚠️ Warning | Task.ts finalization logic and its tests go beyond the parser-only fix required by `#695` and change downstream stream handling. | Split the Task.ts finalization/idempotency changes into a separate PR or remove them so this PR stays focused on NativeToolCallParser. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main fix in NativeToolCallParser and references the linked issue. |
| Linked Issues check | ✅ Passed | The parser now initializes on index and buffers early arguments until id/name arrive, matching issue `#695`'s required fix. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description matches the required template and includes the issue link, summary, test steps, checklist, and notes. |

codecov · 2026-06-23T17:00:10Z

Codecov Report

❌ Patch coverage is 78.78788% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/core/task/Task.ts | 71.42% | 4 Missing and 2 partials ⚠️ |
| src/core/assistant-message/NativeToolCallParser.ts | 91.66% | 0 Missing and 1 partial ⚠️ |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.changeset/fix-toolcall-dropped-leading-deltas.md (1)
1-6: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove this agent-generated changeset file from the PR.

This file conflicts with the repository’s changeset policy and should not be committed in this change.

As per coding guidelines: ".changeset/**: Do NOT create .changeset files for each commit or code change. Changesets are managed separately by maintainers and should not be generated by agents during normal development."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.changeset/fix-toolcall-dropped-leading-deltas.md around lines 1 - 6, The
.changeset/fix-toolcall-dropped-leading-deltas.md file was auto-generated and
conflicts with the repository's changeset policy which specifies that changeset
files should not be created by agents during normal development. Remove this
file from the PR entirely as changesets are managed separately by maintainers
only.
Source: Coding guidelines

🧹 Nitpick comments (1)

src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts (1)
373-380: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add one explicit test that exercises finalizeRawChunks() directly.

The new suite validates the processFinishReason end path, but this helper clears raw state instead of asserting the finalizeRawChunks guard that was also changed in this PR. A focused case for that path would harden regression coverage.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts` around
lines 373 - 380, Add a new focused test case within the test suite that directly
invokes the NativeToolCallParser.finalizeRawChunks() method to validate its
behavior. This test should verify that the guard logic in finalizeRawChunks
works correctly, since the current test validates the processFinishReason path
which uses clearRawChunkState instead. Include assertions that confirm the
expected end events are produced by finalizeRawChunks() and that raw state is
properly finalized, ensuring regression coverage for the changes made to this
method in this PR.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 583ebfd1-e91f-4526-89e4-69d1726d9c0a

📥 Commits

Reviewing files that changed from the base of the PR and between e8acc6a and 95439a0.

📒 Files selected for processing (5)

.changeset/fix-toolcall-dropped-leading-deltas.md
prs/fix-toolcall-dropped-leading-deltas.md
src/core/assistant-message/NativeToolCallParser.ts
src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
src/core/prompts/tools/native-tools/ask_followup_question.ts

awschmeder · 2026-06-23T21:25:18Z

@coderabbitai review

coderabbitai · 2026-06-23T21:25:25Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

edelauna

Thanks! Had a couple comments on testing some edge cases.

…ing reassembly

- Guard finalize results with not.toBeNull() in parallel-index and single-chunk tests so a null result fails instead of passing silently
- Add reverse-ordering test (name -> buffered args -> id) covering the start-gate id requirement
- Use name !== undefined recording plus a nameSeen flag in the start-gate as a defensive guard against an empty tool name
- Clear rawChunkTracker in processFinishReason so finalizeRawChunks is a safe no-op; add a regression test asserting no double tool_call_end
- Remove unrelated ask_followup_question wording change from PR scope
- Remove prs/fix-toolcall-dropped-leading-deltas.md from the diff
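The interaction between processFinishReason and finalizeRawChunks described in this commit can be sketched roughly as below. The `EndGuardSketch` class, its `set` method, the shared `drain` helper, and the tracker shape are hypothetical names chosen for illustration; the real parser may structure this differently.

```typescript
// Hypothetical, simplified shapes for illustration only.
type EndEvent = { type: "tool_call_end"; id: string }
interface EndTracker {
	id?: string
	started: boolean
}

class EndGuardSketch {
	private trackers = new Map<number, EndTracker>()

	set(index: number, tracker: EndTracker): void {
		this.trackers.set(index, tracker)
	}

	// Emit an end event only for started trackers with a non-empty id, then clear all state.
	private drain(): EndEvent[] {
		const events: EndEvent[] = []
		this.trackers.forEach((tracker) => {
			if (tracker.started && tracker.id) events.push({ type: "tool_call_end", id: tracker.id })
		})
		this.trackers.clear()
		return events
	}

	processFinishReason(finishReason: string | null): EndEvent[] {
		return finishReason === "tool_calls" ? this.drain() : []
	}

	// After processFinishReason has drained the trackers, this returns an empty list.
	finalizeRawChunks(): EndEvent[] {
		return this.drain()
	}
}
```

With a started tracker for `call_dup` and a second tracker that never received an id, `processFinishReason("tool_calls")` yields a single end event and a subsequent `finalizeRawChunks()` yields none, which is the contract the regression test asserts.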

coderabbitai

🧹 Nitpick comments (1)

src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts (1)
589-595: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optionally assert finalizeEvents is empty for sharper failure localization.

The combined-length assertion proves no double-fire, but won't show which call emitted the duplicate on regression. Asserting the source split makes the intended contract (finish emits, finalize is a no-op) explicit.
♻️ Optional: assert per-call ends
 		const allEnds = [...finishEvents, ...finalizeEvents].filter((e) => e.type === "tool_call_end")
 		expect(allEnds).toHaveLength(1)
 		expect(allEnds[0].id).toBe("call_dup")
+		// finishReason emits the single end; finalize must be a no-op for the same tracker.
+		expect(finishEvents.filter((e) => e.type === "tool_call_end")).toHaveLength(1)
+		expect(finalizeEvents.filter((e) => e.type === "tool_call_end")).toHaveLength(0)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts` around
lines 589 - 595, The test around NativeToolCallParser’s duplicate end handling
only checks the combined tool_call_end count, so tighten it by asserting
finishEvents contains the single expected end for call_dup and finalizeEvents
has no tool_call_end entries. Use the NativeToolCallParser.processFinishReason
and NativeToolCallParser.finalizeRawChunks calls to make the contract explicit:
finish emits the end event, and finalize is a no-op for that case.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 79060a7f-f372-4ca0-b035-f7d0e4e13f95

📥 Commits

Reviewing files that changed from the base of the PR and between a01a838 and fba6c71.

📒 Files selected for processing (2)

src/core/assistant-message/NativeToolCallParser.ts
src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/core/assistant-message/NativeToolCallParser.ts

Task.ts had no stream-level case for tool_call_end, so end chunks emitted by providers on finish_reason: "tool_calls" were silently dropped; tool calls only finalized at stream end via finalizeRawChunks(). Add a tool_call_end case so tools finalize and present during streaming, and extract the triplicated finalize/present logic into a shared idempotent helper. Correct the NativeToolCallParser test drive helper to finalize via finalizeRawChunks() (matching production) instead of processFinishReason().

awschmeder · 2026-06-26T20:35:25Z

Pushed a follow-up commit (f2396b9) extending this fix to the streaming consumer in Task.ts.

Problem: Providers emit a tool_call_end stream chunk on finish_reason: "tool_calls" -- either directly (OpenAI-style providers tracking their own activeToolCallIds) or via NativeToolCallParser.processFinishReason() (openrouter, lm-studio, qwen-code). But Task.ts's stream switch had no tool_call_end case, so those chunks were silently dropped and tool calls only finalized at stream end via finalizeRawChunks().

Changes:

- Added a tool_call_end case to the Task.ts stream switch so tools finalize and present during streaming.
- Extracted the previously triplicated finalize/present logic (per-chunk event loop, the new case, and the finalizeRawChunks loop) into a single idempotent helper finalizeStreamingToolCallById(id). Re-finalizing an already-cleared id is a safe no-op, so the new streaming finalization and the end-of-stream finalizeRawChunks() pass cannot double-present.
- Corrected the NativeToolCallParser test drive helper to emit ends via finalizeRawChunks() (matching what Task.ts actually does) instead of processFinishReason(); its comment previously misdescribed the production wiring. The dedicated double-fire test still exercises processFinishReason directly, since that remains a real provider-facing API.

Verification:

- NativeToolCallParser.spec.ts: 21/21 pass
- openrouter.spec.ts + lmstudio-native-tools.spec.ts + base-openai-compatible-provider.spec.ts: 45/45 pass
- duplicate-tool-use-ids.spec.ts + presentAssistantMessage-custom-tool.spec.ts: 18/18 pass
- npx tsc --noEmit on src: clean

Add a focused spec that invokes the real Task.prototype.finalizeStreamingToolCallById via .call() with mocked presentAssistantMessage and NativeToolCallParser, covering the success, null-finalize (malformed JSON), untracked-id no-op, and idempotent re-finalize paths. Closes the codecov/patch gap on the new helper.

fix: streaming tool-call argument loss in NativeToolCallParser (Zoo-C…

95439a0

…ode-Org#695)

awschmeder requested review from JamesRobert20, edelauna, hannesrudolph, navedmerchant and taltas as code owners June 23, 2026 16:56

coderabbitai Bot reviewed Jun 23, 2026

awschmeder added 2 commits June 23, 2026 11:35

chore: remove agent-generated changeset file per policy

d911e98

test: add direct coverage for finalizeRawChunks() in NativeToolCallPa…

a01a838

…rser

edelauna requested changes Jun 24, 2026

github-actions Bot added the awaiting-author label (PR is waiting for the author to address requested changes) Jun 24, 2026

coderabbitai Bot reviewed Jun 26, 2026

awschmeder and others added 3 commits June 26, 2026 18:08

merge: sync upstream/main into fix/695-toolcall-dropped-leading-deltas

08a8c8e

Merge branch 'main' into fix/695-toolcall-dropped-leading-deltas

5e062ea


2 participants
