fix: streaming tool-call argument loss in NativeToolCallParser (#695) #700

Open
awschmeder wants to merge 8 commits into Zoo-Code-Org:main from awschmeder:fix/695-toolcall-dropped-leading-deltas

Conversation

@awschmeder awschmeder commented Jun 23, 2026

Contributor

Related GitHub Issue

Closes: #695

Description

When a provider streams a tool call whose first delta(s) arrive before the tool-call id is known, those leading argument bytes are silently discarded by NativeToolCallParser.processRawChunk. This causes downstream "missing required parameter" and other spurious provider-dependent errors even when the model supplied the correct tool use syntax.

This PR fixes the issue by centralizing the tracking of streaming tool calls in NativeToolCallParser. The rawChunkTracker is now initialized on the first sight of a stream index, independent of whether an id is present. All argument deltas are buffered until both id and name are known, ensuring no data loss during streaming reassembly.
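
A minimal sketch of the buffering behavior described above, not the actual parser code. The chunk and event shapes, the `BufferingReassembler` class, and the `pendingArgs` field are hypothetical names chosen for illustration; the real NativeToolCallParser types may differ.

```typescript
// Hypothetical chunk and event shapes, assumed for this sketch only.
type RawChunk = { index: number; id?: string; name?: string; arguments?: string }
type ToolCallEvent =
	| { type: "tool_call_start"; id: string; name: string }
	| { type: "tool_call_delta"; id: string; delta: string }

interface Tracker {
	id: string
	name: string
	started: boolean
	pendingArgs: string[] // argument deltas seen before the call could start
}

class BufferingReassembler {
	private trackers = new Map<number, Tracker>()

	processRawChunk(chunk: RawChunk): ToolCallEvent[] {
		// Create the tracker on first sight of the index, whether or not an id is present.
		let t = this.trackers.get(chunk.index)
		if (!t) {
			t = { id: "", name: "", started: false, pendingArgs: [] }
			this.trackers.set(chunk.index, t)
		}
		if (chunk.id) t.id = chunk.id
		if (chunk.name) t.name = chunk.name

		const events: ToolCallEvent[] = []
		if (!t.started && t.id !== "" && t.name !== "") {
			t.started = true
			events.push({ type: "tool_call_start", id: t.id, name: t.name })
			// Replay everything that was buffered before id and name were known.
			for (const delta of t.pendingArgs) {
				events.push({ type: "tool_call_delta", id: t.id, delta })
			}
			t.pendingArgs = []
		}
		if (chunk.arguments) {
			if (t.started) events.push({ type: "tool_call_delta", id: t.id, delta: chunk.arguments })
			else t.pendingArgs.push(chunk.arguments)
		}
		return events
	}
}
```

In this sketch, a chunk that carries only `arguments` produces no events; the buffered text is replayed immediately after `tool_call_start` once a later chunk supplies the id and name.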

Scope note: accompanying Task.ts change

The fix spans two layers because the parser and its streaming consumer must agree on when a tool call is finalized:

  • NativeToolCallParser (the core fix): buffers pre-id argument deltas and only emits tool_call_end for started trackers with a non-empty id, preventing both data loss and phantom end events.
  • Task.ts (required consumer change): providers emit a tool_call_end stream chunk on finish_reason: "tool_calls" (either directly, or via processFinishReason() for openrouter/lm-studio/qwen-code). Task.ts's stream switch had no tool_call_end case, so those chunks were silently dropped and tool calls only finalized at stream end via finalizeRawChunks(). Without this change, the parser's correctly-buffered arguments would still not be presented during streaming. Adding the new case would have created a third copy of the finalize/present logic (the codebase already had two: the per-chunk event loop and the finalizeRawChunks() loop), so all three sites are consolidated into a single idempotent helper, finalizeStreamingToolCallById(id). Re-finalizing an already-cleared id is a safe no-op, so the new streaming finalization and the end-of-stream pass cannot double-present.
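
The idempotency property can be sketched as follows. This is an illustration under assumptions, not the code in Task.ts: the `ConsumerSketch` class, its `inFlight` map, and the choice to clear state on malformed JSON are all hypothetical.

```typescript
type FinalizedToolUse = { id: string; input: unknown }

class ConsumerSketch {
	// id -> accumulated argument JSON for tool calls that are still streaming
	private inFlight = new Map<string, string>()
	readonly presented: FinalizedToolUse[] = []

	track(id: string, argsJson: string): void {
		this.inFlight.set(id, argsJson)
	}

	// Intended to be safe to call from every site: per-chunk events,
	// tool_call_end chunks, and the end-of-stream pass.
	finalizeStreamingToolCallById(id: string): boolean {
		const argsJson = this.inFlight.get(id)
		if (argsJson === undefined) return false // untracked or already finalized: no-op
		this.inFlight.delete(id) // clear first so a repeat call cannot double-present
		try {
			this.presented.push({ id, input: JSON.parse(argsJson) })
			return true
		} catch {
			return false // malformed JSON (assumed handling): state cleared, nothing presented
		}
	}
}
```

Because the id is removed before presentation, a second call for the same id finds nothing to do, which is the property that lets the streaming path and the end-of-stream path coexist.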

Test Procedure

  1. Ran the newly added unit test in src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts which verifies that leading argument bytes arriving before the id are correctly preserved and finalized.
  2. Verified that existing provider tests in the same test file pass.
  3. Added src/core/task/__tests__/finalizeStreamingToolCallById.spec.ts, which exercises the real Task.finalizeStreamingToolCallById helper across the success, malformed-JSON, untracked-id, and idempotent re-finalize paths (closes the patch-coverage gap on the consumer change).

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue.
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes.
  • Documentation Impact: I have considered if my changes require documentation updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

N/A

Documentation Updates

  • No documentation updates are required.

Additional Notes

N/A

Get in Touch

@awschmeder

Summary by CodeRabbit

  • Bug Fixes
    • Improved streamed tool-call reassembly so tool calls can start when id/name arrive after argument chunks.
    • Buffered and replayed early argument data once the tool call is identified.
    • Prevented phantom or duplicate tool-call end events for incomplete/malformed or double-finished streams.
    • Ensured parallel tool calls remain isolated by stream index.
  • Tests
    • Added streaming reassembly and event-ordering coverage, including late/split/out-of-order chunks and regression checks for end-event behavior.

coderabbitai Bot commented Jun 23, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

NativeToolCallParser now starts tracking on first index sighting, preserves early argument chunks until id and name arrive, and avoids emitting end events for incomplete streams. Task now uses one helper for native tool-call completion, and the spec adds reassembly regressions.

Changes

NativeToolCallParser streaming reassembly fix

| Layer | File(s) | Summary |
|---|---|---|
| Tracker initialization and buffered start | src/core/assistant-message/NativeToolCallParser.ts | processRawChunk now creates a tracker on first index observation, records id and name separately, and flushes buffered argument deltas after tool_call_start. |
| End-event guards | src/core/assistant-message/NativeToolCallParser.ts, src/core/task/Task.ts | processFinishReason and finalizeRawChunks now emit tool_call_end only for started trackers with a non-empty id, and Task routes both native streaming completion paths through finalizeStreamingToolCallById. |
| Streaming reassembly tests | src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts, src/core/task/__tests__/finalizeStreamingToolCallById.spec.ts | The specs add reassembly regressions plus helper coverage for successful finalization, malformed finalization, no-op ids, and idempotent reuse. |

Sequence Diagram(s)

sequenceDiagram
  participant NativeToolCallParser
  participant Task
  participant assistantMessageContent

  NativeToolCallParser->>Task: tool_call_end event
  Task->>NativeToolCallParser: finalizeStreamingToolCall(id)
  NativeToolCallParser-->>Task: finalized tool-use data or null
  Task->>assistantMessageContent: update tool-use block and clear streaming state
  Task->>Task: presentAssistantMessage(this)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • taltas
  • hannesrudolph
  • navedmerchant
  • JamesRobert20

Poem

🐰 I hopped through chunks from left to right,
and tucked the bytes away just right.
With id and name at last in sight,
the starts and ends now land upright,
flop ears, big grin—streaming feels bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes check | ⚠️ Warning | Task.ts finalization logic and its tests go beyond the parser-only fix required by #695 and change downstream stream handling. | Split the Task.ts finalization/idempotency changes into a separate PR or remove them so this PR stays focused on NativeToolCallParser. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main fix in NativeToolCallParser and references the linked issue. |
| Linked Issues check | ✅ Passed | The parser now initializes on index and buffers early arguments until id/name arrive, matching issue #695's required fix. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description matches the required template and includes the issue link, summary, test steps, checklist, and notes. |

codecov Bot commented Jun 23, 2026

Codecov Report

❌ Patch coverage is 78.78788% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/core/task/Task.ts | 71.42% | 4 Missing and 2 partials ⚠️ |
| src/core/assistant-message/NativeToolCallParser.ts | 91.66% | 0 Missing and 1 partial ⚠️ |

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.changeset/fix-toolcall-dropped-leading-deltas.md (1)

1-6: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove this agent-generated changeset file from the PR.

This file conflicts with the repository’s changeset policy and should not be committed in this change.

As per coding guidelines: ".changeset/**: Do NOT create .changeset files for each commit or code change. Changesets are managed separately by maintainers and should not be generated by agents during normal development."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.changeset/fix-toolcall-dropped-leading-deltas.md around lines 1 - 6, The
.changeset/fix-toolcall-dropped-leading-deltas.md file was auto-generated and
conflicts with the repository's changeset policy which specifies that changeset
files should not be created by agents during normal development. Remove this
file from the PR entirely as changesets are managed separately by maintainers
only.

Source: Coding guidelines

🧹 Nitpick comments (1)
src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts (1)

373-380: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add one explicit test that exercises finalizeRawChunks() directly.

The new suite validates the processFinishReason end path, but this helper clears raw state instead of asserting the finalizeRawChunks guard that was also changed in this PR. A focused case for that path would harden regression coverage.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts` around
lines 373 - 380, Add a new focused test case within the test suite that directly
invokes the NativeToolCallParser.finalizeRawChunks() method to validate its
behavior. This test should verify that the guard logic in finalizeRawChunks
works correctly, since the current test validates the processFinishReason path
which uses clearRawChunkState instead. Include assertions that confirm the
expected end events are produced by finalizeRawChunks() and that raw state is
properly finalized, ensuring regression coverage for the changes made to this
method in this PR.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 583ebfd1-e91f-4526-89e4-69d1726d9c0a

📥 Commits

Reviewing files that changed from the base of the PR and between e8acc6a and 95439a0.

📒 Files selected for processing (5)
  • .changeset/fix-toolcall-dropped-leading-deltas.md
  • prs/fix-toolcall-dropped-leading-deltas.md
  • src/core/assistant-message/NativeToolCallParser.ts
  • src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
  • src/core/prompts/tools/native-tools/ask_followup_question.ts

@awschmeder

Contributor Author

@coderabbitai review

coderabbitai Bot commented Jun 23, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@edelauna edelauna left a comment

Contributor

Thanks! Had a couple comments on testing some edge cases.

Comment thread src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
Comment thread src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
Comment thread src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
Comment thread src/core/assistant-message/NativeToolCallParser.ts
Comment thread src/core/assistant-message/NativeToolCallParser.ts
Comment thread src/core/prompts/tools/native-tools/ask_followup_question.ts Outdated
Comment thread prs/fix-toolcall-dropped-leading-deltas.md Outdated
@github-actions github-actions Bot added the awaiting-author PR is waiting for the author to address requested changes label Jun 24, 2026
…ing reassembly

- Guard finalize results with not.toBeNull() in parallel-index and single-chunk tests so a null result fails instead of passing silently
- Add reverse-ordering test (name -> buffered args -> id) covering the start-gate id requirement
- Use name !== undefined recording plus a nameSeen flag in the start-gate as a defensive guard against an empty tool name
- Clear rawChunkTracker in processFinishReason so finalizeRawChunks is a safe no-op; add a regression test asserting no double tool_call_end
- Remove unrelated ask_followup_question wording change from PR scope
- Remove prs/fix-toolcall-dropped-leading-deltas.md from the diff
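
One way to picture the end-event guard and the tracker clearing listed above is the sketch below. The `EndGuardSketch` class, its `observe` method, and the shared `drainEnds` helper are hypothetical; the actual parser methods may be structured differently.

```typescript
type EndEvent = { type: "tool_call_end"; id: string }
type TrackerState = { id: string; started: boolean }

class EndGuardSketch {
	private trackers = new Map<number, TrackerState>()

	// Test hook for this sketch: record the state of a stream index.
	observe(index: number, state: TrackerState): void {
		this.trackers.set(index, state)
	}

	private drainEnds(): EndEvent[] {
		const ends: EndEvent[] = []
		for (const t of this.trackers.values()) {
			// Guard: only started trackers with a non-empty id get an end event.
			if (t.started && t.id !== "") ends.push({ type: "tool_call_end", id: t.id })
		}
		// Clearing here is what makes a later finalize pass a no-op.
		this.trackers.clear()
		return ends
	}

	processFinishReason(reason: string | null): EndEvent[] {
		return reason === "tool_calls" ? this.drainEnds() : []
	}

	finalizeRawChunks(): EndEvent[] {
		return this.drainEnds()
	}
}
```

With this shape, calling `processFinishReason("tool_calls")` followed by `finalizeRawChunks()` yields a single end event per started tool call, and a tracker that never received an id yields none.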

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts (1)

589-595: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optionally assert finalizeEvents is empty for sharper failure localization.

The combined-length assertion proves no double-fire, but won't show which call emitted the duplicate on regression. Asserting the source split makes the intended contract (finish emits, finalize is a no-op) explicit.

♻️ Optional: assert per-call ends
 		const allEnds = [...finishEvents, ...finalizeEvents].filter((e) => e.type === "tool_call_end")
 		expect(allEnds).toHaveLength(1)
 		expect(allEnds[0].id).toBe("call_dup")
+		// finishReason emits the single end; finalize must be a no-op for the same tracker.
+		expect(finishEvents.filter((e) => e.type === "tool_call_end")).toHaveLength(1)
+		expect(finalizeEvents.filter((e) => e.type === "tool_call_end")).toHaveLength(0)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts` around
lines 589 - 595, The test around NativeToolCallParser’s duplicate end handling
only checks the combined tool_call_end count, so tighten it by asserting
finishEvents contains the single expected end for call_dup and finalizeEvents
has no tool_call_end entries. Use the NativeToolCallParser.processFinishReason
and NativeToolCallParser.finalizeRawChunks calls to make the contract explicit:
finish emits the end event, and finalize is a no-op for that case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 79060a7f-f372-4ca0-b035-f7d0e4e13f95

📥 Commits

Reviewing files that changed from the base of the PR and between a01a838 and fba6c71.

📒 Files selected for processing (2)
  • src/core/assistant-message/NativeToolCallParser.ts
  • src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/assistant-message/NativeToolCallParser.ts

Task.ts had no stream-level case for tool_call_end, so end chunks emitted
by providers on finish_reason: "tool_calls" were silently dropped; tool
calls only finalized at stream end via finalizeRawChunks(). Add a
tool_call_end case so tools finalize and present during streaming, and
extract the triplicated finalize/present logic into a shared idempotent
helper. Correct the NativeToolCallParser test drive helper to finalize via
finalizeRawChunks() (matching production) instead of processFinishReason().
@awschmeder

Contributor Author

Pushed a follow-up commit (f2396b9) extending this fix to the streaming consumer in Task.ts.

Problem: Providers emit a tool_call_end stream chunk on finish_reason: "tool_calls" -- either directly (OpenAI-style providers tracking their own activeToolCallIds) or via NativeToolCallParser.processFinishReason() (openrouter, lm-studio, qwen-code). But Task.ts's stream switch had no tool_call_end case, so those chunks were silently dropped and tool calls only finalized at stream end via finalizeRawChunks().

Changes:

  • Added a tool_call_end case to the Task.ts stream switch so tools finalize and present during streaming.
  • Extracted the previously triplicated finalize/present logic (per-chunk event loop, the new case, and the finalizeRawChunks loop) into a single idempotent helper finalizeStreamingToolCallById(id). Re-finalizing an already-cleared id is a safe no-op, so the new streaming finalization and the end-of-stream finalizeRawChunks() pass cannot double-present.
  • Corrected the NativeToolCallParser test drive helper to emit ends via finalizeRawChunks() (matching what Task.ts actually does) instead of processFinishReason(); its comment previously misdescribed the production wiring. The dedicated double-fire test still exercises processFinishReason directly, since that remains a real provider-facing API.
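
For readers unfamiliar with the stream switch, the routing change can be sketched roughly like this. The `StreamChunk` union, the `ChunkHandlers` interface, and `handleStreamChunk` are hypothetical stand-ins; the real switch in Task.ts handles more chunk types, and the exhaustiveness check is an addition of this sketch rather than something the PR states.

```typescript
// Hypothetical, reduced chunk union; the real stream has more variants.
type StreamChunk =
	| { type: "text"; text: string }
	| { type: "tool_call_end"; id: string }

interface ChunkHandlers {
	onText(text: string): void
	finalizeStreamingToolCallById(id: string): void
}

function handleStreamChunk(chunk: StreamChunk, handlers: ChunkHandlers): void {
	switch (chunk.type) {
		case "text":
			handlers.onText(chunk.text)
			break
		case "tool_call_end":
			// Without this case the chunk matches nothing and is silently dropped.
			handlers.finalizeStreamingToolCallById(chunk.id)
			break
		default: {
			// Compile-time exhaustiveness check: a new chunk type must get its own case.
			const unhandled: never = chunk
			throw new Error(`unhandled chunk: ${JSON.stringify(unhandled)}`)
		}
	}
}
```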

Verification:

  • NativeToolCallParser.spec.ts: 21/21 pass
  • openrouter.spec.ts + lmstudio-native-tools.spec.ts + base-openai-compatible-provider.spec.ts: 45/45 pass
  • duplicate-tool-use-ids.spec.ts + presentAssistantMessage-custom-tool.spec.ts: 18/18 pass
  • npx tsc --noEmit on src: clean

awschmeder and others added 3 commits June 26, 2026 18:08
Add a focused spec that invokes the real Task.prototype.finalizeStreamingToolCallById
via .call() with mocked presentAssistantMessage and NativeToolCallParser, covering
the success, null-finalize (malformed JSON), untracked-id no-op, and idempotent
re-finalize paths. Closes the codecov/patch gap on the new helper.
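
The `.call()` technique mentioned in this commit message can be illustrated with a small stand-alone sketch. `TaskSketch` and `makeStub` are hypothetical and much simpler than the real Task class; the sketch uses plain TypeScript rather than the project's test framework.

```typescript
// Hypothetical stand-in for Task; only the members the helper touches are modeled.
class TaskSketch {
	streamingToolCallIds = new Set<string>()
	presentCount = 0

	constructor() {
		// Stands in for heavy wiring that a unit test would rather skip.
		throw new Error("heavy constructor: not usable in a unit test")
	}

	presentAssistantMessage(): void {
		this.presentCount++
	}

	finalizeStreamingToolCallById(id: string): boolean {
		if (!this.streamingToolCallIds.delete(id)) return false // untracked or already done
		this.presentAssistantMessage()
		return true
	}
}

// Build a bare `this` with only the needed fields; the constructor never runs.
function makeStub(ids: string[]) {
	return {
		streamingToolCallIds: new Set(ids),
		presentCount: 0,
		presentAssistantMessage() {
			this.presentCount++
		},
	}
}

const stubTask = makeStub(["call_1"])

// Invoke the real prototype method against the stub.
const finalize = (id: string): boolean =>
	TaskSketch.prototype.finalizeStreamingToolCallById.call(stubTask as unknown as TaskSketch, id)
```

The benefit of this approach is that the method under test is the real one from the prototype, while everything it depends on is reduced to the few fields it actually reads or writes.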

Labels

awaiting-author PR is waiting for the author to address requested changes

Development

Successfully merging this pull request may close these issues.

[BUG] Streaming tool-call argument deltas are silently dropped when arguments arrive before the tool-call id

2 participants