Eval bug: SSE first chunk not fully openai compatible #22722

@sonic182


Name and Version

./build/bin/llama-cli --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 7807 MiB):
Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability 8.9, VMM: yes, VRAM: 7807 MiB
version: 8988 (6118c04)
built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA GeForce RTX 4060 Laptop GPU (8 GB VRAM)

Models

gemma-4-E4B-it-Q4_K_M.gguf
mmproj-gemma-4-E4B-it-BF16.gguf

lmstudio-community version: https://huggingface.co/lmstudio-community/gemma-4-E4B-it-GGUF

Problem description & steps to reproduce

When llama-server streams a tool call response, the first SSE delta incorrectly combines name, id, type, and the opening argument fragment ({) in a single chunk. Most OpenAI-compatible clients expect the defining chunk to carry an empty arguments string, with the actual argument bytes arriving as separate subsequent fragments. As a result, clients that accumulate argument fragments lose the leading { and end up with invalid JSON, causing tool calls to execute with empty parameters.

At first I worked around this with a Python proxy, then got help from Claude Code to track down the fix in llama-server.

Analysis

The analysis below was produced with Claude Code and is heavily summarized.

Affected component

tools/server/server-task.cpp: server_task_result_cmpl_partial::update()

Observed behaviour

First SSE chunk received by client:

{
  "choices": [{
    "delta": {
      "tool_calls": [{
        "index": 0,
        "id": "R9g09d5ky0p4gQIJl6pyaJZWEj2jnaYL",
        "type": "function",
        "function": {
          "name": "query_agent",
          "arguments": "{"
        }
      }]
    }
  }]
}

Subsequent fragments (one per token):

{"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"agent_name\""}}]}}]}
{"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\"search\""}}]}}]}
...

A client that accumulates fragments by index gets:

arg_fragments[0] = "\"agent_name\":\"search\",..."   ← missing leading "{"

Jason.decode(...) (or equivalent) fails → tool call executed with arguments: {}.
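
The failure can be reproduced with a minimal sketch of such a client. The accumulator below is hypothetical (it is not taken from req_llm or any other library): it assumes the client treats any delta that carries a name as a pure header and ignores that delta's arguments field. The closing "}" fragment is added here only to complete the example.

```python
import json

def accumulate_header_strict(deltas):
    """Hypothetical client: a delta with a name only defines the call;
    its arguments field is ignored."""
    calls = {}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            fn = tc.get("function", {})
            slot = calls.setdefault(tc["index"], {"name": None, "arguments": ""})
            if fn.get("name"):
                slot["name"] = fn["name"]  # arguments on this chunk are dropped
            else:
                slot["arguments"] += fn.get("arguments", "")
    return calls

# Deltas in the shape llama-server currently emits (first chunk carries "{").
observed = [
    {"tool_calls": [{"index": 0, "id": "R9g09d5ky0p4gQIJl6pyaJZWEj2jnaYL",
                     "type": "function",
                     "function": {"name": "query_agent", "arguments": "{"}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"agent_name\""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": ":\"search\""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "}"}}]},
]

calls = accumulate_header_strict(observed)
try:
    json.loads(calls[0]["arguments"])
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False  # the leading "{" never reached the buffer
```

A client that simply concatenates every arguments string, including the one on the first chunk, would not hit this problem, so the bug only shows up with header-strict clients.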

Expected behaviour

First chunk (defining) — name and id only, no argument content:

{"function": {"name": "query_agent", "arguments": ""}}

Second chunk — first argument fragment:

{"function": {"arguments": "{"}}

Subsequent chunks continue accumulating until the complete JSON is assembled:
{"agent_name":"search","question":"..."} → parses correctly.
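
As a rough check of the expected sequence, the sketch below feeds it through a header-strict accumulator and parses the result. The helper names, the "call_0" id, and the closing "}" fragment are illustrative assumptions, not output captured from a server.

```python
import json

def assemble_tool_arguments(chunks):
    """Concatenate function.arguments per tool-call index,
    skipping the arguments of any chunk that carries a name."""
    args = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for tc in choice.get("delta", {}).get("tool_calls", []):
                fn = tc.get("function", {})
                args.setdefault(tc["index"], "")
                if "name" not in fn:
                    args[tc["index"]] += fn.get("arguments", "")
    return args

def make_chunk(function, **extra):
    """Build a minimal streaming chunk for tool call index 0."""
    return {"choices": [{"delta": {"tool_calls": [
        {"index": 0, **extra, "function": function}
    ]}}]}

expected_stream = [
    make_chunk({"name": "query_agent", "arguments": ""}, id="call_0", type="function"),
    make_chunk({"arguments": "{"}),
    make_chunk({"arguments": "\"agent_name\""}),
    make_chunk({"arguments": ":\"search\""}),
    make_chunk({"arguments": "}"}),
]

assembled = assemble_tool_arguments(expected_stream)[0]
parsed = json.loads(assembled)  # succeeds because "{" arrived in its own fragment
```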

Root cause

update_chat_msg() has a filter_tool_calls parameter (default false) that, when
true, activates splitting logic (lines 167–218) that:

  1. Emits a name-only header chunk before any arguments
  2. Emits argument bytes as separate argument-only fragments

However, the streaming partial-result path in server_task_result_cmpl_partial::update()
never passed filter_tool_calls = true:

// server-task.cpp:1378 — before fix
state.update_chat_msg(content, true, oaicompat_msg_diffs);
//                                    ↑ filter_tool_calls defaults to false

With filter_tool_calls = false, line 165 simply does diffs = std::move(all_diffs),
bypassing the splitting logic entirely and emitting raw diffs from compute_diffs() which
combine name + first argument fragment in one diff.

The non-streaming final-result path (server-task.h:384) passes is_partial = false
and is unaffected — the complete arguments JSON is always valid at that point.

Fix

One-line change in tools/server/server-task.cpp:1378:

// Before
state.update_chat_msg(content, true, oaicompat_msg_diffs);

// After
state.update_chat_msg(content, true, oaicompat_msg_diffs, /* filter_tool_calls= */ true);

This works for my case, but I am not sure whether it is safe in general.
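
One way to check a build with and without the change is to scan the streamed chunks for any tool-call delta that still carries both a name and non-empty arguments. The helper below is a hypothetical sketch; the two sample streams are hand-written from the shapes described in this issue, not captured from a patched server.

```python
def find_combined_tool_call_deltas(chunks):
    """Return (chunk_position, tool_call_index) for every delta that
    carries both a function name and non-empty arguments."""
    offenders = []
    for pos, chunk in enumerate(chunks):
        for choice in chunk.get("choices", []):
            for tc in choice.get("delta", {}).get("tool_calls", []):
                fn = tc.get("function", {})
                if fn.get("name") and fn.get("arguments"):
                    offenders.append((pos, tc.get("index")))
    return offenders

# Shape reported in this issue: name and "{" share the first chunk.
before_fix = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"name": "query_agent", "arguments": "{"}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"agent_name\""}}]}}]},
]

# Expected shape: header chunk with empty arguments, then fragments.
after_fix = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"name": "query_agent", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{"}}]}}]},
]

combined_before = find_combined_tool_call_deltas(before_fix)
combined_after = find_combined_tool_call_deltas(after_fix)
```

An empty result for a real stream would suggest the header and argument fragments are being emitted separately.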

Affected clients

Any OpenAI-compatible streaming client that:

  • Accumulates function.arguments fragments by index across multiple SSE chunks
  • Attempts to JSON-decode only after finish_reason: tool_calls

Notably: req_llm (Elixir), LangChain, LlamaIndex, and similar agent frameworks.

Workaround (pre-fix)

A proxy can split the combined chunk before forwarding:

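def _split_defining_tool_call_chunks(chunk):
    """Split a chunk whose tool-call delta combines a name with arguments.

    Returns the original chunk with those arguments blanked (mutated in
    place), followed by one arguments-only chunk per affected tool call.
    """
    extra_frags = []
    for choice in chunk.get("choices", []):
        # "delta" or "tool_calls" may be missing or null on some chunks
        for tc in (choice.get("delta") or {}).get("tool_calls") or []:
            fn = tc.get("function") or {}
            if fn.get("name") and fn.get("arguments"):
                args = fn["arguments"]
                fn["arguments"] = ""  # defining chunk keeps name/id only
                extra_frags.append({
                    "choices": [{"delta": {"tool_calls": [
                        {"index": tc["index"], "function": {"arguments": args}}
                    ]}, "index": choice.get("index", 0), "finish_reason": None}]
                })
    return [chunk] + extra_frags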
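
To apply this inside a proxy, each SSE data line has to be decoded, split, and re-serialized. The helper below is a minimal sketch of that step; its name and the choice to pass the splitter in as a parameter are assumptions made for illustration, and it handles only single-line data fields.

```python
import json

def rewrite_sse_data_line(line, splitter):
    """Apply `splitter` to one SSE `data:` line and return the lines to forward."""
    if not line.startswith("data:"):
        return [line]  # comments, blank lines, and other fields pass through
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return [line]  # stream terminator is forwarded unchanged
    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return [line]  # not JSON: forward untouched rather than guess
    return ["data: " + json.dumps(c) for c in splitter(chunk)]
```

Usage would look like rewrite_sse_data_line(raw_line, _split_defining_tool_call_chunks), with the proxy writing each returned line followed by a blank line so that every fragment reaches the client as its own SSE event.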

First Bad Commit

566059a Autoparser - complete refactoring of parser architecture (#18675)
The filter_tool_calls parameter and splitting logic were added in this commit, but the streaming path (server_task_result_cmpl_partial::update,
server-task.cpp:1378) was never updated to pass true. The fix code exists and is correct — it was simply never activated.

Relevant log output

# First SSE delta — name + arguments opening brace COMBINED in one chunk:
sse tool_call delta: [{"index": 0, "id": "R9g09d5ky0p4gQIJl6pyaJZWEj2jnaYL", "type": "function", "function": {"name": "query_agent", "arguments": "{"}}]

# Subsequent deltas — argument fragments only (one per token):
sse tool_call delta: [{"index": 0, "function": {"arguments": "\"agent"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "_"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "name"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "\":"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "\"search\""}}]
sse finish_reason=tool_calls

# Client accumulates fragments by index → assembled string:
# arg_fragments[0] = "\"agent_name\":\"search\",..."   ← leading "{" is lost
# JSON decode fails → tool executed with empty arguments

# Resulting tool error sent back by client:
"Invalid parameters: required :agent_name option not found, received options: []"
