Eval bug: SSE first chunk not fully OpenAI-compatible #22722

### Name and Version

./build/bin/llama-cli --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 7807 MiB):
  Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability 8.9, VMM: yes, VRAM: 7807 MiB
version: 8988 (6118c043b)
built with GNU 15.2.1 for Linux x86_64

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

NVIDIA GeForce RTX 4060 Laptop GPU (8 GB VRAM)

### Models

gemma-4-E4B-it-Q4_K_M.gguf 
mmproj-gemma-4-E4B-it-BF16.gguf

lmstudio-community version: https://huggingface.co/lmstudio-community/gemma-4-E4B-it-GGUF

### Problem description & steps to reproduce

When llama-server streams a tool call response, the first SSE delta combines `name`, `id`, `type`, and the opening argument fragment (`{`) in a single chunk. Many OpenAI-compatible clients expect the defining chunk to carry an empty `arguments` string, with the actual argument bytes arriving in separate subsequent fragments. Clients that treat the defining chunk as a header and only accumulate arguments from the fragments that follow it lose the leading `{` and end up with invalid JSON, so the tool call is executed with empty parameters.

I first worked around this with a Python proxy, then used Claude Code to help track down a fix in llama-server itself.

# Analysis

This analysis was produced with Claude Code and is heavily summarized.

## Affected component

`tools/server/server-task.cpp` — `server_task_result_cmpl_partial::update()`

## Observed behaviour

First SSE chunk received by client:

```json
{
  "choices": [{
    "delta": {
      "tool_calls": [{
        "index": 0,
        "id": "R9g09d5ky0p4gQIJl6pyaJZWEj2jnaYL",
        "type": "function",
        "function": {
          "name": "query_agent",
          "arguments": "{"
        }
      }]
    }
  }]
}
```

Subsequent fragments (one per token):

```json
{"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"agent_name\""}}]}}]}
{"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\"search\""}}]}}]}
...
```

A client that accumulates fragments by index gets:

```
arg_fragments[0] = "\"agent_name\":\"search\",..."   ← missing leading "{"
```

`Jason.decode(...)` (or equivalent) fails → tool call executed with `arguments: {}`.
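
The failure can be reproduced with a small sketch of such a client. The accumulator below is hypothetical and is not taken from `req_llm` or any other library: it assumes the client treats any delta that carries a `name` as a header and opens an empty argument buffer, ignoring `arguments` on that delta. The `id` value and the closing `}` fragment are made up for the example.

```python
import json

def accumulate_strict(deltas):
    """Hypothetical client: a delta with a name only opens the buffer."""
    buffers = {}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            idx = tc["index"]
            fn = tc.get("function", {})
            if fn.get("name"):
                # Header delta: start empty and drop any arguments it carries.
                buffers[idx] = ""
            else:
                buffers[idx] = buffers.get(idx, "") + fn.get("arguments", "")
    return buffers

# Stream shaped like the observed output: name and "{" share the first delta.
observed = [
    {"tool_calls": [{"index": 0, "id": "call_0", "type": "function",
                     "function": {"name": "query_agent", "arguments": "{"}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"agent_name\""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": ":\"search\"}"}}]},
]

assembled = accumulate_strict(observed)[0]
try:
    json.loads(assembled)
    parsed_ok = True
except json.JSONDecodeError:
    # The leading "{" was dropped, so the buffer is not a JSON object.
    parsed_ok = False
```

With this input the buffer ends up without its opening brace and `json.loads` rejects it, which matches the empty-arguments symptom described above.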

## Expected behaviour

First chunk (defining) — name and id only, no argument content:

```json
{"function": {"name": "query_agent", "arguments": ""}}
```

Second chunk — first argument fragment:

```json
{"function": {"arguments": "{"}}
```

Subsequent chunks continue accumulating until the complete JSON is assembled:
`{"agent_name":"search","question":"..."}` → parses correctly.
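
For client authors, a tolerant accumulator is one way to be robust against both chunk shapes. This is only a sketch under the assumption that the client controls its own accumulation code; the function name and the sample streams are illustrative. It appends `arguments` from every delta, including the defining one, so the combined and the split form assemble to the same JSON.

```python
import json

def accumulate_tolerant(deltas):
    """Concatenate arguments from every delta, header included."""
    names, buffers = {}, {}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            idx = tc["index"]
            fn = tc.get("function", {})
            if fn.get("name"):
                names[idx] = fn["name"]
            # An empty or missing arguments field contributes nothing.
            buffers[idx] = buffers.get(idx, "") + (fn.get("arguments") or "")
    return {idx: (names.get(idx), json.loads(buf)) for idx, buf in buffers.items()}

# Combined form: name and "{" in the same delta.
combined = [
    {"tool_calls": [{"index": 0, "function": {"name": "query_agent", "arguments": "{"}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"agent_name\":\"search\"}"}}]},
]
# Split form: header with empty arguments, then the fragments.
split = [
    {"tool_calls": [{"index": 0, "function": {"name": "query_agent", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "{"}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"agent_name\":\"search\"}"}}]},
]

result_combined = accumulate_tolerant(combined)
result_split = accumulate_tolerant(split)
```

Both streams yield the same tool name and parsed arguments, so a client written this way would not depend on which form the server emits.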

## Root cause

`update_chat_msg()` has a `filter_tool_calls` parameter (default `false`) that, when
`true`, activates splitting logic (lines 167–218) that:

1. Emits a name-only header chunk before any arguments
2. Emits argument bytes as separate argument-only fragments

However, the streaming partial-result path in `server_task_result_cmpl_partial::update()`
never passed `filter_tool_calls = true`:

```cpp
// server-task.cpp:1378 — before fix
state.update_chat_msg(content, true, oaicompat_msg_diffs);
//                                    ↑ filter_tool_calls defaults to false
```

With `filter_tool_calls = false`, line 165 simply does `diffs = std::move(all_diffs)`,
bypassing the splitting logic entirely and emitting raw diffs from `compute_diffs()` which
combine name + first argument fragment in one diff.

The non-streaming final-result path (`server-task.h:384`) passes `is_partial = false`
and is unaffected — the complete arguments JSON is always valid at that point.

## Fix

One-line change in `tools/server/server-task.cpp:1378`:

```cpp
// Before
state.update_chat_msg(content, true, oaicompat_msg_diffs);

// After
state.update_chat_msg(content, true, oaicompat_msg_diffs, /* filter_tool_calls= */ true);
```

This works for my case, but I am not sure whether it is safe as a global change.
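
To check whether a given build still shows the problem, the captured stream can be scanned for deltas that carry both a name and non-empty arguments. The helper below is a hypothetical check written for this report, not part of llama.cpp's test suite; it assumes the chunks have already been decoded from the SSE `data:` lines into dictionaries.

```python
def combined_header_deltas(chunks):
    """Return (choice index, tool call index) pairs where a delta
    carries both a function name and non-empty arguments."""
    offenders = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                fn = tc.get("function") or {}
                if fn.get("name") and fn.get("arguments"):
                    offenders.append((choice.get("index", 0), tc["index"]))
    return offenders

# Shape seen before the change: name and "{" together.
before = [{"choices": [{"index": 0, "delta": {"tool_calls": [
    {"index": 0, "function": {"name": "query_agent", "arguments": "{"}}]}}]}]

# Shape expected after the change: header first, arguments afterwards.
after = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"name": "query_agent", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{"}}]}}]},
]

offenders_before = combined_header_deltas(before)
offenders_after = combined_header_deltas(after)
```

An empty result means no delta mixes the header with argument bytes.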

## Affected clients

Any OpenAI-compatible streaming client that:
- Accumulates `function.arguments` fragments by `index` across multiple SSE chunks
- Attempts to JSON-decode only after `finish_reason: tool_calls`

Notably: `req_llm` (Elixir), LangChain, LlamaIndex, and similar agent frameworks.

## Workaround (pre-fix)

A proxy can split the combined chunk before forwarding:

```python
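def _split_defining_tool_call_chunks(chunk):
    """Split a delta that carries both a tool name and argument bytes.

    Returns the chunk (mutated in place, arguments emptied) followed by one
    arguments-only chunk for each tool call that had to be split.
    """
    extra_frags = []
    for choice in chunk.get("choices", []):
        delta = choice.get("delta") or {}
        for tc in delta.get("tool_calls") or []:
            fn = tc.get("function") or {}
            # A defining chunk has a name; it should not also carry arguments.
            if fn.get("name") and fn.get("arguments"):
                args = fn["arguments"]
                fn["arguments"] = ""  # header keeps name/id/type only
                extra_frags.append({
                    "choices": [{
                        "index": choice.get("index", 0),
                        "delta": {"tool_calls": [
                            {"index": tc["index"], "function": {"arguments": args}}
                        ]},
                        "finish_reason": None,
                    }]
                })
    return [chunk] + extra_frags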



### First Bad Commit

566059a26 Autoparser - complete refactoring of parser architecture (#18675)

The `filter_tool_calls` parameter and the splitting logic were added in this commit, but the streaming path (`server_task_result_cmpl_partial::update`, `server-task.cpp:1378`) was never updated to pass `true`. The code needed for the fix already exists and is correct; it was simply never activated.

### Relevant log output

<details>
<summary>Logs</summary>



```console
# First SSE delta — name + arguments opening brace COMBINED in one chunk:
sse tool_call delta: [{"index": 0, "id": "R9g09d5ky0p4gQIJl6pyaJZWEj2jnaYL", "type": "function", "function": {"name": "query_agent", "arguments": "{"}}]

# Subsequent deltas — argument fragments only (one per token):
sse tool_call delta: [{"index": 0, "function": {"arguments": "\"agent"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "_"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "name"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "\":"}}]
sse tool_call delta: [{"index": 0, "function": {"arguments": "\"search\""}}]
sse finish_reason=tool_calls

# Client accumulates fragments by index → assembled string:
# arg_fragments[0] = "\"agent_name\":\"search\",..."   ← leading "{" is lost
# JSON decode fails → tool executed with empty arguments

# Resulting tool error sent back by client:
"Invalid parameters: required :agent_name option not found, received options: []"
```

</details>
