
chat : fix Llama 3.x throwing runtime exception if response contains { #20806

Closed
jpohhhh wants to merge 1 commit into ggml-org:master from jpohhhh:fix-json-native-content-fallback-v2

Conversation

@jpohhhh
Contributor

@jpohhhh jpohhhh commented Mar 20, 2026

NOTE: I'm aware the server binary has a flag to disable tool call parsing altogether; @pwilkin mentioned it when closing #20800. I opened this PR because that flag cannot help API callers, and this is a severe regression: all Llama 3.x requests with tools throw a runtime exception if the response is freeform and contains { (e.g. a hello world C program). This PR's description makes that clear. The other one theoretically left open the option to close it if the report was server-binary-only and the PR contributor was amenable to disabling all tool calls with Llama 3.x.

For templates like Llama 3.3 where tool_start is "{" (no distinctive marker), the content parser stops at any brace and the tools parser takes over. If the model output contains braces that aren't valid tool calls, the tools parser fails with nothing to absorb the remaining input. For example, "write me a C program" returns a 500 unless the server was started with --skip-chat-parsing. That's fine for the server if you know a priori that Llama 3.x will be used and can afford to disable tool calls altogether. It won't work for API users.

Regression introduced in 566059a (Autoparser #18675, 2026-03-06).

Two failure modes on current master:

  • Content silently truncated at first "{" (partial match)
  • server: HTTP 500 crash (full parse throws), server API: runtime exception

Fix: wrap the existing parser in a choice() with a content-only fallback. The tools path is tried first; when it fails, the fallback returns everything as content. No behavior change for valid tool calls.
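The ordered-choice idea can be illustrated with a small Python sketch. This is not llama.cpp's parser API: `tools_parser`, `content_only`, and this `choice` are hypothetical stand-ins, and the tool-call shape (a JSON object with a "name" key) is an assumption made for the example.

```python
import json
from typing import Callable, Optional

# A parser takes the full model output and returns a result dict, or None on failure.
Parser = Callable[[str], Optional[dict]]


def tools_parser(text: str) -> Optional[dict]:
    """Hypothetical tools path: content stops at the first '{', and the
    remainder must be a JSON object with a "name" key (assumed shape)."""
    start = text.find("{")
    if start == -1:
        return {"content": text, "tool_calls": []}
    try:
        call = json.loads(text[start:])
    except json.JSONDecodeError:
        return None  # braces that are not a valid tool call: the parse fails
    if not isinstance(call, dict) or "name" not in call:
        return None
    return {"content": text[:start], "tool_calls": [call]}


def content_only(text: str) -> Optional[dict]:
    """Fallback: treat the entire output as plain content."""
    return {"content": text, "tool_calls": []}


def choice(*alternatives: Parser) -> Parser:
    """Ordered choice: return the result of the first alternative that succeeds."""
    def parse(text: str) -> Optional[dict]:
        for alternative in alternatives:
            result = alternative(text)
            if result is not None:
                return result
        return None
    return parse


# Tools path first, content-only fallback second.
parser = choice(tools_parser, content_only)
```

In this sketch a C program makes `tools_parser` fail on its own, while `parser` returns the whole program as content. A well-formed tool call still goes through the tools path unchanged.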

Unit test:

cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
cmake --build build --target test-chat
./build/bin/test-chat

Server repro (Llama 3.2 3B, temp=0, tools enabled):

llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf --jinja

# 200 before 566059a, 500 after
curl http://localhost:8080/v1/chat/completions -d '{
  "messages": [{"role": "user", "content": "Write a hello world C program. Just the code, no explanation."}],
  "tools": [{"type": "function", "function": {
    "name": "get_weather", "description": "Get weather",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
  }}],
  "temperature": 0, "max_tokens": 200
}'
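For API callers who prefer a script to curl, the same request can be built with Python's standard library. This is a minimal sketch: `build_repro_request` and `status_of` are hypothetical helper names, the base URL is taken from the curl command above, and the explicit Content-Type header is an addition not present in the original command.

```python
import json
import urllib.error
import urllib.request


def build_repro_request(base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the same chat completion request as the curl repro."""
    payload = {
        "messages": [{
            "role": "user",
            "content": "Write a hello world C program. Just the code, no explanation.",
        }],
        "tools": [{"type": "function", "function": {
            "name": "get_weather", "description": "Get weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }}],
        "temperature": 0,
        "max_tokens": 200,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: not in the curl command
        method="POST",
    )


def status_of(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status, including error statuses."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

With a server running as above, `status_of(build_repro_request())` returns the HTTP status code, so it can be compared against the 200 and 500 noted in the repro.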

@jpohhhh jpohhhh requested review from a team and pwilkin as code owners March 20, 2026 15:29
@github-actions github-actions Bot added the testing Everything test related label Mar 20, 2026
@pwilkin
Member

pwilkin commented Mar 20, 2026

Please stop this endless spam of PRs to imaginary issues you cannot reproduce with real-life scenarios. Open an issue with an actual model query or message history first.

@pwilkin pwilkin closed this Mar 20, 2026
@jpohhhh
Contributor Author

jpohhhh commented Mar 20, 2026

> Please stop this endless spam of PRs to imaginary issues you cannot reproduce with real-life scenarios. Open an issue with an actual model query or message history first.

See the server commands; those are an actual model query.

Moving forward, it is clear you are intending to communicate "file an issue with the server one liners before the PR", which I will do.
