chat : add EOS token to additional_stops for autoparser templates #20805

Closed
jpohhhh wants to merge 1 commit into ggml-org:master from jpohhhh:fix-autoparser-eos-stop

@jpohhhh
Contributor

@jpohhhh jpohhhh commented Mar 20, 2026

Some models emit the EOS token as text (e.g. </s>) rather than as
the special EOS token ID. The PEG parser fails at end-of-input because
the trailing EOS text isn't consumed.
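
To make the failure mode concrete, here is a minimal sketch. It is not llama.cpp's PEG parser: `parse_strict` is a hypothetical toy that accepts one JSON value and then requires end-of-input, and the JSON tool-call shape is an assumption chosen only for illustration.

```python
import json


def parse_strict(output: str) -> dict:
    """Toy stand-in for a grammar that must consume the whole input.

    Hypothetical helper for illustration; not llama.cpp's PEG parser.
    """
    value, end = json.JSONDecoder().raw_decode(output)
    rest = output[end:]
    if rest.strip():
        # Anything left over, such as a literal "</s>", is a parse failure.
        raise ValueError(f"unconsumed input at offset {end}: {rest!r}")
    return value


clean = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'
parsed = parse_strict(clean)  # the whole input is consumed

try:
    parse_strict(clean + "</s>")  # EOS emitted as plain text
    trailing_eos_ok = True
except ValueError:
    trailing_eos_ok = False
```

The same output parses or fails depending only on whether the EOS text is still attached when it reaches the parser.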

Regression introduced in 566059a (Autoparser ggml-org#18675, 2026-03-06).

Fix: add the template's EOS token to additional_stops so the server
strips it before the output reaches the parser.
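
The idea can be sketched roughly as follows. Only the name `additional_stops` comes from this PR; `with_eos_stop` and `truncate_at_stop` are hypothetical helpers, and the server's real stop-string handling (for example, partial matches while streaming) is likely more involved than this.

```python
def with_eos_stop(additional_stops: list[str], eos_text: str) -> list[str]:
    """Return the stop list with the EOS text appended, without duplicates."""
    if eos_text and eos_text not in additional_stops:
        return [*additional_stops, eos_text]
    return list(additional_stops)


def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        if not stop:
            continue  # an empty stop string would match at offset 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


stops = with_eos_stop([], "</s>")
raw = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}</s>'
cleaned = truncate_at_stop(raw, stops)  # the trailing "</s>" is dropped
```

With the EOS text removed at this stage, a parser that requires end-of-input sees only the tool-call payload.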

Unit test:

  cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
  cmake --build build --target test-chat
  ./build/bin/test-chat

Server repro (bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF, temp=0):

  llama-server -m Mistral-Small-3.2-24B-Instruct-2506-IQ2_M.gguf --jinja

  # 200 before 566059a, 500 after
  curl http://localhost:8080/v1/chat/completions -d '{
    "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
    "tools": [{"type": "function", "function": {
      "name": "get_weather", "description": "Get weather",
      "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
    }}],
    "temperature": 0, "max_tokens": 200
  }'
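
When bisecting, it can help to script the same request and read back only the HTTP status. The sketch below mirrors the curl payload using Python's standard library; `post_status` is a hypothetical helper, and the explicit `Content-Type: application/json` header is an addition that the curl command above does not send.

```python
import json
import urllib.error
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"

# Same request body as the curl command in the repro.
PAYLOAD = {
    "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
    "tools": [{"type": "function", "function": {
        "name": "get_weather", "description": "Get weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }}],
    "temperature": 0,
    "max_tokens": 200,
}


def post_status(url: str = URL, payload: dict = PAYLOAD, timeout: float = 120.0) -> int:
    """POST the payload and return the HTTP status code.

    Connection failures (urllib.error.URLError) are not caught here.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; report the code instead.
        return err.code
```

With llama-server running as shown above, `post_status()` would be expected to return 200 on builds before 566059a and 500 after, per the report in this PR.
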
@jpohhhh jpohhhh requested review from a team and pwilkin as code owners March 20, 2026 15:17
@github-actions github-actions bot added the "testing" label (Everything test related) Mar 20, 2026
@pwilkin
Member

pwilkin commented Mar 20, 2026

Yet another bad change for a non-issue that masks a real issue.

If a model actually emits an EOS token "as a normal token", that's a tokenizer error that needs to be fixed, not masked in the parser.

@pwilkin pwilkin closed this Mar 20, 2026
@jpohhhh
Contributor Author

jpohhhh commented Mar 20, 2026

> Yet another bad change for a non-issue that masks a real issue.
>
> If a model actually emits an EOS token "as a normal token", that's a tokenizer error that needs to be fixed, not masked in the parser.

You're being abusive.

I am finding issues with the models I support, making sure they repro with server binary and my API, and then making sure they repro with a unit test. These are not non-issues :( Why are you sooooo mean, all the time?

@jpohhhh
Contributor Author

jpohhhh commented Mar 20, 2026

cc @bartowski1182, bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF can't be handled by the new parser. Any advice here? I don't know how to square the circle.

@pwilkin
Member

pwilkin commented Mar 20, 2026

No, you're being abusive by spamming PRs without opening issues with actual real-life reproductions. Please stop it.

@jpohhhh
Contributor Author

jpohhhh commented Mar 20, 2026

> No, you're being abusive by spamming PRs without opening issues with actual real-life reproductions. Please stop it.

I don't know what you mean, I'm holding myself to a really strict standard - it has to repro with the llama-server binary, and I have to give the exact command and bisect, and have unit tests.

Please tell someone who can ban me from PRs your claim that I'm spamming PRs without repros and have them evaluate your claim. I'm happy to get banned if they agree.

@ggerganov
Member

@jpohhhh I'm sorry, but I have decided to block you for 30 days. I would recommend changing your approach if you wish to contribute to the project.
