Fix ChatVLLM crash on models without a chat template #11
Merged
geoalgo merged 1 commit into OpenEuroLLM:main on Feb 20, 2026
Conversation
Models like swiss-ai/Apertus-8B-2509 (base models) don't define a chat template in their tokenizer config. Since transformers >= v4.44 removed the default template, calling `vllm.LLM.chat()` on these models raises a `ValueError`. Implement three-path resolution:

1. Explicit `--chat_template` override -> use `llm.chat()` with that template
2. Tokenizer has a chat template -> use `llm.chat()` (auto-detected)
3. No template found -> fall back to `llm.generate()` and warn

This ensures instruct models get `chat()` automatically, base models get `generate()` automatically, and users can still force a template via the CLI.
geoalgo approved these changes on Feb 20, 2026
What is the problem?
PR #8 introduced the `ChatVLLM` wrapper, which switches from `vllm.LLM.generate()` to `vllm.LLM.chat()` so that chat templates get applied correctly. This works great for instruct models that ship a chat template in their tokenizer config. However, base/pretrained models like swiss-ai/Apertus-8B-2509 don't define one. Since transformers >= v4.44 no longer provides a default chat template, calling `vllm.LLM.chat()` on these models raises a `ValueError`. This also means we can't evaluate base models for fluency tasks anymore, which is something we need for the project.

How do we solve it?
We detect at model load time whether a chat template is available and pick the right vLLM method accordingly. Three paths:

1. Explicit `--chat_template` provided: we use `llm.chat()` with that explicit template. Useful when you know the right format for a model whose tokenizer doesn't include one.
2. Tokenizer ships a chat template: we use `llm.chat()` and let vLLM apply it automatically. This is the default path for instruct models.
3. No template found: we fall back to `llm.generate()` (plain text completion, no chat formatting) and print a warning. This is the expected path for base models used in fluency evaluation.

This way instruct models keep working as before, base models no longer crash, and users can still force a template via the CLI when needed.
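The three paths above can be sketched as a small resolution helper. This is a minimal illustration, not the actual `ChatVLLM.__init__()` code from the PR; the function name `resolve_chat_mode` and its return convention are assumptions made for the example.

```python
import warnings


def resolve_chat_mode(tokenizer, chat_template=None):
    """Decide between llm.chat() and llm.generate() for a loaded tokenizer.

    Returns a ("chat", template) or ("generate", None) pair. Hypothetical
    helper mirroring the three-path resolution described in this PR.
    """
    if chat_template is not None:
        # Path 1: an explicit --chat_template override always wins.
        return "chat", chat_template
    if getattr(tokenizer, "chat_template", None):
        # Path 2: the tokenizer ships its own template; vLLM applies it
        # automatically, so no explicit template needs to be passed.
        return "chat", None
    # Path 3: base model with no template -- warn and use plain completion.
    warnings.warn(
        "No chat template found; falling back to llm.generate() "
        "(plain text completion, no chat formatting)."
    )
    return "generate", None
```

The key property is that instruct models never hit the fallback, while base models never reach `llm.chat()` and so never trigger the `ValueError`.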
Changes
- `openjury/utils.py`: add `warnings` import, three-path template detection in `ChatVLLM.__init__()`, new `_to_raw_text()` method for the `generate()` fallback, pass `chat_template` through `batch()` and `make_model()`
- `openjury/generate.py`: forward `chat_template` parameter in `generate_instructions()` and `generate_base()`
- `openjury/generate_and_evaluate.py`: add `--chat_template` CLI argument, thread it through `CliArgs`, `gen_fun` partials, and `make_model()` calls
- `README.md`: document chat template behavior under "Model Specification"

Testing
Tested on L40S GPU with vllm 0.10.2 using both Apertus 8B models:

- swiss-ai/Apertus-8B-2509 (base, no chat template): correctly falls back to `llm.generate()`, warning emitted, produces valid completions
- swiss-ai/Apertus-8B-Instruct-2509 (instruct, has chat template): correctly uses `llm.chat()` with the tokenizer's template
- swiss-ai/Apertus-8B-2509 + explicit ChatML template: correctly uses `llm.chat()` with the provided override
- `make_model("VLLM/...")` end-to-end: `chat_template` parameter correctly forwarded through the full pipeline