
Fix ChatVLLM crash on models without a chat template#11

Merged
geoalgo merged 1 commit into OpenEuroLLM:main from ferreirafabio:fix/chat-template-fallback on Feb 20, 2026

Conversation

@ferreirafabio (Contributor)

What is the problem?

PR #8 introduced the ChatVLLM wrapper, which switched from vllm.LLM.generate() to vllm.LLM.chat() so that chat templates are applied correctly. This works well for instruct models that ship a chat template in their tokenizer config. However, base/pretrained models like swiss-ai/Apertus-8B-2509 don't define one. Since transformers >= v4.44 no longer provides a default chat template, calling vllm.LLM.chat() on these models raises a ValueError. As a result, we can no longer evaluate base models on fluency tasks, which the project requires.

How do we solve it?

We detect at model load time whether a chat template is available and pick the right vLLM method accordingly. Three paths:

  1. User passes --chat_template: we use llm.chat() with that explicit template. Useful when you know the right format for a model whose tokenizer doesn't include one.
  2. Tokenizer has a chat template: we use llm.chat() and let vLLM apply it automatically. This is the default path for instruct models.
  3. No chat template found: we fall back to llm.generate() (plain text completion, no chat formatting) and print a warning. This is the expected path for base models used in fluency evaluation.
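The three-path resolution above can be sketched as a small decision helper. This is an illustrative sketch, not the actual code in ChatVLLM.__init__(); the function name and return convention are assumptions made for the example:

```python
import warnings


def resolve_generation_mode(tokenizer_chat_template, cli_chat_template=None):
    """Decide which vLLM method to use and which template (if any) to pass.

    Returns a (method_name, template) tuple:
      - ("chat", template) -> call llm.chat(..., chat_template=template)
      - ("chat", None)     -> call llm.chat(), vLLM applies the tokenizer's template
      - ("generate", None) -> plain-text completion fallback, no chat formatting
    """
    if cli_chat_template is not None:
        # Path 1: explicit --chat_template override from the CLI
        return ("chat", cli_chat_template)
    if tokenizer_chat_template is not None:
        # Path 2: tokenizer config ships its own chat template
        return ("chat", None)
    # Path 3: base model without a template -> fall back to generate() and warn
    warnings.warn(
        "No chat template found; falling back to llm.generate() "
        "(plain text completion, no chat formatting)."
    )
    return ("generate", None)
```

Because the decision happens once at model load time, the per-request generation path stays branch-free.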

This way instruct models keep working as before, base models no longer crash, and users can still force a template via the CLI when needed.

Changes

  • openjury/utils.py: add warnings import, three-path template detection in ChatVLLM.__init__(), new _to_raw_text() method for the generate() fallback, pass chat_template through batch() and make_model()
  • openjury/generate.py: forward chat_template parameter in generate_instructions() and generate_base()
  • openjury/generate_and_evaluate.py: add --chat_template CLI argument, thread it through CliArgs, gen_fun partials, and make_model() calls
  • README.md: document chat template behavior under "Model Specification"
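For the generate() fallback, chat-style message lists have to be flattened into plain prompts before being passed to vllm.LLM.generate(). A minimal sketch of what a _to_raw_text()-style helper might look like (the actual method in openjury/utils.py may differ; the function below is hypothetical):

```python
def to_raw_text(messages):
    """Flatten a chat-style message list into one plain prompt string.

    Accepts either a raw string (returned unchanged) or a list of
    {"role": ..., "content": ...} dicts, whose contents are joined
    with newlines -- no chat markup is added.
    """
    if isinstance(messages, str):
        return messages  # already a raw prompt
    return "\n".join(m["content"] for m in messages)
```

This keeps the fallback path deliberately format-free: a base model without a template sees only the concatenated text.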

Testing

Tested on L40S GPU with vllm 0.10.2 using both Apertus 8B models:

  • swiss-ai/Apertus-8B-2509 (base, no chat template): correctly falls back to llm.generate(), warning emitted, produces valid completions
  • swiss-ai/Apertus-8B-Instruct-2509 (instruct, has chat template): correctly uses llm.chat() with the tokenizer's template
  • swiss-ai/Apertus-8B-2509 + explicit ChatML template: correctly uses llm.chat() with the provided override
  • make_model("VLLM/...") end-to-end: chat_template parameter correctly forwarded through the full pipeline

Models like swiss-ai/Apertus-8B-2509 (base models) don't define a
chat template in their tokenizer config. Since transformers >= v4.44
removed the default template, calling vllm.LLM.chat() on these models
raises ValueError.

Implement three-path resolution:
1. Explicit --chat_template override -> use llm.chat() with that template
2. Tokenizer has a chat template -> use llm.chat() (auto-detected)
3. No template found -> fall back to llm.generate() + warn

This ensures instruct models get chat() automatically, base models get
generate() automatically, and users can still force a template via CLI.
@geoalgo merged commit f3cb15e into OpenEuroLLM:main on Feb 20, 2026
1 check failed