Chat template from chat_template.jinja for all possible paths + custom chat template for Qwen3-VL thinking #4055
dkalinowski wants to merge 10 commits into main
Conversation
Pull request overview
This PR updates the LLM/VLM servable initialization flow to allow overriding the tokenizer chat template from a chat_template.jinja file located in the model path, making that override available across multiple pipeline initializers.
Changes:
- Add logic to detect and read `chat_template.jinja` from the model path and call `tokenizer.set_chat_template(...)`.
- Add the `<fstream>` include where needed to support reading the template file.
- Apply the same override behavior across the legacy LM, continuous batching LM, and legacy VLM initializers.
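The flow described above can be sketched as a small standalone helper. This is a hedged sketch, not the PR's actual code: `loadChatTemplateOverride` is a hypothetical name, and in the PR the returned content would be passed to `tokenizer.set_chat_template(...)` from OpenVINO GenAI.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// Hypothetical helper sketching the PR's flow: if chat_template.jinja exists
// in the model directory, read it whole so it can later be applied via
// tokenizer.set_chat_template(...). Names are illustrative, not the PR's code.
std::optional<std::string> loadChatTemplateOverride(const std::filesystem::path& modelDir) {
    std::filesystem::path templatePath = modelDir / "chat_template.jinja";
    if (!std::filesystem::exists(templatePath)) {
        return std::nullopt;  // no override file; keep the tokenizer's built-in template
    }
    std::ifstream file(templatePath);
    if (!file.is_open()) {
        return std::nullopt;  // present but unreadable; caller may log a warning
    }
    std::string content((std::istreambuf_iterator<char>(file)),
                        std::istreambuf_iterator<char>());
    if (content.empty()) {
        return std::nullopt;  // an empty file should not clobber the existing template
    }
    return content;
}
```

Returning `std::nullopt` in every failure case keeps the decision of whether to warn, error, or silently fall back in the caller, which matches the review discussion below about log levels.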
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/llm/visual_language_model/legacy/servable_initializer.cpp | Reads `chat_template.jinja` (if present) after creating the VLM pipeline/tokenizer and applies it to the tokenizer. |
| src/llm/language_model/legacy/servable_initializer.cpp | Reads `chat_template.jinja` (if present) after creating the legacy LLM pipeline/tokenizer and applies it to the tokenizer. |
| src/llm/language_model/continuous_batching/servable_initializer.cpp | Reads `chat_template.jinja` (if present) after creating the CB pipeline/tokenizer and applies it to the tokenizer. |
```cpp
properties->tokenizer = properties->pipeline->get_tokenizer();

// Override chat template from chat_template.jinja file if present in model directory
std::filesystem::path chatTemplateJinjaPath = std::filesystem::path(parsedModelsPath) / "chat_template.jinja";
```
chatTemplateJinjaPath is derived as parsedModelsPath / "chat_template.jinja". Since parseModelsPath() accepts .gguf files as a valid models_path, this won’t locate a template stored next to a GGUF file (it will incorrectly search under <file>.gguf/chat_template.jinja). Consider basing the search directory on parsedModelsPath if it’s a directory, otherwise use parent_path().
Suggested change:
```diff
- std::filesystem::path chatTemplateJinjaPath = std::filesystem::path(parsedModelsPath) / "chat_template.jinja";
+ std::filesystem::path modelsPathFs(parsedModelsPath);
+ std::filesystem::path chatTemplateDir = modelsPathFs;
+ if (!chatTemplateDir.empty() && !std::filesystem::is_directory(chatTemplateDir)) {
+     chatTemplateDir = chatTemplateDir.parent_path();
+ }
+ std::filesystem::path chatTemplateJinjaPath = chatTemplateDir / "chat_template.jinja";
```
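The suggested fallback can be exercised in isolation. The sketch below (with a hypothetical `resolveChatTemplatePath` name) mirrors the suggestion: use the path directly when it is a directory, otherwise fall back to its parent, so a template stored next to a `.gguf` file is still found.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper mirroring the reviewer's suggestion: derive the directory
// to search for chat_template.jinja from a models_path that may be either a
// directory or a file such as model.gguf.
std::filesystem::path resolveChatTemplatePath(const std::string& parsedModelsPath) {
    std::filesystem::path chatTemplateDir(parsedModelsPath);
    if (!chatTemplateDir.empty() && !std::filesystem::is_directory(chatTemplateDir)) {
        // models_path points at a file (e.g. model.gguf): look next to it
        chatTemplateDir = chatTemplateDir.parent_path();
    }
    return chatTemplateDir / "chat_template.jinja";
}
```

Note that `std::filesystem::is_directory` returns `false` for a nonexistent path as well, so a `.gguf` path resolves to its parent directory whether or not the file exists.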
@atobiszei do GGUF models have a chat_template.jinja file next to the model files, or is the chat template built in?
src/llm/language_model/continuous_batching/servable_initializer.cpp (outdated comment, resolved)
src/llm/servable_initializer.cpp (outdated)
```cpp
    std::istreambuf_iterator<char>());
if (!chatTemplateContent.empty()) {
    properties->tokenizer.set_chat_template(chatTemplateContent);
    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Overriding chat template from: {}", chatTemplateJinjaPath.string());
```
Suggested change:
```diff
- SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Overriding chat template from: {}", chatTemplateJinjaPath.string());
+ SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Using the chat template from: {}", chatTemplateJinjaPath.string());
```
src/llm/servable_initializer.cpp (outdated)
```cpp
        SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Overriding chat template from: {}", chatTemplateJinjaPath.string());
    }
} else {
    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Failed to open chat template file: {}", chatTemplateJinjaPath.string());
```
Suggested change:
```diff
- SPDLOG_LOGGER_WARN(llm_calculator_logger, "Failed to open chat template file: {}", chatTemplateJinjaPath.string());
+ SPDLOG_LOGGER_ERROR(llm_calculator_logger, "Failed to open chat template file: {}", chatTemplateJinjaPath.string());
```
```jinja
{%- endfor %}
{%- if add_generation_prompt %}
    {#- Originally '<|im_start|>assistant\n<think>\n' #}
    {{- '<|im_start|>assistant\n' }}
```
Isn't it possible to turn off thinking?
No, this chat template doesn't support that originally. It looks like that's not a common thing. Is it already part of the process that we add support for this whenever we introduce a new thinking model? @dtrawins
VLM pipelines still prioritize the chat template from `openvino_tokenizer.xml` over `chat_template.jinja`. This PR changes that priority order, which ensures that Qwen3-VL Thinking is supported.