
Chat template from chat_template.jinja for all possible paths + custom chat template for Qwen3-VL thinking#4055

Open
dkalinowski wants to merge 10 commits into main from ovms_chat_template
Conversation

@dkalinowski
Collaborator

@dkalinowski dkalinowski commented Mar 12, 2026

VLM pipelines still prioritize the chat template from openvino_tokenizer.xml over chat_template.jinja.
This PR changes the order so that chat_template.jinja takes priority.

This ensures that Qwen3-VL Thinking is supported.

@dkalinowski dkalinowski marked this pull request as ready for review March 12, 2026 13:27
Copilot AI review requested due to automatic review settings March 12, 2026 13:27
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the LLM/VLM servable initialization flow to allow overriding the tokenizer chat template from a chat_template.jinja file located in the model path, making that override available across multiple pipeline initializers.

Changes:

  • Add logic to detect and read chat_template.jinja from the model path and call tokenizer.set_chat_template(...).
  • Add <fstream> include where needed to support reading the template file.
  • Apply the same override behavior across legacy LM, continuous batching LM, and legacy VLM initializers.
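The override flow described in the bullets above can be sketched as a small standalone helper. This is a hypothetical `readChatTemplate` function, not the PR's actual code: it only shows the "read `chat_template.jinja` if present and non-empty" step, after which the real initializers would pass the result to `tokenizer.set_chat_template(...)`.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical sketch of the override step: returns the contents of
// chat_template.jinja under modelsPath, or std::nullopt when the file
// is missing or empty (in which case the tokenizer default is kept).
std::optional<std::string> readChatTemplate(const std::filesystem::path& modelsPath) {
    std::filesystem::path templatePath = modelsPath / "chat_template.jinja";
    std::ifstream file(templatePath);
    if (!file.is_open()) {
        return std::nullopt;  // no override file present
    }
    std::stringstream buffer;
    buffer << file.rdbuf();
    std::string content = buffer.str();
    if (content.empty()) {
        return std::nullopt;  // empty template, keep tokenizer default
    }
    return content;
}
```

In the PR itself this logic is duplicated across the three initializers listed below, with the returned string applied via `tokenizer.set_chat_template(...)`.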

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/llm/visual_language_model/legacy/servable_initializer.cpp | Reads chat_template.jinja (if present) after creating the VLM pipeline/tokenizer and applies it to the tokenizer. |
| src/llm/language_model/legacy/servable_initializer.cpp | Reads chat_template.jinja (if present) after creating the legacy LLM pipeline/tokenizer and applies it to the tokenizer. |
| src/llm/language_model/continuous_batching/servable_initializer.cpp | Reads chat_template.jinja (if present) after creating the CB pipeline/tokenizer and applies it to the tokenizer. |

properties->tokenizer = properties->pipeline->get_tokenizer();

// Override chat template from chat_template.jinja file if present in model directory
std::filesystem::path chatTemplateJinjaPath = std::filesystem::path(parsedModelsPath) / "chat_template.jinja";

Copilot AI Mar 12, 2026


chatTemplateJinjaPath is derived as parsedModelsPath / "chat_template.jinja". Since parseModelsPath() accepts .gguf files as a valid models_path, this won’t locate a template stored next to a GGUF file (it will incorrectly search under <file>.gguf/chat_template.jinja). Consider basing the search directory on parsedModelsPath if it’s a directory, otherwise use parent_path().

Suggested change
std::filesystem::path chatTemplateJinjaPath = std::filesystem::path(parsedModelsPath) / "chat_template.jinja";
std::filesystem::path modelsPathFs(parsedModelsPath);
std::filesystem::path chatTemplateDir = modelsPathFs;
if (!chatTemplateDir.empty() && !std::filesystem::is_directory(chatTemplateDir)) {
chatTemplateDir = chatTemplateDir.parent_path();
}
std::filesystem::path chatTemplateJinjaPath = chatTemplateDir / "chat_template.jinja";

Collaborator Author


@atobiszei do gguf models have a chat_template.jinja file next to the model files, or is the chat template built in?

std::istreambuf_iterator<char>());
if (!chatTemplateContent.empty()) {
properties->tokenizer.set_chat_template(chatTemplateContent);
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Overriding chat template from: {}", chatTemplateJinjaPath.string());
Collaborator


Suggested change
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Overriding chat template from: {}", chatTemplateJinjaPath.string());
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Using the chat template from: {}", chatTemplateJinjaPath.string());

Collaborator Author


done

SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Overriding chat template from: {}", chatTemplateJinjaPath.string());
}
} else {
SPDLOG_LOGGER_WARN(llm_calculator_logger, "Failed to open chat template file: {}", chatTemplateJinjaPath.string());
Collaborator


Suggested change
SPDLOG_LOGGER_WARN(llm_calculator_logger, "Failed to open chat template file: {}", chatTemplateJinjaPath.string());
SPDLOG_LOGGER_ERROR(llm_calculator_logger, "Failed to open chat template file: {}", chatTemplateJinjaPath.string());

Collaborator Author


done

{%- endfor %}
{%- if add_generation_prompt %}
{#- Originally '<|im_start|>assistant\n<think>\n' #}
{{- '<|im_start|>assistant\n' }}
Collaborator


Isn't it possible to turn off thinking?

Collaborator Author


No, this chat template doesn't support that originally. It looks like it's not a common thing. Is it already part of the process that we always add support for that whenever we introduce a new thinking model? @dtrawins

@dkalinowski dkalinowski changed the title Chat template from chat_template.jinja for all possible paths Chat template from chat_template.jinja for all possible paths + custom chat template for Qwen3-VL thinking Mar 16, 2026