Move operations like chat templates to the server side for VLM deployment #423

@oyilmaz-nvidia

Description

Is your feature request related to a problem? Please describe.
During accuracy evaluation of multi-modal deployments, we noticed inconsistencies between how LLMs and VLMs are deployed in pytriton. The biggest difference is that for LLMs the chat template is applied on the server side (here), while for VLMs there is no such method and everything needs to happen on the client side (here). Would it be possible to move this processing to the server side for VLMs too? Without this, there is no OpenAI-like API for VLMs, and the server cannot be used for evaluation.

Solution:
Operations like chat-template application should be moved to the server side for VLMs, matching the existing LLM behavior.
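As a sketch of the requested behavior (all names here are hypothetical; a real implementation would call the model processor's own `apply_chat_template` rather than hand-rolled formatting), the server endpoint could accept OpenAI-style message lists and render the prompt itself, so VLM clients no longer need to do it:

```python
# Hypothetical sketch: applying a chat template on the server side,
# so clients can send OpenAI-style message lists for VLMs as for LLMs.
from typing import Dict, List


def apply_chat_template(messages: List[Dict[str, str]]) -> str:
    """Render OpenAI-style messages into a single prompt string.

    A real deployment would use the model's own chat template
    (e.g. the HF processor's apply_chat_template) instead of this
    simplified role-tag format.
    """
    parts = [f"<|{msg['role']}|>\n{msg['content']}" for msg in messages]
    parts.append("<|assistant|>\n")  # add the generation prompt
    return "\n".join(parts)


def handle_request(payload: dict) -> str:
    # Server side: the client sends only messages, never a rendered prompt.
    prompt = apply_chat_template(payload["messages"])
    return prompt  # in practice this would be passed to the VLM for generation


request = {"messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe the image."},
]}
print(handle_request(request))
```

With this in place, the VLM endpoint accepts the same request shape as the LLM endpoint, which is what an OpenAI-compatible evaluation client expects.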

Labels: enhancement (New feature or request)