Is your feature request related to a problem? Please describe.
In multi-modal deployment for accuracy evaluation, some inconsistencies were noticed between LLM and VLM deployment in pytriton. The biggest difference is that for LLMs the chat template is applied on the server side (here), while for VLMs there is no such method and everything needs to happen on the client side (here). Would it be possible to move this processing to the server side for VLMs too? Without this we don't have an OpenAI-like API, and the server cannot be used for evaluation.
Solution:
Operations like applying the chat template should be moved to the server side.
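
For illustration, a minimal sketch of what server-side chat-template handling for a VLM could look like, assuming a Hugging Face `AutoProcessor` whose `apply_chat_template` accepts multi-modal messages (available in recent `transformers` releases). The model id, function name, and message layout are placeholders, not part of the existing deployment code.

```python
# Sketch only: apply the chat template on the server instead of the client.
# Assumes a recent `transformers` where processors expose `apply_chat_template`;
# the model id and message structure below are illustrative placeholders.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

def build_vlm_prompt(user_text: str) -> str:
    # OpenAI-style chat messages arriving from the client; the server,
    # not the client, turns them into the model-specific prompt string.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": user_text},
            ],
        }
    ]
    return processor.apply_chat_template(messages, add_generation_prompt=True)
```

With something like this inside the server-side inference callable, a client could send plain chat messages (plus image data) as it already does for LLMs, and the VLM deployment would expose the same OpenAI-like interface.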