feat(server): instruction + chat-template wrappers for embedding inputs by marksverdhei · Pull Request #130 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-16T14:41:11Z

Problem

Instruction-tuned embedding models (e.g. LCO-Embedding-Omni / Qwen2.5-Omni) require their inputs wrapped in a task instruction + the chat template, then last-token pooled. The /embedding path tokenizes raw content, so the trained embedding position is never hit and retrieval — especially cross-modal text↔image — collapses to near-random. (This is the follow-up to #126, which stopped images being dropped but left retrieval quality poor.)
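To illustrate why the wrapper matters, here is a minimal sketch of last-token pooling with L2 normalization. It assumes the per-token hidden states are already available as a list of vectors; the function name and the toy data are hypothetical and are not the server's actual code. Because only the final position is read, whatever token ends the prompt (raw content versus the wrapped recipe) determines which embedding comes out.

```python
import math

def last_token_pool(hidden_states, normalize=True):
    """Return the hidden state of the final token, optionally L2-normalized.

    hidden_states: one vector per token, in prompt order (hypothetical input).
    """
    if not hidden_states:
        raise ValueError("need at least one token")
    vec = list(hidden_states[-1])  # only the last position is used
    if normalize:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0.0:
            vec = [x / norm for x in vec]
    return vec

# Toy example: two tokens, two dimensions.
pooled = last_token_pool([[1.0, 0.0], [3.0, 4.0]])
```

The earlier token vectors never enter the result directly, which is why a model trained to emit its embedding at the end of a specific template performs poorly when that template is missing.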

Measured on LCO-Embedding-Omni-3B-2605 (last-token, normalize): raw input → cross-modal margin +0.002 (≈random); the correct recipe → +0.30, correct ranking.

Change

Two opt-in, per-modality wrapper flags (empty = off = current raw behavior):

--embd-prompt-text "…{content}…" — text inputs; {content} → request text.
--embd-prompt-image "…{media}…" — image inputs; {media} → the server's media marker (one per image), request content ignored.

{media} substitution solves the random-per-restart marker problem: the server inserts its own current media marker, so clients send raw content unchanged (zero client change). \n and \t escapes are processed (string_process_escapes) so values survive INI/preset quoting. Modality is inferred from the request (image_data field / OAI image_url parts). The recipe is symmetric (query == document), so no role field is needed.
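The substitution described above can be sketched roughly as follows. This is an illustration in Python, not the server's C++ implementation: the function names, the example template, and the marker string are all hypothetical, the escape handling covers only \n, \t and \\ (the real string_process_escapes may accept more), and expanding {media} to the marker repeated once per image is an assumed reading of "one per image".

```python
# Escapes this sketch understands; the real helper may accept more.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}

def process_escapes(s):
    """Turn literal backslash sequences (as found in INI/preset values) into characters."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in _ESCAPES:
            out.append(_ESCAPES[s[i + 1]])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def wrap_text(template, content):
    """Text input: an empty template means off, so the raw content is returned."""
    if not template:
        return content
    # Escapes are resolved on the template only, never on the request text.
    return process_escapes(template).replace("{content}", content)

def wrap_image(template, media_marker, n_images):
    """Image input: request content is ignored; None means the wrapper is off."""
    if not template:
        return None
    # Assumption: {media} expands to one marker per image in the request.
    return process_escapes(template).replace("{media}", media_marker * n_images)

# Hypothetical template, not the recipe used in this PR.
wrapped = wrap_text(r"Instruct: describe\n{content}", "red circle")
```

Resolving escapes before substitution keeps backslashes in client-supplied text intact, which matches the goal of clients sending raw content unchanged.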

Validation

Built and run with the flags passed preset-style on the CLI, with literal \n sequences (which proves escape resolution). With clients sending raw requests, the server applies the recipe:

text same−cross margin: +0.21 (raw) → +0.30
cross-modal q="red circle": red 0.65 vs blue 0.35, margin +0.30, correct ranking (raw: +0.002)
output L2-normalized ✓
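For readers reproducing these figures, a minimal sketch of the margin arithmetic follows. It assumes the margin is the similarity to the matching item minus the similarity to the non-matching one (consistent with 0.65 - 0.35 = +0.30 above) and that the embeddings are already L2-normalized, so cosine similarity reduces to a dot product. The function names and toy vectors are hypothetical and are not the measured embeddings.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))

def margin(query, match, mismatch):
    """Similarity to the matching item minus similarity to the non-matching item."""
    return dot(query, match) - dot(query, mismatch)

# Toy unit vectors standing in for a text query and two image embeddings.
query = [1.0, 0.0]
red = [0.8, 0.6]
blue = [0.6, 0.8]
m = margin(query, red, blue)  # positive means the matching item ranks first
```

A margin near zero, like the +0.002 measured on raw input, means the two candidates are effectively indistinguishable to the query.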

Deployed via the omniswap preset's per-model block (LCO only; qwen3-embed / embeddinggemma stay raw). Follows #126 / #125 / #127.

Instruction-tuned embedding models (e.g. LCO-Embedding-Omni / Qwen2.5-Omni) need their inputs wrapped in a task instruction + the chat template, then last-token pooled. Without it the trained embedding position is never hit and retrieval, especially cross-modal text<->image, collapses to near-random. The /embedding path tokenized raw content, so these models served poor embeddings even though images were no longer dropped.

Add two opt-in, per-modality wrapper flags:

--embd-prompt-text "...{content}..." (text inputs)
--embd-prompt-image "...{media}..." (image inputs; request content ignored)

{content} -> request text; {media} -> the server's media marker (one per image), which solves the random-per-restart marker problem so clients send raw content unchanged. \n/\t escapes are processed (string_process_escapes) so values survive INI/preset quoting. Empty (default) = raw input, no change. Modality is inferred from the request (image_data field / OAI image_url parts). The recipe is symmetric (query == document) so no role field is needed.

Validated on LCO-Embedding-Omni-3B-2605 (last-token pooling, normalize): raw input gave cross-modal margin +0.002 (near-random); with the wrappers, text same-vs-cross margin +0.30 and cross-modal +0.30 with correct ranking.

marksverdhei merged commit 64890ea into ht Jun 16, 2026
20 of 23 checks passed

marksverdhei deleted the feat/embd-instruction-prompt branch June 16, 2026 15:24
