mtmd : add post-decode callback #24645
Assisted-by: pi:llama.cpp/Qwen3.6-27B
|
Just to confirm, apart from avoid including If so, I agree that the callback added here is acceptable |
Yes, I think that as long as we process the exact same batches both with the target and draft/spec contexts, they should remain synchronized. There is still some incorrectness when doing multi-modal processing with MTP, but to fix that we have to rework the |
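The synchronization argument above can be illustrated with a minimal sketch. Every name here (`Batch`, `Context`, `PostDecodeCallback`, `decode_chunks`, `demo_sync`) is a hypothetical stand-in invented for illustration, not the actual mtmd or llama.cpp API; the only assumption taken from the discussion is that a callback fires after each batch is decoded on the target, so the same batch can be replayed on the draft context.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the llama.cpp / mtmd API.
struct Batch {
    std::vector<int> tokens;
};

struct Context {
    std::vector<int> seen; // every token this context has processed, in order

    // Pretend-decode: record the tokens and report success (0).
    int decode(const Batch & batch) {
        seen.insert(seen.end(), batch.tokens.begin(), batch.tokens.end());
        return 0;
    }
};

// Hypothetical callback type: invoked once per batch, after the target decoded it.
using PostDecodeCallback = std::function<void(const Batch &)>;

// Decode all chunks on the target context, firing the callback after each
// successful decode so the caller can mirror the exact same batch elsewhere.
inline int decode_chunks(Context & target,
                         const std::vector<Batch> & chunks,
                         const PostDecodeCallback & on_decoded) {
    for (const Batch & batch : chunks) {
        const int rc = target.decode(batch);
        if (rc != 0) {
            return rc; // stop on the first failure; the callback is not fired
        }
        if (on_decoded) {
            on_decoded(batch);
        }
    }
    return 0;
}

// Usage: replay every decoded batch on a draft context and check that both
// contexts ended up having processed the identical token sequence.
inline bool demo_sync() {
    Context target;
    Context draft;
    const std::vector<Batch> chunks = { Batch{{1, 2, 3}}, Batch{{4, 5}} };

    const int rc = decode_chunks(target, chunks, [&draft](const Batch & batch) {
        draft.decode(batch);
    });

    return rc == 0 && target.seen == draft.seen;
}
```

Because the draft only ever sees batches that the target has already decoded successfully, the two contexts cannot diverge in this toy model; whether that is enough in the real code depends on the remaining multi-modal issues mentioned above.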
OK, I will start working on this today (unless you want to take it over) |
I plan to do a few refactors that have piled up lately (Metal, memory, tests, ggml backend), but wasn't planning to start on My general idea for |
Overview
alt #24520
This resolves the [TAG_MTMD_DRAFT_PROCESSING] TODO for synchronizing the target and draft contexts and avoids including llama-ext.h in server-context.cpp.
Requirements