llama : suppress misleading Gemma4Assistant error during memory fitting#24590
llama : suppress misleading Gemma4Assistant error during memory fitting#24590leotm wants to merge 2 commits into
Conversation
| class llama_exception : public std::runtime_error { | ||
| using std::runtime_error::runtime_error; | ||
| }; |
There was a problem hiding this comment.
can update to a struct or move to src\llama-impl.h if preferred
| try { | ||
| auto * ctx = new llama_context(*model, params); | ||
| return ctx; |
There was a problem hiding this comment.
could add e.g. LLAMA_LOG_INFO("%s: successfully initialized the context: %s\n", __func__); if preferred
| } catch (const llama_exception & err) { | ||
| LLAMA_LOG_WARN("%s: failed to initialize the context: %s\n", __func__, err.what()); | ||
| } catch (const std::exception & err) { |
There was a problem hiding this comment.
can rename err vars to e if preferred (seems the more common convention)
|
This is indeed misleading; this warning should not be normal during memory fitting - it hides a bug (fitting is broken) |
ggerganov
left a comment
There was a problem hiding this comment.
Also throw in the EAGLE3 case below.
We can't fit before loading the target model because the assistants and eagles require a target model to be already loaded. |
|
It sounds like we can stub them with |
updated ^ |
Overview
Thank you @am17an for Gemma4 MTP support ^ it introduced
llama.cpp/src/llama-context.cpp
Lines 93 to 101 in 4988f6e
which sent me down the wrong track earlier (my fault)
and was noted in couple follow-ups
and couple issues
so i addressed the 2nd TODO only (for minimality)
if this looks as intended, happy to address the 1st TODO post-merge
(RE both LLM_ARCH_GEMMA4_ASSISTANT and LLM_ARCH_EAGLE3)
Fix: #24343
Fix: #24350
Additional information
I've tested on
Before
After
NB: i noted shedrachokonofua/aether@b0d4bca but idk what it's doing and the mentioned #24376 is unrelated
P.S: i've not touched C++ in a while
Requirements