
llama : suppress misleading Gemma4Assistant error during memory fitting #24590

Open
leotm wants to merge 2 commits into ggml-org:master from leotm:fix-gemma4-assistant-memory-fitting-error

Conversation

@leotm

@leotm leotm commented Jun 13, 2026


Overview

Thank you @am17an for Gemma4 MTP support ^ it introduced

// TODO: more generic
if (model.arch == LLM_ARCH_GEMMA4_ASSISTANT) {
    if (params.ctx_other == nullptr) {
        // TODO: change from runtime_error to llama_exception to avoid printing error message
        throw std::runtime_error("Gemma4Assistant requires ctx_other to be set (this warning is normal during memory fitting)");
    }
    cparams.ctx_other = params.ctx_other;
}

which sent me down the wrong track earlier (my fault)

and was noted in a couple of follow-ups

and a couple of issues

so i addressed the 2nd TODO only (for minimality)

if this looks as intended, happy to address the 1st TODO post-merge
(RE both LLM_ARCH_GEMMA4_ASSISTANT and LLM_ARCH_EAGLE3)
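
For readers following along, the throw side of the 2nd TODO can be sketched roughly as below. This is a minimal, self-contained illustration and not the PR's actual diff: `arch`, `context_params` and `resolve_ctx_other` are hypothetical stand-ins for the real llama.cpp types and code path, and only the shape of `llama_exception` mirrors the class added in this PR.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the real llama.cpp types (assumed names).
enum class arch { other, gemma4_assistant, eagle3 };

struct context_params {
    const void * ctx_other = nullptr; // target context needed by assistant/draft models
};

// Dedicated type for expected failures, so callers can tell them apart from
// genuine errors. The using-declaration inherits the base constructors, which
// keep their public access even though the class body defaults to private.
class llama_exception : public std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Assumed helper: returns the target context for architectures that need one,
// nullptr for architectures that do not, and throws llama_exception when a
// required target context is missing.
inline const void * resolve_ctx_other(arch a, const context_params & params) {
    const bool needs_target = a == arch::gemma4_assistant || a == arch::eagle3;
    if (!needs_target) {
        return nullptr;
    }
    if (params.ctx_other == nullptr) {
        throw llama_exception("ctx_other must be set for this architecture");
    }
    assert(params.ctx_other != nullptr); // postcondition for the target-requiring path
    return params.ctx_other;
}
```

Because `llama_exception` still derives from `std::runtime_error`, existing `catch (const std::exception &)` handlers keep working; only callers that want quieter handling need to know about the new type.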

Fix: #24343
Fix: #24350

Additional information

I've tested on

Before

image

After

image

NB: i noted shedrachokonofua/aether@b0d4bca but idk what it's doing and the mentioned #24376 is unrelated

P.S: i've not touched C++ in a while

Requirements

@leotm leotm requested a review from ggerganov as a code owner June 13, 2026 19:26
Comment thread src/llama-context.cpp
Comment on lines +25 to +27
class llama_exception : public std::runtime_error {
    using std::runtime_error::runtime_error;
};

Author

can update to a struct or move to src/llama-impl.h if preferred

Comment thread src/llama-context.cpp
Comment on lines 3563 to 3565
try {
    auto * ctx = new llama_context(*model, params);
    return ctx;

Author

could add e.g. LLAMA_LOG_INFO("%s: successfully initialized the context\n", __func__); if preferred

Author

*LLAMA_LOG_DEBUG

Comment thread src/llama-context.cpp
Comment on lines +3566 to 3568
} catch (const llama_exception & err) {
    LLAMA_LOG_WARN("%s: failed to initialize the context: %s\n", __func__, err.what());
} catch (const std::exception & err) {

Author

can rename err vars to e if preferred (seems the more common convention)
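
The handler ordering in the diff above matters, and a small sketch may make the intent clearer. The following is an illustration under assumptions, not the code in src/llama-context.cpp: `init_context`, `context`, `log_entry` and the in-memory `g_log` sink are hypothetical names standing in for the real function and the `LLAMA_LOG_*` macros.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

class llama_exception : public std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class log_level { warn, error };

struct log_entry {
    log_level   level;
    std::string text;
};

// Hypothetical in-memory sink standing in for LLAMA_LOG_WARN / LLAMA_LOG_ERROR.
static std::vector<log_entry> g_log;

struct context {
    int id;
};

// Runs a context factory and maps failures to log severities.
// The handler for the derived type has to come first: handlers are tried in
// order, and a std::exception handler placed first would also match
// llama_exception, so the warning branch would never run.
context * init_context(const std::function<context *()> & make) {
    assert(make && "factory must be callable");
    try {
        return make();
    } catch (const llama_exception & e) {
        g_log.push_back({log_level::warn, e.what()});  // expected failure
    } catch (const std::exception & e) {
        g_log.push_back({log_level::error, e.what()}); // anything else
    }
    return nullptr;
}
```

In both failure cases the caller still receives `nullptr`, so the only observable difference in this sketch is the severity recorded for the message.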

@sanmai

sanmai commented Jun 18, 2026

Contributor

This is indeed misleading; this warning should not be normal during memory fitting - it hides a bug (fitting is broken)

@ggerganov ggerganov left a comment

Member

Also throw in the EAGLE3 case below.

@ggerganov

Member

> This is indeed misleading; this warning should not be normal during memory fitting - it hides a bug (fitting is broken)

We can't fit before loading the target model because the assistants and eagles require a target model to be already loaded.

@sanmai

sanmai commented Jun 18, 2026

Contributor

It sounds like we can stub them with no_alloc = true

@leotm

leotm commented Jun 18, 2026

Author

> Also throw in the EAGLE3 case below.

updated ^



Development

Successfully merging this pull request may close these issues.

Eval bug: Gemma4 MTP
Misc. bug: E llama_init_from_model: failed to initialize the context: Gemma4Assistant

3 participants