llama : suppress misleading Gemma4Assistant error during memory fitting by leotm · Pull Request #24590 · ggml-org/llama.cpp

leotm · 2026-06-13T19:26:22Z

Overview

Thank you @am17an for Gemma4 MTP support ^ it introduced

Lines 93 to 101 in 4988f6e

    
    // TODO: more generic
    if (model.arch == LLM_ARCH_GEMMA4_ASSISTANT) {
        if (params.ctx_other == nullptr) {
            // TODO: change from runtime_error to llama_exception to avoid printing error message
            throw std::runtime_error("Gemma4Assistant requires ctx_other to be set (this warning is normal during memory fitting)");
        }

        cparams.ctx_other = params.ctx_other;
    }

which sent me down the wrong track earlier (my fault)

Fix Gemma 4 MTP on llama-server (Windows) #24480

and was noted in a couple of follow-ups

and a couple of issues

so i addressed the 2nd TODO only (for minimality)

if this looks as intended, happy to address the 1st TODO post-merge
(RE both LLM_ARCH_GEMMA4_ASSISTANT and LLM_ARCH_EAGLE3)

Fix: #24343
Fix: #24350

Additional information

I've tested on

Windows 11
unsloth/gemma-4-31B-it-qat-GGUF
local build (unsloth.ai/docs/models/gemma-4/qat#llama.cpp-guide adapted for Win11 CMD, yup - old skool)

Before

After

NB: i noted shedrachokonofua/aether@b0d4bca but idk what it's doing and the mentioned #24376 is unrelated

P.S: i've not touched C++ in a while

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO


leotm · 2026-06-13T19:29:23Z

+class llama_exception : public std::runtime_error {
+    using std::runtime_error::runtime_error;
+};


can update to a struct or move to src\llama-impl.h if preferred
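On the `class` vs `struct` question: a minimal sketch, assuming only standard C++11 rules, suggesting the choice should not affect how the type is constructed or caught. Inherited constructors keep the access they have in the base class, so the `using` declaration works even under `class`'s default private access. The names `quiet_error_class`, `quiet_error_struct`, `what_via_class` and `what_via_struct` are hypothetical stand-ins, not identifiers from llama.cpp.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the PR's exception type, declared with `class`.
// The using-declaration sits in the (default) private section, but inherited
// constructors keep the access they have in std::runtime_error (public).
class quiet_error_class : public std::runtime_error {
    using std::runtime_error::runtime_error;
};

// The same type declared with `struct` (public inheritance by default).
struct quiet_error_struct : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Both variants are constructible from the same arguments as the base.
static_assert(std::is_constructible<quiet_error_class, const char *>::value,
              "class variant accepts a C string");
static_assert(std::is_constructible<quiet_error_struct, const std::string &>::value,
              "struct variant accepts a std::string");

// Both variants can be caught through the std::runtime_error base.
std::string what_via_class(const char * msg) {
    try {
        throw quiet_error_class(msg);
    } catch (const std::runtime_error & e) {
        return e.what();
    }
}

std::string what_via_struct(const char * msg) {
    try {
        throw quiet_error_struct(msg);
    } catch (const std::runtime_error & e) {
        return e.what();
    }
}
```

Under these assumptions the decision is purely stylistic; where the type lives (e.g. `src\llama-impl.h`) matters more for which translation units can catch it by name.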

leotm · 2026-06-13T19:35:48Z

    try {
        auto * ctx = new llama_context(*model, params);
        return ctx;


could add e.g. LLAMA_LOG_INFO("%s: successfully initialized the context\n", __func__); if preferred

*LLAMA_LOG_DEBUG

leotm · 2026-06-13T19:47:02Z

+    } catch (const llama_exception & err) {
+        LLAMA_LOG_WARN("%s: failed to initialize the context: %s\n", __func__, err.what());
    } catch (const std::exception & err) {


can rename err vars to e if preferred (seems the more common convention)
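The handler order in the diff above matters: the derived type has to be caught before `std::exception`, otherwise the generic handler wins. A self-contained sketch of that pattern follows. Everything in it is a hypothetical stand-in (`quiet_error`, `construct_context`, `try_init`, `log_level`), and it assumes the generic handler reports at a higher severity than the new one; it does not reproduce the actual llama.cpp code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the PR's llama_exception.
class quiet_error : public std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Severity the caller would log at (stand-in for the LLAMA_LOG_* macros).
enum class log_level { none, warn, error };

struct init_result {
    bool        ok;
    log_level   level;
    std::string message;
};

// Hypothetical constructor body: a missing ctx_other is an expected failure
// (e.g. while probing), anything else is a genuine error.
void construct_context(bool has_ctx_other, bool other_failure) {
    if (!has_ctx_other) {
        throw quiet_error("assistant requires ctx_other to be set");
    }
    if (other_failure) {
        throw std::runtime_error("unexpected failure");
    }
}

init_result try_init(bool has_ctx_other, bool other_failure) {
    try {
        construct_context(has_ctx_other, other_failure);
        return {true, log_level::none, ""};
    } catch (const quiet_error & err) {
        // Derived type first: expected failure, reported as a warning.
        return {false, log_level::warn, err.what()};
    } catch (const std::exception & err) {
        // Everything else keeps the higher severity.
        return {false, log_level::error, err.what()};
    }
}
```

Calling `try_init(false, false)` takes the warning path, while `try_init(true, true)` falls through to the generic handler. Swapping the two `catch` clauses would route both cases to the generic handler, and most compilers warn about the unreachable second clause.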

sanmai · 2026-06-18T07:58:49Z

This is indeed misleading; this warning should not be normal during memory fitting - it hides a bug (fitting is broken)

ggerganov

Also throw in the EAGLE3 case below.

ggerganov · 2026-06-18T08:12:34Z

> This is indeed misleading; this warning should not be normal during memory fitting - it hides a bug (fitting is broken)

We can't fit before loading the target model because the assistants and eagles require a target model to be already loaded.

sanmai · 2026-06-18T09:32:16Z

It sounds like we can stub them with no_alloc = true


leotm · 2026-06-18T18:33:52Z

> Also throw in the EAGLE3 case below.

updated ^

llama : suppress misleading Gemma4Assistant error during memory fitting

49138c8

Fix: ggml-org#24343 Fix: ggml-org#24350

leotm requested a review from ggerganov as a code owner June 13, 2026 19:26


ggerganov approved these changes Jun 18, 2026


sanmai mentioned this pull request Jun 18, 2026

Eval bug: Gemma4 MTP is silently disabled in case of insufficient VRAM #24758

Open

fixup! llama : suppress misleading Gemma4Assistant error during memory fitting

f42196e

leotm wants to merge 2 commits into ggml-org:master from leotm:fix-gemma4-assistant-memory-fitting-error

3 participants
