Gemma4 Assistant Draft Model Support #2262

@lyrakat

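KoboldCpp fails to load the Gemma4 assistant (draft) model for speculative decoding. The loader rejects the file with "unknown model architecture: 'gemma4-assistant'", and the main model then loads without speculative decoding. Console output: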

Attempting to load draft model for speculative decoding. It will be fully offloaded if possible. Vocab must match the main model.
llama_model_loader: loaded meta data with 50 key-value pairs and 49 tensors from C:/Users/kat/AI/Models/Gemma4/gemma-4-26b-A4B-it-assistant-QAT-Q4_0.gguf (version GGUF V3 (latest))
print_info: file format = GGUF V3 (latest)
print_info: file size   = 291.21 MiB (5.82 BPW)
llama_model_load: error loading model: unknown model architecture: 'gemma4-assistant'
llama_model_load_from_file_impl: failed to load model
llama_init_from_model: model cannot be NULL
Error: failed to load speculative decoding draft model 'C:/Users/kat/AI/Models/Gemma4/gemma-4-26b-A4B-it-assistant-QAT-Q4_0.gguf'
Speculative Decoding will not be used!
Starting model warm up, please wait a moment...
Load Text Model OK: True

Additional Information:

OS: Windows 11
CPU: AMD Ryzen 2700X
GPU: NVIDIA RTX 4060 Ti
KoboldCpp version: 1.114.1
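
For anyone who wants to confirm which architecture a GGUF file declares before trying to load it: the name in the error above comes from the `general.architecture` key in the file's metadata. Below is a minimal sketch of reading that key directly. It assumes a little-endian GGUF v2/v3 file, and the function name `read_gguf_architecture` is just an illustrative helper, not part of KoboldCpp or llama.cpp.

```python
import struct

# Byte widths of the fixed-size GGUF metadata value types
# (uint8, int8, uint16, int16, uint32, int32, float32, bool, uint64, int64, float64).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_TYPE_STRING = 8
_TYPE_ARRAY = 9


def _read_exact(f, n):
    """Read exactly n bytes or raise if the file is truncated."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of file")
    return data


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    (length,) = struct.unpack("<Q", _read_exact(f, 8))
    return _read_exact(f, length).decode("utf-8")


def _skip_value(f, value_type):
    """Advance the file position past one metadata value of the given type."""
    if value_type in _SCALAR_SIZES:
        f.seek(_SCALAR_SIZES[value_type], 1)
    elif value_type == _TYPE_STRING:
        (length,) = struct.unpack("<Q", _read_exact(f, 8))
        f.seek(length, 1)
    elif value_type == _TYPE_ARRAY:
        # Array header: element type (uint32) and element count (uint64).
        elem_type, count = struct.unpack("<IQ", _read_exact(f, 12))
        if elem_type in _SCALAR_SIZES:
            f.seek(_SCALAR_SIZES[elem_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {value_type}")


def read_gguf_architecture(path):
    """Return the general.architecture string, or None if the key is absent."""
    with open(path, "rb") as f:
        if _read_exact(f, 4) != b"GGUF":
            raise ValueError("not a GGUF file")
        # Header: version (uint32), tensor count (uint64), metadata KV count (uint64).
        version, _tensor_count, kv_count = struct.unpack("<IQQ", _read_exact(f, 20))
        if version < 2:
            raise ValueError("GGUF v1 uses a different layout; this sketch assumes v2+")
        for _ in range(kv_count):
            key = _read_string(f)
            (value_type,) = struct.unpack("<I", _read_exact(f, 4))
            if key == "general.architecture" and value_type == _TYPE_STRING:
                return _read_string(f)
            _skip_value(f, value_type)
    return None
```

Calling `read_gguf_architecture(...)` on the draft model path from the log should return the same string the loader complains about. Only the header and metadata are read, so this is quick even on large files.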
