Gemma4 Assistant Draft Model Support

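KoboldCpp fails to load the Gemma4 assistant draft model for speculative decoding. The loader rejects the file's architecture, `gemma4-assistant`, as unknown, and the main model then loads without speculative decoding.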
```
Attempting to load draft model for speculative decoding. It will be fully offloaded if possible. Vocab must match the main model.
llama_model_loader: loaded meta data with 50 key-value pairs and 49 tensors from C:/Users/kat/AI/Models/Gemma4/gemma-4-26b-A4B-it-assistant-QAT-Q4_0.gguf (version GGUF V3 (latest))
print_info: file format = GGUF V3 (latest)
print_info: file size   = 291.21 MiB (5.82 BPW)
llama_model_load: error loading model: unknown model architecture: 'gemma4-assistant'
llama_model_load_from_file_impl: failed to load model
llama_init_from_model: model cannot be NULL
Error: failed to load speculative decoding draft model 'C:/Users/kat/AI/Models/Gemma4/gemma-4-26b-A4B-it-assistant-QAT-Q4_0.gguf'
Speculative Decoding will not be used!
Starting model warm up, please wait a moment...
Load Text Model OK: True
```
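
The failing check is on the architecture string stored in the file's GGUF metadata. To confirm what a given file declares there without loading it, a minimal Python sketch follows. It assumes the GGUF v2/v3 header layout (little-endian, 64-bit counts) and reads the `general.architecture` key. `read_gguf_architecture` is a hypothetical helper written for illustration; it is not part of KoboldCpp or llama.cpp.

```python
import struct
from typing import BinaryIO, Optional

# GGUF metadata value type IDs mapped to struct formats for fixed-size scalars.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one little-endian scalar described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """Read a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    """Read one metadata value of the given type ID."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of `general.architecture`, or None if it is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch assumes GGUF v2 or later")
    _read(f, "<Q")             # tensor count, not needed here
    kv_count = _read(f, "<Q")  # number of metadata key-value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value       # stop early; later keys may hold large arrays
    return None
```

To use it, open the `.gguf` file in binary mode (`open(path, "rb")`) and pass the file object to `read_gguf_architecture`. For the draft model in this report it should return the same string that appears in the error message, which makes it easy to tell whether a file needs loader support for a new architecture or is simply the wrong file.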

**Additional Information:**

OS: Windows 11
CPU: Ryzen 7 2700X
GPU: NVIDIA RTX 4060 Ti
KoboldCpp version: 1.114.1

Gemma4 Assistant Draft Model Support #2262
