CUDA crash during mmproj/clip image encode on discrete NVIDIA GPU #230

## Summary

Multimodal image describe (vision GGUF + `mmproj` projector) crashes the process
with `Lost connection to device` during the **first** native image encode
(`mtmd_helper_eval_chunks`) on a discrete NVIDIA GPU. The same code path works on
**0.6.12** and appears to have regressed somewhere in **0.8.0 → 0.8.3** (native pin
`b9587` → `b9694`). I'm not certain of the exact range yet: I moved to 0.8.x to try
out MTP and still need to test more versions.

It's worth mentioning that MTP and llamadart chat sessions seem to work wonderfully with text-only prompts.

The crash is **not** caused by:

- speculative decoding / MTP (disabled, still crashes),
- `enableThinking` (forced false, still crashes),
- model flash-attention (`FlashAttention.disabled` on model load, still crashes),
- GPU selection (pinned `mainGpu` to the 4080, still crashes),
- running the projector on CPU vs GPU (CPU `use_gpu=false` **also** crashes).

At first I suspected FlashAttention or the CUDA backend, but running on CPU crashed too.

## Environment

| Component | Details |
|---|---|
| llamadart | 0.8.1 / 0.8.2 / 0.8.3 (all reproduce) |
| Known-good | 0.6.12 |
| Native pin | `leehack/llamadart-native@b9694` (0.8.2+), `b9587` (0.8.0) |
| OS | Windows 11 10.0.26200 (x64) |
| GPU | NVIDIA GeForce RTX 4080 Laptop (12 GB), CUDA backend |
| Also present (but unused) | Intel Iris Xe |
| Backends bundled | `[cuda, vulkan]` (Windows), CPU |
| Model | Unsloth Gemma-4-E4B-It, 4-bit quant (switched from the 26B MoE, since I originally thought it could be OOM) |
| Projector | gemma4v vision + gemma4a audio mmproj |

## Reproduction

1. Load the Gemma-4 vision GGUF with `GpuBackend.auto`/`cuda`, all layers offloaded to GPU.
2. `loadMultimodalProjector(mmproj.gguf)` — succeeds; logs `clip_ctx: CLIP using CUDA0 backend`.
3. `engine.create([LlamaImageContent(path), LlamaTextContent(prompt)], enableThinking: false)`.
4. Stream the result. Process dies with `Lost connection to device` ~5s in,
   before the first token, during the native image encode.

## What I tried

| Attempt | Result |
|---|---|
| Disable MTP | no change |
| Force `enableThinking: false` for vision | no change |
| `FlashAttention.disabled` on model load | no change |
| `GpuBackend.vulkan` for the model | model runs on Vulkan, still crashes |
| Pin `mainGpu` to discrete RTX 4080 (`splitMode.none`) | correct GPU selected, still crashes |
| Force whole vision engine to CPU (`gpuLayers:0` → `mtmd use_gpu=false`) | still crashes |

## Asks for upstream

Honestly, I'm hoping I missed something on my end. If anyone has the time, I'd like to know whether you can reproduce this, and whether image encoding works (or not) for you in an environment similar to mine.

## Local workaround

For now I've pinned my pubspec to 0.6.12 and commented out the MTP code paths, which I hope to re-enable once images work again. Not all that fancy, but it is what it is, I suppose.
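For anyone hitting the same crash, the pin is roughly the excerpt below (a minimal sketch; the rest of `pubspec.yaml` is omitted). A bare version number pins that exact release, whereas a caret constraint like `^0.6.12` would still let pub resolve newer 0.6.x releases.

```yaml
# pubspec.yaml (excerpt): stay on the last llamadart release where image encode worked for me
dependencies:
  llamadart: 0.6.12   # exact pin; ^0.6.12 would allow any >=0.6.12 <0.7.0
```

Run `dart pub get` (or `flutter pub get`) afterwards so `pubspec.lock` picks up the downgrade.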
