Eval bug: std::runtime_error Invalid diff:

### Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 5519 (a6824743)
built with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

Ryzen 5 3600 + RTX 5090

### Models

Qwen3 32B q5

### Problem description & steps to reproduce

./llama-server -m ~/llm/models/Qwen3-32B-Q5_K_S.gguf -c 16384 -ngl 999 --host 0.0.0.0 --port 5000 --jinja --api-key <key>

This is how I run the program, the issue happens every so often and I can't (in the limited attempts I tried) replicate it with llama-cli

### First Bad Commit

_No response_

### Relevant log output

```shell
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: '<think>Okay, the user mentioned that Docker is taking up a lot of space and they want to delete unused volumes. Now they're saying that something else might be using all the storage and they don't know if it's Docker. I need to help them figure out what's consuming their disk space.
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eval bug: std::runtime_error Invalid diff: #13876

Name and Version

Operating systems

GGML backends

Hardware

Models

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Eval bug: std::runtime_error Invalid diff: #13876

Description

Name and Version

Operating systems

GGML backends

Hardware

Models

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions