Eval bug: llama.cpp-b9568/ggml/src/ggml-cuda/fattn.cu:579: fatal error #24324

@macafeeee

Name and Version

./llama-server --version
version: b9568 (unknown)
built with GNU 14.2.0 for Linux x86_64

Compiled with -DGGML_CUDA=ON -DGGML_CUDA_USE_FLASH_ATTENTION=ON -DGGML_CUDA_FA_ALL_QUANTS=ON -DGGML_CUDA_NCCL=ON

Operating systems

Linux

GGML backends

CUDA

Hardware

2x Intel(R) Xeon(R) Gold 6338
4x NVIDIA V100 32 GB (PCIe)
Debian 13
NVIDIA-SMI 550.163.01
Driver Version: 550.163.01
CUDA Version: 12.4

Models

gemma-4-31B-it-F16, with MTP-gemma-4-31B-it-assistant-F16 as the speculative draft model

Problem description & steps to reproduce

./llama-server -m /models/gemma-4-31B-it-F16.gguf --mmproj /models/mmproj-F16.gguf --spec-draft-model /models/MTP-gemma4-31B-it-assistant-F16.gguf --fit off --split-mode tensor -t 8 -fa on --no-mmap --mlock --ctx-size 131072 -ngl all -ub 2048 -b 2048 --prio 3 --host 0.0.0.0 --port 8080
After several inference requests, llama-server crashes at random. The full error output is in the log section below.

First Bad Commit

No response

Relevant log output

llama.cpp log:
[46009] /home/llamacpp/llama.cpp-b9568/ggml/src/ggml-cuda/fattn.cu:579: fatal error
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libggml-base.so.0(+0x18665) [0x7fbfdb553665]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libggml-base.so.0(ggml_print_backtrace+0x1df) [0x7fbfdb553a3f]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libggml-base.so.0(ggml_abort+0x11e) [0x7fbfdb553bce]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libggml-cuda.so.0(_Z24ggml_cuda_flash_attn_extR25ggml_backend_cuda_contextP11ggml_tensor+0xa77) [0x7fbfd0ffcb27]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libggml-cuda.so.0(+0x24b4db) [0x7fbfd104b4db]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libggml-base.so.0(+0x46974) [0x7fbfdb581974]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x827) [0x7fbfdb570047]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa1) [0x7fbfda6cd2f1]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xe4) [0x7fbfda6cff04]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x365) [0x7fbfda6d5e55]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama.so.0(llama_decode+0xb) [0x7fbfda6d7a3b]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama-common.so.0(_ZN33common_speculative_impl_draft_mtp5draftERSt6vectorI31common_speculative_draft_paramsSaIS1_EE+0xce) [0x7fbfdac9ffde]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama-common.so.0(_Z24common_speculative_draftP18common_speculative+0xc9) [0x7fbfdac95ac9]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama-server-impl.so(_ZN19server_context_impl12update_slotsEv+0x55c) [0x7fbfdb75dddc]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama-server-impl.so(_ZN12server_queue10start_loopEl+0x1efa) [0x7fbfdb7f37ba]
[46009] /home/llamacpp/llama.cpp-b9568/build/bin/libllama-server-impl.so(_Z12llama_serveriPPc+0x343a) [0x7fbfdb6bd24a]
[46009] /lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7fbfdb035ca8]
[46009] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fbfdb035d65]
[46009] /home/llamacpp/llamacpp/llama-server(+0x11b1) [0x555b2f5151b1]
2.41.647.215 E srv operator(): http client error: Failed to read connection

dmesg log:
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 0, SM 0): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 0, SM 0): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52c730=0x107000e 0x52c734=0x24 0x52c728=0x4c1eb72 0x52c72c=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 0, SM 1): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 0, SM 1): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52c7b0=0x104000e 0x52c7b4=0x24 0x52c7a8=0x4c1eb72 0x52c7ac=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 1, SM 0): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 1, SM 0): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52cf30=0x107000e 0x52cf34=0x24 0x52cf28=0x4c1eb72 0x52cf2c=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 2, SM 0): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 2, SM 0): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52d730=0x106000e 0x52d734=0x24 0x52d728=0x4c1eb72 0x52d72c=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 2, SM 1): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 2, SM 1): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52d7b0=0x105000e 0x52d7b4=0x24 0x52d7a8=0x4c1eb72 0x52d7ac=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 3, SM 0): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 3, SM 0): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52df30=0x105000e 0x52df34=0x24 0x52df28=0x4c1eb72 0x52df2c=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 4, SM 0): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 4, SM 0): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52e730=0x104000e 0x52e734=0x24 0x52e728=0x4c1eb72 0x52e72c=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception on (GPC 5, TPC 5, SM 0): Out Of Range Address
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Global Exception on (GPC 5, TPC 5, SM 0): Multiple Warp Errors
NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics Exception: ESR 0x52ef30=0x106000e 0x52ef34=0x24 0x52ef28=0x4c1eb72 0x52ef2c=0x174
NVRM: Xid (PCI:0000:31:00): 13, pid=13253, name=llama-server, Graphics Exception: ChID 0008, Class 0000c3c0, Offset 00000510, Data 00419e84
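
For triage, it may help to condense the Xid 13 lines above into a per-SM summary. The following is a minimal Python sketch, assuming the dmesg line format shown in this report. The function name `tally_xid_sm_faults` and the regular expression are illustrative and not part of llama.cpp or any NVIDIA tool.

```python
import re
from collections import Counter

# Assumed line shape (taken from the dmesg excerpt in this report):
# NVRM: Xid (PCI:0000:31:00): 13, pid='', name=, Graphics SM Warp Exception
#   on (GPC 5, TPC 0, SM 0): Out Of Range Address
SM_FAULT = re.compile(
    r"Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+),.*?"
    r"Graphics SM (?P<kind>Warp|Global) Exception on "
    r"\(GPC (?P<gpc>\d+), TPC (?P<tpc>\d+), SM (?P<sm>\d+)\): (?P<reason>.+)$"
)


def tally_xid_sm_faults(lines):
    """Summarize SM exception lines from an NVRM Xid log.

    Returns a pair:
      - Counter keyed by (exception kind, reason text)
      - set of (GPC, TPC, SM) tuples that reported at least one exception
    Lines that do not match the assumed format (for example the
    "Graphics Exception: ESR ..." register dumps) are ignored.
    """
    reasons = Counter()
    sms = set()
    for line in lines:
        m = SM_FAULT.search(line.strip())
        if m is None:
            continue
        reasons[(m.group("kind"), m.group("reason"))] += 1
        sms.add((int(m.group("gpc")), int(m.group("tpc")), int(m.group("sm"))))
    return reasons, sms
```

Passing the dmesg lines (for example `open("dmesg.txt").read().splitlines()`, where `dmesg.txt` is a hypothetical saved copy of the log) returns the reason counts and the set of affected SMs. In the excerpt above, every reported SM is in GPC 5 on the device at PCI 0000:31:00, and every warp exception is "Out Of Range Address".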
