This repository was archived by the owner on Jan 26, 2024. It is now read-only.

GPU fault detected in enqueue_kernel #149

@kazuki

Description

Environment

  1. ThinkPad X13 (Ryzen 7 6850U), Gentoo Linux, Linux 5.18.16/5.19.0, ROCm 5.0.2
  2. ThinkPad X13 (Ryzen 7 6850U), Gentoo Linux, Linux 5.19.0 + rocm-terminal Docker container, ROCm 5.2
  3. Threadripper 3970X + Radeon RX 560, Gentoo Linux, Linux 5.18.14/5.19.0, ROCm 5.0.2

Code

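test.cl:

__kernel void sub() {
}

__kernel void test() {
  // Enqueue sub() as a single work-item on the default device queue.
  // CLK_ENQUEUE_FLAGS_NO_WAIT lets the child start without waiting for test() to finish.
  enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT, ndrange_1D(1), ^{
    sub();
  });
}

Host script (Python, pyopencl):

import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
with open("test.cl", 'r', encoding='utf8') as f:
    code = f.read()
# Device-side enqueue (enqueue_kernel, get_default_queue, blocks) requires OpenCL C 2.0.
prog = cl.Program(ctx, code).build(options="-cl-std=CL2.0 -save-temps")
# Launch test() with a global size of 1 and no explicit local size.
prog.test(queue, [1], None)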

Launching the test kernel via enqueueNDRange always causes a SEGV in the userspace application. The kernel log (dmesg) also reports GPU VM faults:

[ 8644.417555] Command Queue T[206540]: segfault at 18 ip 00007fe88abccae4 sp 00007fe87d7c1a90 error 4 in libamdocl64.so[7fe88ab10000+13f000]
[ 8644.417566] Code: 5c 41 5d 41 5e c3 48 8d 0d b1 a1 08 00 ba 53 01 00 00 48 8d 35 a5 9e 08 00 48 8d 3d d6 a1 08 00 e8 51 3b f4 ff 90 53 48 89 fb <48> 8b 7f 18 41 89 d1 48 85 ff 74 40 4c 8b 43 20 31 c9 31 c0 eb 11
[ 8644.417570] amdgpu 0000:21:00.0: amdgpu: GPU fault detected: 146 0x0000480c for process python pid 206474 thread python pid 206474
[ 8644.417576] amdgpu 0000:21:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[ 8644.417578] amdgpu 0000:21:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
[ 8644.417579] amdgpu 0000:21:00.0: amdgpu: VM fault (0x0c, vmid 8, pasid 32773) at page 0, read from 'TC0' (0x54433000) (72)
[ 8644.417586] amdgpu 0000:21:00.0: amdgpu: GPU fault detected: 146 0x0000480c for process python pid 206474 thread python pid 206474
[ 8644.417588] amdgpu 0000:21:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[ 8644.417589] amdgpu 0000:21:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x11048014
[ 8644.417590] amdgpu 0000:21:00.0: amdgpu: VM fault (0x14, vmid 8, pasid 32773) at page 0, write from 'TC0' (0x54433000) (72)

(The same code works on the NVIDIA CUDA OpenCL runtime.)
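The kernel log above can be decoded a little further. On the CPU side, "segfault at 18 ... error 4" is an x86 page-fault error code of 4, meaning a user-mode read of a page that is not present. The marked instruction bytes 48 8b 7f 18 decode to mov rdi, [rdi+0x18]. Combined with the fault address 0x18, this is a read through a NULL pointer inside libamdocl64.so. On the GPU side, both VM faults (one read, one write from client 'TC0') are at page 0, so the GPU also touched an address near zero. The following is a minimal parsing sketch. It assumes the log line formats are exactly as shown above, and the helper names decode_segfault and decode_vm_fault are hypothetical:

```python
import re

# Linux report for a user-space SIGSEGV, e.g.
# "segfault at 18 ip 00007fe88abccae4 sp ... error 4 in libamdocl64.so[7fe88ab10000+13f000]"
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)

# amdgpu VM fault summary, e.g.
# "VM fault (0x0c, vmid 8, pasid 32773) at page 0, read from 'TC0' (0x54433000) (72)"
VMFAULT_RE = re.compile(
    r"VM fault \((?P<flags>0x[0-9a-f]+), vmid (?P<vmid>\d+), pasid (?P<pasid>\d+)\) "
    r"at page (?P<page>\d+), (?P<rw>read|write) from '(?P<client>[^']*)'"
)


def decode_segfault(line):
    """Hypothetical helper: split a segfault line into its fields."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault report")
    err = int(m["err"])  # x86 page-fault error code
    return {
        "lib": m["lib"],
        "fault_addr": int(m["addr"], 16),
        # Offset of ip inside the mapping printed in [...]; the segment's file
        # offset may still need to be added before using it with addr2line.
        "offset_in_mapping": int(m["ip"], 16) - int(m["base"], 16),
        "protection_violation": bool(err & 0x1),  # clear = page not present
        "write": bool(err & 0x2),
        "user_mode": bool(err & 0x4),
        "instruction_fetch": bool(err & 0x10),
    }


def decode_vm_fault(line):
    """Hypothetical helper: split an amdgpu 'VM fault (...)' line into its fields."""
    m = VMFAULT_RE.search(line)
    if m is None:
        raise ValueError("not an amdgpu VM fault line")
    return {
        "flags": int(m["flags"], 16),  # raw first field, not decoded here
        "vmid": int(m["vmid"]),
        "pasid": int(m["pasid"]),
        "page": int(m["page"]),
        "write": m["rw"] == "write",
        "client": m["client"],
    }


segv = decode_segfault(
    "Command Queue T[206540]: segfault at 18 ip 00007fe88abccae4 "
    "sp 00007fe87d7c1a90 error 4 in libamdocl64.so[7fe88ab10000+13f000]"
)
print(hex(segv["fault_addr"]), hex(segv["offset_in_mapping"]), segv["user_mode"], segv["write"])
```

For the log in this report, the script prints 0x18 0xbcae4 True False. That is a user-mode read at address 0x18, with the faulting instruction at offset 0xbcae4 into the libamdocl64.so mapping that the kernel printed.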