Combine DMA loads #144

@long-long-float

When we compile the following OpenCL code, which calls vload16 three times, with vc4c --asm -O3 -o dma_loads.asm dma_loads.cl, VC4C outputs the attached assembly (dma_loads.txt). The assembly contains three DMA loads, but these could be combined into a single DMA load.

__kernel void dma_loads(int width, int height, __global uchar *in, __global uchar *out)
{
    for (int y = 1; y < height - 1; y++) {
        size_t idx = y * width;
        uchar16 up   = vload16(idx - width, in);
        uchar16 center = vload16(idx, in);
        uchar16 down = vload16(idx + width, in);

        uchar16 r = (
            up                                               / (uchar16)(3) +
            center                                           / (uchar16)(3) +
            down                                             / (uchar16)(3));

        vstore16(r, idx, out);
    }
}

dma_loads.txt

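I want to implement this combiner and have been thinking about how to do it.

The kernel compiles to the following LLVM IR. The steps listed after it would be applied to each basic block of the CFG.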

; Function Attrs: convergent nounwind
define spir_kernel void @dma_loads(i32 %width, i32 %height, i8 addrspace(1)* %in, i8 addrspace(1)* %out) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 !kernel_arg_name !7 {
  %sub = add nsw i32 %height, -1
  %cmp23 = icmp sgt i32 %height, 2
  br i1 %cmp23, label %.lr.ph.preheader, label %._crit_edge

.lr.ph.preheader:                                 ; preds = %0
  br label %.lr.ph

._crit_edge:                                      ; preds = %.lr.ph, %0
  ret void

.lr.ph:                                           ; preds = %.lr.ph.preheader, %.lr.ph
  %y.024 = phi i32 [ %inc, %.lr.ph ], [ 1, %.lr.ph.preheader ]
  %mul = mul nsw i32 %y.024, %width
  %sub1 = sub i32 %mul, %width
  %call = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %sub1, i8 addrspace(1)* %in) #2
  %call2 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %mul, i8 addrspace(1)* %in) #2
  %add = add i32 %mul, %width
  %call3 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %add, i8 addrspace(1)* %in) #2
  %div = udiv <16 x i8> %call, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
  %div4 = udiv <16 x i8> %call2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
  %add5 = add nuw <16 x i8> %div4, %div
  %div6 = udiv <16 x i8> %call3, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
  %add7 = add <16 x i8> %add5, %div6
  tail call spir_func void @_Z8vstore16Dv16_hjPU3AS1h(<16 x i8> %add7, i32 %mul, i8 addrspace(1)* %out) #2
  %inc = add nuw nsw i32 %y.024, 1
  %cmp = icmp slt i32 %inc, %sub
  br i1 %cmp, label %.lr.ph, label %._crit_edge
}

  1. Collect the calls to vload16 (actually _Z7vload16jPU3AS1Kh).
  2. Collect the DMA load addresses from the 1st argument of each vload16 call.
  3. Check whether the load addresses are at regular intervals.
  4. If so, combine these loads.
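
Steps 1 and 2 could be prototyped outside the compiler as follows. This is only a sketch in Python that scans the textual IR with a regular expression; a real pass would walk VC4C's internal instructions instead. The function names collect_vloads and group_by_base are hypothetical, and grouping by base pointer is my own assumption (loads from different buffers cannot share one DMA transfer).

```python
import re

# Mangled name of vload16 for __global uchar pointers, as it appears in the IR.
VLOAD16 = "_Z7vload16jPU3AS1Kh"

# Matches lines such as:
#   %call = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %sub1, i8 addrspace(1)* %in) #2
CALL_RE = re.compile(
    r"(?P<result>%[\w.]+)\s*=\s*(?:tail\s+)?call\b.*@"
    + re.escape(VLOAD16)
    + r"\(i32\s+(?P<offset>%[\w.]+),\s*i8 addrspace\(1\)\*\s+(?P<base>%[\w.]+)\)"
)


def collect_vloads(block_lines):
    """Return (result, offset, base) for every vload16 call in one basic block."""
    loads = []
    for line in block_lines:
        match = CALL_RE.search(line)
        if match:
            loads.append(
                (match.group("result"), match.group("offset"), match.group("base"))
            )
    return loads


def group_by_base(loads):
    """Group the collected loads by base pointer, keeping program order."""
    groups = {}
    for result, offset, base in loads:
        groups.setdefault(base, []).append((result, offset))
    return groups
```

For the .lr.ph block this yields one group for %in with the offsets %sub1, %mul and %add, which is the input for the interval check in step 3.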

I think checking for regular intervals is the challenging part. Symbolic execution could be used for this.

Example

Collect vload16 (and address variables)

%mul = mul nsw i32 %y.024, %width
%sub1 = sub i32 %mul, %width
%add = add i32 %mul, %width

%call = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %sub1, i8 addrspace(1)* %in) #2
%call2 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %mul, i8 addrspace(1)* %in) #2
%call3 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %add, i8 addrspace(1)* %in) #2

Addresses

  1. %mul - %width
  2. %mul
  3. %mul + %width

These are at regular intervals (%width), so the loads can be combined (I would need to create new functions dma_load and vpm_load).
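
The interval check for this example can be sketched with a very small symbolic evaluator. The Python below is an illustration only, not VC4C code: it expands add and sub into linear forms, treats everything else (mul, phi, kernel arguments) as an opaque symbol, and compares the differences between consecutive offsets. The names linear, difference and common_stride are hypothetical, and it assumes the offsets are given in ascending program order.

```python
CONST = "1"  # key used for the constant term of a linear form


def linear(defs, name):
    """Expand an SSA name into a linear form {symbol: coefficient}."""
    if name.lstrip("-").isdigit():
        value = int(name)
        return {CONST: value} if value else {}
    if name not in defs or defs[name][0] not in ("add", "sub"):
        return {name: 1}  # opaque: argument, phi, mul, ...
    op, lhs, rhs = defs[name]
    sign = 1 if op == "add" else -1
    result = dict(linear(defs, lhs))
    for symbol, coeff in linear(defs, rhs).items():
        result[symbol] = result.get(symbol, 0) + sign * coeff
    return {s: c for s, c in result.items() if c != 0}


def difference(a, b):
    """Return the linear form a - b."""
    result = dict(a)
    for symbol, coeff in b.items():
        result[symbol] = result.get(symbol, 0) - coeff
    return {s: c for s, c in result.items() if c != 0}


def common_stride(defs, offsets):
    """Return the shared stride of consecutive offsets, or None if there is none."""
    forms = [linear(defs, offset) for offset in offsets]
    strides = [difference(forms[i + 1], forms[i]) for i in range(len(forms) - 1)]
    if strides and strides[0] and all(s == strides[0] for s in strides):
        return strides[0]
    return None


# Definitions from the .lr.ph block: name -> (opcode, lhs, rhs)
defs = {
    "%mul": ("mul", "%y.024", "%width"),
    "%sub1": ("sub", "%mul", "%width"),
    "%add": ("add", "%mul", "%width"),
}
# Note: vloadn(offset, p) reads at p + offset * n, so a stride of %width
# in the offsets corresponds to 16 * %width bytes in memory.
stride = common_stride(defs, ["%sub1", "%mul", "%add"])
```

Here stride is {"%width": 1}, i.e. every offset is exactly %width larger than the previous one. Because only add and sub are expanded, the check never has to reason about the value of %mul itself, which keeps it cheap; the price is that offsets computed through other instructions are conservatively rejected.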

dma_load(i32 %x.093, i8 addrspace(1)* %in, 3 /*= rows*/, 16/*= columns*/)
%call = vpm_load
%call2 = vpm_load
%call3 = vpm_load
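
To convince myself that the rewrite keeps the semantics, the intended behaviour can be emulated on a plain byte buffer. This Python model is only a sketch under my own assumptions: dma_load and vpm_load do not exist yet, the explicit stride parameter and the Vpm class are hypothetical, and the real VPM is not a Python list.

```python
def vload16(offset, mem):
    """OpenCL vload16 on a uchar buffer: 16 bytes starting at offset * 16."""
    start = offset * 16
    return mem[start:start + 16]


class Vpm:
    """Toy stand-in for the VPM: a list of rows plus a read cursor."""

    def __init__(self):
        self.rows = []
        self.next_row = 0


def dma_load(vpm, offset, stride, mem, rows, columns=16):
    """One transfer that fetches `rows` rows, each `stride` offsets apart."""
    vpm.rows = []
    for r in range(rows):
        start = (offset + r * stride) * columns
        vpm.rows.append(mem[start:start + columns])
    vpm.next_row = 0


def vpm_load(vpm):
    """Read the next row that the preceding dma_load placed in the VPM."""
    row = vpm.rows[vpm.next_row]
    vpm.next_row += 1
    return row


# Compare three separate loads with one combined load for width = 4, y = 2.
mem = bytes(range(256)) * 4
width, idx = 4, 2 * 4
separate = [vload16(idx - width, mem), vload16(idx, mem), vload16(idx + width, mem)]

vpm = Vpm()
dma_load(vpm, idx - width, width, mem, rows=3)
combined = [vpm_load(vpm) for _ in range(3)]
```

Both lists hold the same three rows, so replacing the three vload16 calls by one dma_load followed by three vpm_load reads would not change what the kernel computes, as long as nothing writes to the buffer between the original loads.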
