Combine DMA loads

Compiling the following OpenCL code, which calls `vload16` three times, with `vc4c --asm -O3 -o dma_loads.asm dma_loads.cl` makes VC4C output the assembly attached below (`dma_loads.txt`). The output contains three DMA loads, but they could be combined into a single DMA load.

```opencl
__kernel void dma_loads(int width, int height, __global uchar *in, __global uchar *out)
{
    for (int y = 1; y < height - 1; y++) {
        size_t idx = y * width;
        uchar16 up   = vload16(idx - width, in);
        uchar16 center = vload16(idx, in);
        uchar16 down = vload16(idx + width, in);

        uchar16 r = (
            up                                               / (uchar16)(3) +
            center                                           / (uchar16)(3) +
            down                                             / (uchar16)(3));

        vstore16(r, idx, out);
    }
}
```

[dma_loads.txt](https://github.com/doe300/VC4C/files/4563230/dma_loads.txt)

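I want to implement this combiner and have been thinking about how to do it.

For each basic block in the CFG, the combiner would work on the LLVM IR (shown here for this kernel) and perform the steps listed after it: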

```llvm
; Function Attrs: convergent nounwind
define spir_kernel void @dma_loads(i32 %width, i32 %height, i8 addrspace(1)* %in, i8 addrspace(1)* %out) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 !kernel_arg_name !7 {
  %sub = add nsw i32 %height, -1
  %cmp23 = icmp sgt i32 %height, 2
  br i1 %cmp23, label %.lr.ph.preheader, label %._crit_edge

.lr.ph.preheader:                                 ; preds = %0
  br label %.lr.ph

._crit_edge:                                      ; preds = %.lr.ph, %0
  ret void

.lr.ph:                                           ; preds = %.lr.ph.preheader, %.lr.ph
  %y.024 = phi i32 [ %inc, %.lr.ph ], [ 1, %.lr.ph.preheader ]
  %mul = mul nsw i32 %y.024, %width
  %sub1 = sub i32 %mul, %width
  %call = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %sub1, i8 addrspace(1)* %in) #2
  %call2 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %mul, i8 addrspace(1)* %in) #2
  %add = add i32 %mul, %width
  %call3 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %add, i8 addrspace(1)* %in) #2
  %div = udiv <16 x i8> %call, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
  %div4 = udiv <16 x i8> %call2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
  %add5 = add nuw <16 x i8> %div4, %div
  %div6 = udiv <16 x i8> %call3, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
  %add7 = add <16 x i8> %add5, %div6
  tail call spir_func void @_Z8vstore16Dv16_hjPU3AS1h(<16 x i8> %add7, i32 %mul, i8 addrspace(1)* %out) #2
  %inc = add nuw nsw i32 %y.024, 1
  %cmp = icmp slt i32 %inc, %sub
  br i1 %cmp, label %.lr.ph, label %._crit_edge
}
```

1. Collect the calls to `vload16` (actually `_Z7vload16jPU3AS1Kh`).
2. Collect the DMA load addresses from the 1st argument of each `vload16`.
3. Check whether the load addresses are at regular intervals.
4. If so, combine these loads.
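Steps 1 and 2 can be sketched on the textual IR as below. This is only an illustration in Python, not VC4C code: a real pass would walk VC4C's internal instruction representation instead of matching text, and the helper name `collect_vloads` and the regular expression are assumptions made for this sketch.

```python
import re

# Mangled name of vload16 as it appears in the IR above.
VLOAD16 = "_Z7vload16jPU3AS1Kh"

# Matches: %res = tail call spir_func <16 x i8> @NAME(i32 %offset, i8 addrspace(1)* %ptr)
CALL_RE = re.compile(
    r"(?P<result>%[\w.]+)\s*=\s*(?:tail\s+)?call\s+spir_func\s+<16 x i8>\s+@"
    + re.escape(VLOAD16)
    + r"\(i32\s+(?P<offset>%[\w.]+),\s*i8 addrspace\(1\)\*\s+(?P<base>%[\w.]+)\)"
)


def collect_vloads(block_text):
    """Return (result, offset operand, base pointer) for each vload16 call."""
    return [
        (m.group("result"), m.group("offset"), m.group("base"))
        for m in CALL_RE.finditer(block_text)
    ]


# The relevant lines of the loop body, copied from the IR above.
BLOCK = """
  %mul = mul nsw i32 %y.024, %width
  %sub1 = sub i32 %mul, %width
  %call = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %sub1, i8 addrspace(1)* %in) #2
  %call2 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %mul, i8 addrspace(1)* %in) #2
  %add = add i32 %mul, %width
  %call3 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %add, i8 addrspace(1)* %in) #2
"""

for result, offset, base in collect_vloads(BLOCK):
    print(result, offset, base)
```

Only loads that share the same base pointer (`%in` here) would be candidates for combining.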

I think checking for regular intervals is the challenging part. Symbolic execution could be used for this.
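As a rough idea of what such a check could look like, the sketch below expands each offset into a linear combination of symbols and compares the differences between consecutive offsets. It is a much simpler stand-in for full symbolic execution, written in Python for illustration only. It follows `add` and `sub` definitions, treats everything else (arguments, phi nodes, and also `mul`) as an opaque symbol, and assumes the offsets are given in ascending order. All function names here are hypothetical.

```python
def linear_form(name, defs, cache=None):
    """Expand an SSA value into {symbol: coefficient}, following add/sub only.

    Values without an add/sub definition are kept as opaque symbols.
    """
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    if name not in defs:
        form = {name: 1}
    else:
        op, lhs, rhs = defs[name]
        sign = 1 if op == "add" else -1
        form = dict(linear_form(lhs, defs, cache))  # copy, do not mutate the cache
        for sym, coeff in linear_form(rhs, defs, cache).items():
            form[sym] = form.get(sym, 0) + sign * coeff
        form = {s: c for s, c in form.items() if c != 0}
    cache[name] = form
    return form


def difference(a, b):
    """Return the linear form a - b with zero terms dropped."""
    out = dict(a)
    for sym, coeff in b.items():
        out[sym] = out.get(sym, 0) - coeff
    return {s: c for s, c in out.items() if c != 0}


def common_stride(offsets, defs):
    """Return the symbolic stride if consecutive offsets differ by the same
    non-zero linear form, otherwise None."""
    forms = [linear_form(o, defs) for o in offsets]
    if len(forms) < 2:
        return None
    strides = [difference(b, a) for a, b in zip(forms, forms[1:])]
    first = strides[0]
    if first and all(s == first for s in strides):
        return first
    return None


# add/sub definitions taken from the loop body of the IR above.
DEFS = {
    "%sub1": ("sub", "%mul", "%width"),
    "%add": ("add", "%mul", "%width"),
}

print(common_stride(["%sub1", "%mul", "%add"], DEFS))  # {'%width': 1}
```

Keeping `%mul` opaque is enough for this kernel because all three offsets share it, so it cancels out in the differences. Offsets that only become comparable after expanding multiplications would need a richer representation.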

## Example

Collect the `vload16` calls (and the variables their addresses depend on):
```llvm
%mul = mul nsw i32 %y.024, %width
%sub1 = sub i32 %mul, %width
%add = add i32 %mul, %width

%call = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %sub1, i8 addrspace(1)* %in) #2
%call2 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %mul, i8 addrspace(1)* %in) #2
%call3 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %add, i8 addrspace(1)* %in) #2
```

Addresses
1. `%mul - %width`
2. `%mul`
3. `%mul + %width`

These are at regular intervals of `%width`, so the loads can be combined (I would need to create new functions `dma_load` and `vpm_load`).
```
dma_load(i32 %sub1, i8 addrspace(1)* %in, 3 /*= rows*/, 16/*= columns*/)
%call = vpm_load
%call2 = vpm_load
%call3 = vpm_load
```
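To make the intended effect concrete, here is a small Python model of the transformation. It is only a sketch: `dma_load` and `vpm_load` are the proposed names from above, their behaviour here is an assumption, and the explicit `stride` parameter is added for this illustration (the pseudocode above leaves the stride implicit). The model checks that one staged multi-row load yields the same data as three separate `vload16` calls.

```python
def vload16(offset, mem):
    """OpenCL vload16 semantics: read 16 elements starting at mem[offset * 16]."""
    start = offset * 16
    return mem[start:start + 16]


def dma_load(first_offset, stride, mem, rows, columns=16):
    """Hypothetical combined load: stage `rows` rows of `columns` bytes each,
    `stride` vload16 offsets apart, in a VPM-like buffer in one operation."""
    vpm = []
    for r in range(rows):
        start = (first_offset + r * stride) * 16
        vpm.append(mem[start:start + columns])
    return vpm


def vpm_load(vpm, row):
    """Hypothetical read of one staged row back out of the buffer."""
    return vpm[row]


# Small test image: the offsets used below stay within these 256 bytes.
width, y = 4, 2
mem = bytes(range(256))
idx = y * width

# Original kernel: three independent loads.
separate = [vload16(idx - width, mem), vload16(idx, mem), vload16(idx + width, mem)]

# Combined form: one staged load, then three cheap reads.
vpm = dma_load(idx - width, width, mem, rows=3)
combined = [vpm_load(vpm, r) for r in range(3)]

print(separate == combined)
```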
