
mapreducedim! is super slow #352

@simeonschaub


Reductions on CLArrays seem to be roughly 80x to 100x slower than in Base (median 7.4 ms vs. 90.7 μs; minimum 5.6 ms vs. 58.4 μs). This was measured with the pocl CPU backend:

julia> using OpenCL, pocl_jll, BenchmarkTools

julia> X = rand(Float32, 1000, 1000);

julia> X′ = CLArray(X);

julia> @benchmark sum(X; dims = 1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   58.380 μs … 922.617 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      90.710 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   103.658 μs ±  35.143 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▄▃▁ ▃▇█▃                                                 
  ▁▁▁▂▄█████████▆▆▆▅▅▄▄▃▂▃▃▃▃▄▃▃▂▂▂▂▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  58.4 μs          Histogram: frequency by time          212 μs <

 Memory estimate: 4.02 KiB, allocs estimate: 3.

julia> @benchmark OpenCL.synchronize(sum(X′; dims = 1))
BenchmarkTools.Trial: 653 samples with 1 evaluation per sample.
 Range (min … max):  5.585 ms … 12.908 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.424 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.604 ms ±  1.126 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▁▁▁▂▃▁▄█ ▁▇▇▃▄▄▄▃▄▂▃▂▃▁▁▃▁                             
  ▃▂▃▅▆██████████████████████████▇▆█▇▄▇▇█▆▅▇▄▁▄▅▄▄▃▃▅▄▃▂▂▂▂▃ ▅
  5.59 ms        Histogram: frequency by time        10.6 ms <

 Memory estimate: 22.52 KiB, allocs estimate: 247.

Is there any low-hanging fruit in terms of optimizations here?
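For reference, `sum(X; dims = 1)` on a 1000×1000 matrix collapses the first dimension and returns a 1×1000 matrix of column sums. That is 1000 independent reductions of 1000 elements each. Below is a minimal plain-Julia sketch of that computation, which could serve as a correctness baseline when checking the CLArray result. The helper name `colsum_reference` is hypothetical and is not part of Base or OpenCL.jl.

```julia
# Hypothetical helper (not part of Base or OpenCL.jl): a column-wise sum,
# i.e. what sum(X; dims = 1) computes for a matrix.
function colsum_reference(X::AbstractMatrix{T}) where {T}
    m, n = size(X)
    out = zeros(T, 1, n)  # reducing over dims = 1 keeps a singleton first dimension: 1×n
    for j in 1:n
        acc = zero(T)
        # Julia arrays are column-major, so this inner loop walks contiguous memory
        for i in 1:m
            acc += X[i, j]
        end
        out[1, j] = acc
    end
    return out
end

X = rand(Float32, 1000, 1000)
ref = colsum_reference(X)
@assert size(ref) == size(sum(X; dims = 1)) == (1, 1000)
# Base may accumulate in a different order, so compare approximately
@assert ref ≈ sum(X; dims = 1)
```

The sketch makes no claim about how OpenCL.jl's `mapreducedim!` is implemented. It only pins down the shape and amount of work being benchmarked.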
