Fast path `onehotbatch(::Vector{Int}, ::UnitRange)` by mcabbott · Pull Request #27 · FluxML/OneHotArrays.jl

mcabbott · 2022-12-26T20:13:40Z

This adds the obvious shortcut for when the data already consists of indices. It's a bit quicker, and also a partial solution to #16, since it works with GPU arrays too.

julia> let x = rand(0:99, 100)
         @btime onehotbatch($x, 0:99)
       end;
  min 231.052 ns, mean 245.482 ns (1 allocation, 496 bytes)  # before
  min 97.912 ns, mean 106.604 ns (1 allocation, 496 bytes)  # after

~~Needs tests, and probably an error check.~~ Done.
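
As a rough illustration of why the shortcut is valid (a sketch using only Base Julia, not the package's code): when the labels form a unit range, the position of each value in the labels is just the value shifted by a constant, so no per-element search is needed. The helper names `naive_indices` and `shifted_indices` are hypothetical.

```julia
# Hypothetical helpers, Base Julia only; not part of OneHotArrays.jl.

# Generic route: search the labels for every element.
naive_indices(data, labels) = [findfirst(==(x), labels) for x in data]

# Shortcut: for a unit range, the index is the value plus a constant offset.
shifted_indices(data, labels::AbstractUnitRange) = data .+ (1 - first(labels))

data = [0, 3, 99, 42]
labels = 0:99

naive_indices(data, labels) == shifted_indices(data, labels)  # true
```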

codecov-commenter · 2022-12-26T20:17:45Z

Codecov Report

Base: 95.96% // Head: 96.21% // Increases project coverage by +0.24% 🎉

Coverage data is based on head (7c1238f) compared to base (d27d037).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   95.96%   96.21%   +0.24%     
==========================================
  Files           3        4       +1     
  Lines         124      132       +8     
==========================================
+ Hits          119      127       +8     
  Misses          5        5

Impacted Files	Coverage Δ
src/onehot.jl	`96.15% <100.00%> (+0.59%)`	⬆️
src/OneHotArrays.jl	`100.00% <0.00%> (ø)`


mcabbott · 2022-12-28T12:20:55Z

+function onehotbatch(data::AbstractArray{<:Integer}, labels::AbstractUnitRange{<:Integer})
+  # lo, hi = extrema(data)  # fails on Julia 1.6
+  lo, hi = minimum(data), maximum(data)
+  lo < first(labels) && error("Value $lo not found in labels")
+  hi > last(labels) && error("Value $hi not found in labels")
+  offset = 1 - first(labels)
+  indices = UInt32.(data .+ offset)
+  return OneHotArray(indices, length(labels))


Unfortunately the bounds checking here is quite expensive, especially on GPU arrays where I think each of minimum & maximum forces synchronisation:

julia> let ci = cu(rand(1:99, 100))
         @btime CUDA.@sync onehotbatch($ci, 1:99)
         @btime CUDA.@sync OneHotMatrix($ci, 99)
       end;
  100.993 μs (86 allocations: 4.02 KiB)
  2.803 μs (0 allocations: 0 bytes)

julia> let ci = cu(rand(1:99, 100))
         @btime CUDA.@sync maximum($ci), minimum($ci)
         @btime CUDA.@sync extrema($ci)
         @btime CUDA.@sync map($ci) do i
           0<i<100 || error("bad index")
           UInt32(i+0)
         end
       end;
  71.448 μs (58 allocations: 2.91 KiB)
  38.094 μs (29 allocations: 1.47 KiB)
  18.543 μs (30 allocations: 1.14 KiB)

julia> let ci = cu(rand(1:99, 100))  # without explicit CUDA.@sync
         @btime extrema($ci)
         @btime OneHotMatrix($ci, 99)  # async, which is good
         @btime OneHotMatrix(map($ci) do i  # unfortunately not?
           0<i<100 || error("bad index")
           UInt32(i+0)
         end, 99)
       end;
  35.544 μs (29 allocations: 1.47 KiB)
  6.527 ns (0 allocations: 0 bytes)
  10.619 μs (30 allocations: 1.14 KiB)

Moving the check inside the broadcast is faster, at the cost of more obscure errors. Maybe that's ok? Still not fully async.

julia> map(cu(rand(1:199, 100))) do i
         0<i<100 || error("bad index")
         UInt32(i+0)
       end
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
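
For comparison, here is a CPU-only sketch of the same fused idea: the range check and the conversion to `UInt32` happen in a single pass, instead of separate `minimum` and `maximum` reductions followed by a broadcast. The function name `checked_indices` is hypothetical, and this is not the code that was merged. On the CPU the error message stays readable. The interpolated message is assumed to be CPU-only and may not carry over to a GPU kernel.

```julia
# Hypothetical helper, Base Julia only; a sketch of checking inside the map.
function checked_indices(data::AbstractArray{<:Integer},
                         labels::AbstractUnitRange{<:Integer})
    lo, hi = first(labels), last(labels)
    offset = 1 - lo
    # One pass: validate each element, then shift it to a 1-based index.
    map(data) do i
        lo <= i <= hi || error("Value $i not found in labels")
        UInt32(i + offset)
    end
end

checked_indices([0, 5, 99], 0:99)  # UInt32[1, 6, 100]
```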

CarloLucibello · 2022-12-28

It is rather obscure indeed. What if we wrap the map inside a try-catch and raise a proper error?
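
A minimal CPU sketch of that try-catch suggestion, with the hypothetical name `indices_or_argerror`. One caveat, stated as an assumption rather than tested behaviour: on a GPU the kernel exception may only surface at a later synchronisation point, so the `try` block would probably need to synchronise before it exits for the `catch` to see it.

```julia
# Hypothetical helper, Base Julia only; illustrates the try-catch wrapper.
function indices_or_argerror(data, labels::AbstractUnitRange{<:Integer})
    lo, hi = first(labels), last(labels)
    try
        idx = map(data) do i
            lo <= i <= hi || error("bad index")   # terse error raised inside the map
            UInt32(i - lo + 1)
        end
        return idx
    catch err
        err isa ErrorException || rethrow()
        # Replace the terse inner error with a more descriptive one.
        throw(ArgumentError("some values of data lie outside the labels $labels"))
    end
end

indices_or_argerror([0, 99], 0:99)  # UInt32[1, 100]
```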

mcabbott · 2022-12-28

I wonder if we can tell the GPU not to wait? This doesn't work but perhaps something similar does:

julia> let i = rand(1:99, 100)
         @btime maximum($i)<100 || error("outside")
         @btime @async maximum($i)<100 || error("outside")
       end;
  58.407 ns (0 allocations: 0 bytes)
  759.747 ns (5 allocations: 496 bytes)

julia> let ci = cu(rand(1:99, 100))
         @btime maximum($ci)<100 || error("outside")
         @btime @async maximum($ci)<100 || error("outside")
       end;
  35.134 μs (29 allocations: 1.45 KiB)
  # hangs?

ToucheSir · 2022-12-30

I think the ideal solution would be something like JuliaGPU/CUDA.jl#1140. If we had a way to write kernels, another idea would be to create an ad-hoc `in` kernel which flips a one-element bool array to true if it finds a matching element.
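
The flag idea can be mimicked on the CPU to show the shape of the operation. This is only a sketch under assumptions: a real version would be a GPU kernel, the name `flag_out_of_range!` is hypothetical, and here the flag marks out-of-range values rather than matches. The point is that the element-wise pass never throws. It only records a result that the host can inspect later.

```julia
# Hypothetical CPU stand-in for a flag-setting kernel; Base Julia only.
function flag_out_of_range!(flag::AbstractVector{Bool}, data, lo, hi)
    foreach(data) do i
        # No exception here: just record that a bad value was seen.
        if !(lo <= i <= hi)
            flag[1] = true
        end
    end
    return flag
end

flag = [false]
flag_out_of_range!(flag, [1, 5, 200], 1, 99)
flag[1]  # true, because 200 lies outside 1:99
```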

mcabbott · 2022-12-31

That sounds like the right thing. Perhaps rather than owning a kernel, this package could call checkbounds(out, inds, 1) or whatever -- that's essentially the same operation.

I wondered what gather did, and it turns out there is no check:

julia> NNlib.gather([1,20,300,4000] |> cu, [2,4,2,99] |> cu)
4-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
   20
 4000
   20
    0

julia> NNlib.gather([1,20,300,4000], [2,4,2,99])
ERROR: BoundsError: attempt to access 4-element Vector{Int64} at index [99]

The PR to add one, FluxML/NNlibCUDA.jl#51, has many benchmarks... the cost there is perhaps also 10s of μs.
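
On the CPU, Base's `checkbounds(Bool, A, I)` performs this kind of whole-array index check and returns a `Bool` instead of throwing. The sketch below uses it in front of a plain indexing gather. The name `checked_gather` is hypothetical, and whether the same call is efficient on GPU arrays is not established here.

```julia
# Hypothetical helper, Base Julia only; one up-front bounds check, then gather.
function checked_gather(src::AbstractVector, idx::AbstractVector{<:Integer})
    checkbounds(Bool, src, idx) || throw(BoundsError(src, idx))
    return src[idx]   # CPU indexing checks again; the line above only illustrates the single check
end

table = [1, 20, 300, 4000]
checkbounds(Bool, table, [2, 4, 2])      # true
checkbounds(Bool, table, [2, 4, 2, 99])  # false
checked_gather(table, [2, 4, 2])         # [20, 4000, 20]
```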

add a fast path

e007533

mcabbott added 2 commits December 26, 2022 15:25

add an error check

6c432cc

fixup, add tests

6809fd9

mcabbott mentioned this pull request Dec 27, 2022

Ambiguity in getindex, and missing == definition? #28

Open

fix 1.6

7c1238f

mcabbott requested a review from CarloLucibello December 27, 2022 19:10

ToucheSir approved these changes Dec 27, 2022

mcabbott merged commit 32e06c8 into FluxML:main Dec 27, 2022

mcabbott deleted the simple branch December 27, 2022 21:15

This was referenced Dec 28, 2022

accept integer labels in (logit)crossentropy FluxML/Flux.jl#2141

Open

onehotbatch(::CuArray, ...) moves data to host #16

Open

mcabbott commented Dec 28, 2022

mcabbott mentioned this pull request Dec 31, 2022

Faster path for onehotbatch(::CUArray{Int}, ::UnitRange) #29

Merged

mcabbott merged 4 commits into FluxML:main from mcabbott:simple
