Faster path for onehotbatch(::CUArray{Int}, ::UnitRange) #29

Merged
mcabbott merged 5 commits into FluxML:main from mcabbott:gpuarrays3 on Dec 31, 2022
@mcabbott
Member

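Follow-up to #27. This moves the inbounds check into the kernel that computes the indices, so the GPU path no longer synchronizes twice for `minimum` and `maximum`: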

julia> inds100 = rand(1:100, 100); inds100cu = inds100 |> cu;

julia> @btime onehotbatch($inds100, 1:100);
  348.811 ns (1 allocation: 496 bytes)

julia> @btime onehotbatch($inds100cu, 1:100);  # with #27, minimum + maximum sync twice
  70.267 μs (86 allocations: 4.02 KiB)

julia> @btime unsafe_onehotbatch($inds100, 1:100);
  231.916 ns (1 allocation: 496 bytes)

julia> @btime unsafe_onehotbatch($inds100cu, 1:100);  # version with no checks
  8.889 μs (28 allocations: 1.11 KiB)

julia> function fused_onehotbatch(data::AbstractArray{<:Integer}, labels::AbstractUnitRange{<:Integer})
         offset = 1 - first(labels)
         indices = map(data) do datum
                    i = UInt32(datum + offset)
                    checkbounds(labels, i)  # like this PR
                    i
                  end
         return OneHotArray(indices, length(labels))
       end
fused_onehotbatch (generic function with 1 method)

julia> @btime fused_onehotbatch($inds100cu, 1:100);
  10.708 μs (31 allocations: 1.20 KiB)

julia> bad100 = copy(inds100); bad100[33] = 101; bad100cu = bad100 |> cu;

julia> fused_onehotbatch(bad100cu, 1:100)
ERROR: Out-of-bounds array access.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: KernelException: exception thrown during kernel execution on device Tesla V100-PCIE-16GB
Stacktrace:
 [1] check_exceptions()
   @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/exceptions.jl:34
 [2] nonblocking_synchronize
   @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/context.jl:331 [inlined]
 [3] device_synchronize()
   @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/context.jl:319

julia> cu(ones(100))[bad100cu]  # getindex does much the same check, inside the kernel
ERROR: Out-of-bounds array access.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: KernelException: exception thrown during kernel execution on device Tesla V100-PCIE-16GB
Stacktrace:
 [1] check_exceptions()
   @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/exceptions.jl:34
 [2] nonblocking_synchronize
   @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/context.jl:331 [inlined]
 [3] device_synchronize()
   @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/context.jl:319
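
The difference between the two strategies in the benchmarks above can be sketched on the CPU, without CUDA or OneHotArrays. This is only an illustration under stated assumptions: `indices_checked_first` and `indices_fused` are hypothetical names, they return the raw `UInt32` indices rather than a `OneHotArray`, and the two-pass variant is a guess at the shape of the #27 approach based on the "minimum + maximum" comment, not the package's actual code.

```julia
# Hypothetical helpers for illustration; not part of OneHotArrays.jl.

# Two-pass variant: validate with separate reductions, then compute indices.
# On a GPU array each reduction would bring a scalar back to the host,
# which is the double sync noted in the benchmark comment above.
function indices_checked_first(data::AbstractArray{<:Integer},
                               labels::AbstractUnitRange{<:Integer})
    lo, hi = minimum(data), maximum(data)
    (lo in labels && hi in labels) ||
        throw(ArgumentError("data contains values outside $labels"))
    offset = 1 - first(labels)
    return map(datum -> UInt32(datum + offset), data)
end

# Fused variant: the bounds check runs inside the same `map` that
# computes each index, mirroring `fused_onehotbatch` from this thread.
function indices_fused(data::AbstractArray{<:Integer},
                       labels::AbstractUnitRange{<:Integer})
    offset = 1 - first(labels)
    return map(data) do datum
        i = UInt32(datum + offset)
        checkbounds(labels, i)  # throws BoundsError when i > length(labels)
        i
    end
end
```

On valid input both variants produce the same indices. They differ in where the error surfaces: the two-pass variant fails before any index is computed, while the fused variant fails from inside the `map`, which on a GPU array shows up as the `KernelException` seen above.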

@codecov-commenter

codecov-commenter commented Dec 31, 2022

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.68%. Comparing base (32e06c8) to head (49561f9).
⚠️ Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #29      +/-   ##
==========================================
- Coverage   96.21%   95.68%   -0.53%     
==========================================
  Files           4        4              
  Lines         132      139       +7     
==========================================
+ Hits          127      133       +6     
- Misses          5        6       +1     

Comment thread src/onehot.jl Outdated
Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
Comment thread src/onehot.jl Outdated
@mcabbott mcabbott merged commit 8f447ff into FluxML:main Dec 31, 2022
@mcabbott mcabbott deleted the gpuarrays3 branch December 31, 2022 22:07

3 participants