Description
Summary
With concurrent-kernel enabled (-gpgpu_concurrent_kernel_sm 1), queued kernels in kernels_info are incorrectly deleted when backend activity drops to idle. This causes missing kernel launches and non-monotonic launch UID patterns.
Dumped from log
Header info loaded for kernel command : /*/kernel-128-ctx_0x21440c40.traceg.xz (`-gpgpu_max_concurrent_kernel 128` window size is 128)
launching kernel name: _XXX_ uid: 1 cuda_stream_id: 140669595547776
Destroy streams for kernel 1: size 0
Destroy streams for kernel 3: size 0
Destroy streams for kernel 5: size 0
Destroy streams for kernel 7: size 0
...
Destroy streams for kernel 121: size 0
Destroy streams for kernel 123: size 0
Destroy streams for kernel 125: size 0
Destroy streams for kernel 127: size 0
...
launching kernel name: _XXX_ uid: 2 cuda_stream_id: 140669595547776
Destroy streams for kernel 2: size 0
Destroy streams for kernel 6: size 0
Destroy streams for kernel 10: size 0
...
Destroy streams for kernel 187: size 0
Destroy streams for kernel 189: size 0
Destroy streams for kernel 191: size 0
...
launching kernel name: _XX_ uid: 4 cuda_stream_id: 140669595547776
(This delete-one-skip-one pattern is a separate bug, but once the incorrect deletion is fixed it is no longer triggered.)
Affected configuration
Reproduces when all of the following are true:
- `-gpgpu_concurrent_kernel_sm 1`
- `-gpgpu_max_concurrent_kernel` > 1 (e.g., 128, 1024)
Observed behavior
- Many kernel headers are parsed, but far fewer kernels launch.
- I have N kernels from traces. All N kernel headers were parsed, but only around N/6 were launched, which is how I noticed this bug.
- `Destroy streams for kernel ... size 0` appears for many queued kernels that were never launched (triggered in cleanup in accel-sim.cc).
- Launch UIDs show skip/stride behavior (e.g., odd/even survival pattern).
- Stream concurrency is under-realized because queued kernels are dropped before launch.
Expected behavior
- Cleanup removes only finished kernels, not kernels that are still queued under the concurrent-kernel setting.
Root cause
The cause and fix are both very simple.
The cause is the `!m_gpgpu_sim->active()` term in the if-condition. It becomes true after a kernel finishes launching, so the cleanup destroys all kernels in the queue instead of launching them.
The fix is to add an extra flag to the if-condition. Instead of `!m_gpgpu_sim->active()`, make it `(!concurrent_kernel_sm && !m_gpgpu_sim->active())` so that this behavior only triggers when concurrent mode is off. This keeps the behavior unchanged for non-concurrent mode (to keep disruption minimal) and fixes it under concurrent mode (validated).
void accel_sim_framework::simulation_loop() {
  ...
  unsigned finished_kernel_uid = simulate();
  // cleanup finished kernel
  if (finished_kernel_uid || m_gpgpu_sim->cycle_insn_cta_max_hit() ||
      !m_gpgpu_sim->active()) {
    cleanup(finished_kernel_uid);
  }
  ...
}
void accel_sim_framework::cleanup(unsigned finished_kernel) {
  ...
  if (k->get_uid() == finished_kernel ||
      m_gpgpu_sim->cycle_insn_cta_max_hit() || !m_gpgpu_sim->active()) {
    ...
  }
  ...
}
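
To make the difference concrete, here is a minimal standalone sketch of the proposed condition, assuming it is applied to the per-kernel check in cleanup(). `SimState`, `should_remove_original`, `should_remove_proposed`, and `toy_cleanup` are hypothetical names used only for illustration; they are not part of accel-sim, and the toy queue does not model the separate delete-one-skip-one behavior.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the state that cleanup() consults.
// These fields are illustrative and do not mirror the real accel-sim classes.
struct SimState {
  bool concurrent_kernel_sm;  // value of -gpgpu_concurrent_kernel_sm
  bool max_hit;               // stands in for m_gpgpu_sim->cycle_insn_cta_max_hit()
  bool active;                // stands in for m_gpgpu_sim->active()
};

// Per-kernel condition as quoted in the issue.
bool should_remove_original(unsigned uid, unsigned finished, const SimState &s) {
  return uid == finished || s.max_hit || !s.active;
}

// Proposed condition: an idle backend only triggers removal
// when concurrent mode is off.
bool should_remove_proposed(unsigned uid, unsigned finished, const SimState &s) {
  return uid == finished || s.max_hit ||
         (!s.concurrent_kernel_sm && !s.active);
}

// Toy cleanup: drop every queued uid the predicate selects and
// return the kernels that survive.
template <typename Pred>
std::vector<unsigned> toy_cleanup(std::vector<unsigned> queue, unsigned finished,
                                  const SimState &s, Pred pred) {
  queue.erase(std::remove_if(queue.begin(), queue.end(),
                             [&](unsigned uid) { return pred(uid, finished, s); }),
              queue.end());
  return queue;
}
```

With kernels 1 to 4 queued, kernel 1 finished, and the backend idle in concurrent mode, the original predicate empties the queue, while the proposed one removes only kernel 1. With concurrent mode off, both predicates behave the same.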