Queued kernels are incorrectly deleted when backend becomes inactive (when gpgpu_max_concurrent_kernel > 1) #542

@annali07

Summary

With concurrent-kernel enabled (-gpgpu_concurrent_kernel_sm 1), queued kernels in kernels_info are incorrectly deleted when backend activity drops to idle. This causes missing kernel launches and non-monotonic launch UID patterns.

Dumped from log

Header info loaded for kernel command : /*/kernel-128-ctx_0x21440c40.traceg.xz (`-gpgpu_max_concurrent_kernel 128` window size is 128)
launching kernel name: _XXX_ uid: 1 cuda_stream_id: 140669595547776

Destroy streams for kernel 1: size 0
Destroy streams for kernel 3: size 0
Destroy streams for kernel 5: size 0
Destroy streams for kernel 7: size 0
...
Destroy streams for kernel 121: size 0
Destroy streams for kernel 123: size 0
Destroy streams for kernel 125: size 0
Destroy streams for kernel 127: size 0

...
launching kernel name: _XXX_ uid: 2 cuda_stream_id: 140669595547776
Destroy streams for kernel 2: size 0
Destroy streams for kernel 6: size 0
Destroy streams for kernel 10: size 0
...
Destroy streams for kernel 187: size 0
Destroy streams for kernel 189: size 0
Destroy streams for kernel 191: size 0

...
launching kernel name: _XX_ uid: 4 cuda_stream_id: 140669595547776

(The delete-one-skip-one pattern above is a separate bug, but it is no longer triggered once the incorrect deletion is fixed.)

Affected configuration

Reproduces when all of the following are true:

  • -gpgpu_concurrent_kernel_sm 1
  • -gpgpu_max_concurrent_kernel > 1 (e.g., 128, 1024)

Observed behavior

  • Many kernel headers are parsed, but far fewer kernels launch.
  • With N kernels in the traces, all N kernel headers were parsed, but only around N/6 kernels were launched. This is how I noticed the bug.
  • "Destroy streams for kernel ... size 0" appears for many queued kernels that were never launched (printed by the cleanup in accel-sim.cc).
  • Launch UIDs show skip/stride behavior (e.g., an odd/even survival pattern).
  • Stream concurrency is under-realized because queued kernels are dropped before launch.
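To confirm the symptom on a log, one option is to list every kernel UID that has a "Destroy streams" line but no earlier "launching kernel" line. The following is a minimal sketch that assumes the two log line formats shown under "Dumped from log"; `LogSummary`, `scan_log`, and `demo_scan_log` are hypothetical helper names and are not part of Accel-Sim.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical summary of one simulator log (not an Accel-Sim type).
struct LogSummary {
  std::set<unsigned> launched;              // UIDs seen in "launching kernel name:" lines
  std::set<unsigned> destroyed_unlaunched;  // UIDs destroyed with no earlier launch line
};

// Reads the unsigned integer that directly follows `key` in `line`.
// Returns false if `key` is absent or no number follows it.
static bool parse_uid_after(const std::string& line, const std::string& key,
                            unsigned& uid) {
  const std::size_t pos = line.find(key);
  if (pos == std::string::npos) return false;
  std::istringstream rest(line.substr(pos + key.size()));
  return static_cast<bool>(rest >> uid);
}

// Scans the log line by line, assuming the formats quoted in this issue:
//   "launching kernel name: <name> uid: <N> cuda_stream_id: <id>"
//   "Destroy streams for kernel <N>: size <M>"
LogSummary scan_log(std::istream& in) {
  LogSummary summary;
  std::string line;
  while (std::getline(in, line)) {
    unsigned uid = 0;
    if (line.find("launching kernel name:") != std::string::npos &&
        parse_uid_after(line, " uid: ", uid)) {
      summary.launched.insert(uid);
    } else if (parse_uid_after(line, "Destroy streams for kernel ", uid)) {
      // A destroy line for a UID that never launched is the symptom reported here.
      if (summary.launched.count(uid) == 0) {
        summary.destroyed_unlaunched.insert(uid);
      }
    }
  }
  return summary;
}

// Small usage example on an excerpt of the log from this issue.
void demo_scan_log() {
  std::istringstream log(
      "launching kernel name: _XXX_ uid: 1 cuda_stream_id: 140669595547776\n"
      "Destroy streams for kernel 1: size 0\n"
      "Destroy streams for kernel 3: size 0\n"
      "launching kernel name: _XXX_ uid: 2 cuda_stream_id: 140669595547776\n"
      "Destroy streams for kernel 2: size 0\n");
  const LogSummary summary = scan_log(log);
  assert(summary.launched.size() == 2);
  assert(summary.destroyed_unlaunched.count(3) == 1);
}
```

On a healthy run, `destroyed_unlaunched` should stay empty; with the behavior described here it contains the queued kernels that were dropped.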

Expected behavior

  • Cleanup removes only finished kernels, not kernels that are still queued under concurrent-kernel mode.

Root cause

The cause and fix are both very simple.

The cause is the !m_gpgpu_sim->active() term in the if-condition. It becomes true after a kernel finishes launching, so the cleanup destroys all kernels in the queue instead of launching them.

The fix would be to add an extra flag to the if-condition. Instead of !m_gpgpu_sim->active(), make it (!concurrent_kernel_sm && !m_gpgpu_sim->active()), so that this branch only triggers when concurrent mode is off. This keeps the behavior unchanged for non-concurrent mode (to keep disruption minimal) and fixes the behavior under concurrent mode (validated).

void accel_sim_framework::simulation_loop() {
    ...
    unsigned finished_kernel_uid = simulate();
    // cleanup finished kernel
    if (finished_kernel_uid || m_gpgpu_sim->cycle_insn_cta_max_hit() ||
        !m_gpgpu_sim->active()) {
      cleanup(finished_kernel_uid);
    }
    ...
}

void accel_sim_framework::cleanup(unsigned finished_kernel) {
    ...
    if (k->get_uid() == finished_kernel ||
        m_gpgpu_sim->cycle_insn_cta_max_hit() || !m_gpgpu_sim->active()) {
        ...
    }
    ...
}
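The proposed condition can be sketched as standalone predicates, which makes the difference between the two modes easy to check in isolation. This is only an illustration under assumptions: `LoopState` and the function names are hypothetical, the booleans stand in for the `m_gpgpu_sim` calls and the `-gpgpu_concurrent_kernel_sm` option quoted above, and a `finished_kernel_uid` of 0 is assumed to mean that no kernel finished. The real patch would apply the same guard in both `simulation_loop()` and `cleanup()`.

```cpp
#include <cassert>

// Hypothetical snapshot of the values the cleanup conditions read.
struct LoopState {
  unsigned finished_kernel_uid;  // assumed: 0 means no kernel finished this iteration
  bool limit_hit;                // stands in for m_gpgpu_sim->cycle_insn_cta_max_hit()
  bool backend_active;           // stands in for m_gpgpu_sim->active()
  bool concurrent_kernel_sm;     // stands in for -gpgpu_concurrent_kernel_sm
};

// Condition in simulation_loop() as quoted in this issue.
bool should_cleanup_current(const LoopState& s) {
  return s.finished_kernel_uid != 0 || s.limit_hit || !s.backend_active;
}

// Proposed condition: an idle backend only triggers cleanup in non-concurrent mode.
bool should_cleanup_proposed(const LoopState& s) {
  return s.finished_kernel_uid != 0 || s.limit_hit ||
         (!s.concurrent_kernel_sm && !s.backend_active);
}

// Proposed per-kernel check inside cleanup(): destroy a kernel only if it is the
// finished one, a simulation limit was hit, or (non-concurrent mode) the backend is idle.
bool should_destroy_kernel_proposed(unsigned kernel_uid, const LoopState& s) {
  return kernel_uid == s.finished_kernel_uid || s.limit_hit ||
         (!s.concurrent_kernel_sm && !s.backend_active);
}

// Usage example: kernel 1 just finished, backend is idle, kernel 3 is still queued.
void demo_cleanup_conditions() {
  const LoopState concurrent{1, false, false, true};
  assert(should_cleanup_proposed(concurrent));             // cleanup still runs
  assert(should_destroy_kernel_proposed(1, concurrent));   // finished kernel is removed
  assert(!should_destroy_kernel_proposed(3, concurrent));  // queued kernel survives

  const LoopState serial{1, false, false, false};
  assert(should_destroy_kernel_proposed(3, serial));       // old behavior is kept
}
```

With the concurrent flag set, an idle backend alone no longer causes queued kernels to be destroyed, while the non-concurrent path evaluates exactly as before.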
