nvcomp: Unnecessary sync events and host memory transfers #300

@dxqb

Please consider the sample code below.

The torch profiler trace shows:

  • sync events, even though everything is executed on the same stream
  • MemCpy (Device -> Pageable) and MemCpy (Device -> Pinned) events, even though nothing ever leaves the GPU
  • as a result, gaps in the GPU stream: the GPU goes idle and waits

This kills performance if the decompression is interleaved with any other GPU ops.
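One way to quantify the events listed above, rather than reading them off the timeline, is to count them in the exported trace. The following is only a sketch: it assumes the trace file is in Chrome trace format (a JSON object with a `traceEvents` list), and the category names `gpu_memcpy` and `cuda_sync` are assumptions that may differ between PyTorch/Kineto versions. The helper name `count_stall_events` and the example events are made up for illustration.

```python
from collections import Counter

# Assumed category names; check your own trace, they vary between versions.
MEMCPY_CATS = {"gpu_memcpy"}
SYNC_CATS = {"cuda_sync"}


def count_stall_events(trace: dict) -> Counter:
    """Count memcpy and sync events by name in a Chrome-trace dict."""
    counts = Counter()
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") != "X":  # only complete (duration) events
            continue
        cat = str(ev.get("cat", "")).lower()
        if cat in MEMCPY_CATS or cat in SYNC_CATS:
            counts[ev.get("name", "<unnamed>")] += 1
    return counts


# Synthetic trace for illustration only; these are not real measurements.
example_trace = {
    "traceEvents": [
        {"ph": "X", "cat": "kernel", "name": "some_kernel", "ts": 0, "dur": 10},
        {"ph": "X", "cat": "gpu_memcpy", "name": "Memcpy DtoH (Device -> Pinned)", "ts": 12, "dur": 2},
        {"ph": "X", "cat": "gpu_memcpy", "name": "Memcpy DtoH (Device -> Pinned)", "ts": 30, "dur": 2},
        {"ph": "X", "cat": "gpu_memcpy", "name": "Memcpy DtoH (Device -> Pageable)", "ts": 40, "dur": 2},
        {"ph": "X", "cat": "cuda_sync", "name": "Stream Synchronize", "ts": 14, "dur": 5},
    ]
}

stall_counts = count_stall_events(example_trace)
for name, n in stall_counts.most_common():
    print(f"{n:6d}  {name}")
```

On a real run, the dict would come from something like `json.load(open("decompress.json"))`, assuming the profiler wrapper writes a Chrome trace to that path.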

Image

This is what it should look like (the trace after I remove the nvcomp decode operation):

Image
  • GPU operations are scheduled
  • command buffer is full
  • no gaps
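The "no gaps" criterion can also be checked numerically. The sketch below measures idle intervals between consecutive GPU activities on each stream of a Chrome-format trace. It assumes GPU events carry the device in `pid` and the stream in `tid`, and that the categories `kernel`, `gpu_memcpy`, and `gpu_memset` mark GPU work; both are assumptions about the trace layout, and `gpu_idle_gaps` is a hypothetical helper, not part of torch or nvcomp.

```python
from collections import defaultdict

# Assumed categories for work that occupies the GPU stream.
GPU_CATS = {"kernel", "gpu_memcpy", "gpu_memset"}


def gpu_idle_gaps(trace: dict, min_gap_us: float = 0.0) -> dict:
    """Return {(pid, tid): [(idle_start, idle_end), ...]} per GPU stream."""
    by_stream = defaultdict(list)
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") == "X" and str(ev.get("cat", "")).lower() in GPU_CATS:
            start = ev["ts"]
            by_stream[(ev.get("pid"), ev.get("tid"))].append((start, start + ev.get("dur", 0)))

    gaps = {}
    for key, spans in by_stream.items():
        spans.sort()
        busy_until = spans[0][1]
        found = []
        for start, end in spans[1:]:
            # A gap exists when the next activity starts after all earlier ones ended.
            if start - busy_until > min_gap_us:
                found.append((busy_until, start))
            busy_until = max(busy_until, end)
        gaps[key] = found
    return gaps


# Synthetic events on device 0, stream 7 (timestamps in microseconds, made up).
example_trace = {
    "traceEvents": [
        {"ph": "X", "cat": "kernel", "name": "k1", "pid": 0, "tid": 7, "ts": 0, "dur": 10},
        {"ph": "X", "cat": "kernel", "name": "k2", "pid": 0, "tid": 7, "ts": 10, "dur": 10},
        {"ph": "X", "cat": "kernel", "name": "k3", "pid": 0, "tid": 7, "ts": 50, "dur": 10},
        {"ph": "X", "cat": "cpu_op", "name": "aten::mm", "pid": 1, "tid": 1, "ts": 0, "dur": 5},
    ]
}

idle = gpu_idle_gaps(example_trace)
total_idle_us = sum(end - start for spans in idle.values() for start, end in spans)
```

A fully scheduled stream, as in the second screenshot, should yield few or no gaps above a small `min_gap_us` threshold; a stream stalled by syncs should show one gap per stall.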

import cupy
import torch
from nvidia import nvcomp
# TorchProfiler is not part of torch; it appears to come from the author's own project.
from modules.util.profiling_util import TorchProfiler
from torch import Tensor
from torch.utils.dlpack import to_dlpack
import tqdm

device='cuda'
act = torch.randint(-127, 127, (3072, 3072), device=device, dtype=torch.int8)
weight = torch.randint(-127, 127, (3072, 3072), device=device, dtype=torch.int8)

stream = torch.cuda.current_stream().cuda_stream
codec = nvcomp.Codec(algorithm="Zstd", uncomp_chunk_size=2048, cuda_stream=stream)


def encode(x: Tensor):
    array = nvcomp.as_array(x)
    compressed = codec.encode(array).cuda()
    tensor = torch.utils.dlpack.from_dlpack(compressed.to_dlpack()).clone()
    return tensor, x.shape, x.dtype



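def decode(compressed_tensor: Tensor, shape: list[int], dtype: torch.dtype) -> Tensor:
    # shape and dtype are accepted so that decode(*compressed) works, but are unused here;
    # the result of codec.decode is returned as-is, without converting it back to a torch.Tensor.
    compressed_array = nvcomp.as_array(compressed_tensor, cuda_stream=stream)
    return codec.decode(compressed_array)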

compressed = encode(weight)

with TorchProfiler("decompress.json", enabled=True):
    for _ in tqdm.tqdm(range(10000)):
        decode(*compressed)
        # Note: this is just another GPU operation to demonstrate the sync. It does not even use the result of decode.
        torch._int_mm(act, weight.T)


This might be related to NVIDIA/nvcomp#105, but I'm not entirely sure because I use a different API.
