lowbit-kernel

Here is 1 public repository matching this topic...

yifu-ding / BGEMM-CUDA

BGEMM-CUDA is a CUDA-based low-bit GEMM kernel library for efficient neural network inference. It implements optimized binary and ternary matrix multiplication primitives, including binary-weight and ternary-activation computation, with PyTorch extension support for model-level integration.
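
As a rough illustration of what a binary matrix multiplication primitive computes, here is a minimal Python sketch. It is not taken from BGEMM-CUDA. The function names (`pack_signs`, `binary_dot`, `binary_gemm`) and the bit encoding (bit 1 for +1, bit 0 for -1) are assumptions made for this example. The idea is the standard XOR-and-popcount identity: for two length-K vectors with entries in {-1, +1}, the dot product equals K minus twice the number of positions where the packed bits differ.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int (bit i is 1 when values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("entries must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, k):
    """Dot product of two packed {-1,+1} vectors of length k.

    Matching bits contribute +1 and differing bits contribute -1,
    so the result is k - 2 * popcount(a XOR b).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return k - 2 * mismatches


def binary_gemm(a, b):
    """Multiply an M x K matrix by a K x N matrix, both with +1/-1 entries."""
    k = len(b)
    n = len(b[0])
    a_rows = [pack_signs(row) for row in a]                      # pack each row of A
    b_cols = [pack_signs([b[i][j] for i in range(k)]) for j in range(n)]  # pack each column of B
    return [[binary_dot(ar, bc, k) for bc in b_cols] for ar in a_rows]


if __name__ == "__main__":
    A = [[1, -1, 1], [-1, -1, 1]]
    B = [[1, -1], [1, 1], [-1, 1]]
    print(binary_gemm(A, B))
```

A GPU kernel would typically apply the same identity to 32-bit or 64-bit machine words with a hardware population-count instruction; how BGEMM-CUDA tiles, packs, or handles the ternary case is not described on this page.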

Topics: cuda, high-performance-computing, cuda-kernels, gemm, binarization, low-bit-quantization, bgemm, lowbit-kernel

Updated Aug 30, 2024
Language: CUDA
