Popular repositories
TurboQuant-Vulkan (public, forked from tsuyu122/TurboQuant-Vulkan)
TurboQuant Vulkan: 3-bit KV cache quantization for llama.cpp using Lloyd-Max Gaussian codebooks. 4.57x compression, Vulkan GPU support (AMD/Intel/NVIDIA). Hobby project.
C++
vllm-omni (public, forked from vllm-project/vllm-omni)
A framework for efficient model inference with omni-modality models
Python
1Cat-vLLM (public, forked from 1CatAI/1Cat-vLLM)
vLLM fork for Tesla V100 (SM70) with AWQ 4-bit support, CUDA 12.8 build flow, and validated Qwen3.5 27B/35B deployment on multi-GPU V100.
Python
