"Adaptive Hybrid Quantization Framework for deploying 7B+ LLMs on low-VRAM devices (e.g., GTX 1050). Features surgical block alignment and Numba-accelerated inference.
One-click Windows installer for Z-Image Turbo AI image generation. Optimized for low-VRAM GPUs (4GB+). Features Gradio web UI, automatic setup, and GGUF model support.
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
A ComfyUI workflow for low-VRAM users.
Notebooks and workflows configured to run Wan 2.2 Animate inference smoothly with ComfyUI on Kaggle T4 GPUs.
Technical Showcase: 22B True-MoE Engine running on 6GB VRAM (GTX 1060). Demonstrates "Surgical" NF4 quantization, dynamic expert swapping, and the custom "Grace Hopper" pipeline.
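For reference, standard NF4 loading through bitsandbytes looks roughly like the sketch below; the repo's "surgical" per-layer variant and dynamic expert swapping go beyond this minimal example, and the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # matmuls run in fp16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",           # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",                     # spill layers to CPU when VRAM runs out
)
```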
A privacy-first Generative AI pipeline for prototyping 3D-style game assets on consumer hardware. Optimized for low-VRAM (4GB) GPUs using PyTorch, Diffusers, and Streamlit.
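The usual Diffusers memory levers for a 4GB card look roughly like this sketch; the checkpoint and prompt are placeholders, not this project's actual pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()   # needs accelerate; moves submodules to GPU only when used
pipe.enable_attention_slicing()   # trades speed for a lower peak VRAM footprint

image = pipe("isometric 3D game asset, treasure chest",
             num_inference_steps=20).images[0]
image.save("asset.png")
```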
Audit local LLM function calling and agentic reliability. Visual tool-use benchmarking for quantized models on YOUR hardware.
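A minimal version of such a tool-use check against a local OpenAI-compatible endpoint might look like this; the endpoint URL, model name, tool schema, and pass criterion are illustrative assumptions:

```python
import json
from openai import OpenAI

# Any local server exposing the OpenAI API shape works here (URL is assumed).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="local",  # placeholder; the server decides which quantized model is loaded
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# Pass if the model emitted a well-formed call to the expected tool.
calls = resp.choices[0].message.tool_calls
ok = bool(calls) and calls[0].function.name == "get_weather" \
     and "city" in json.loads(calls[0].function.arguments)
print("tool call emitted correctly:", ok)
```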
🚀 Run modern 7B LLMs on legacy 4GB GPUs without crashes, breaking the VRAM barrier for developers facing GPU limitations.
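One common recipe for fitting a 7B model into ~4GB is a 4-bit GGUF file with partial GPU offload via llama-cpp-python; the file name, layer count, and context size below are assumptions, not this project's defaults:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # any 4-bit GGUF file
    n_gpu_layers=20,  # partial offload: remaining layers stay in system RAM
    n_ctx=2048,       # a smaller context window keeps the KV cache small
)
out = llm("Q: What is VRAM?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```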