Feature Request: Support for smaller text encoders (Gemma 3 4B / quantized Gemma) for <32GB VRAM GPUs #303

@Jackson3195

Description

Summary

LTX-2 currently requires Gemma 3 12B as the text encoder, which needs ~24-27GB VRAM to operate. This makes LTX-2 unusable on consumer GPUs with 16GB VRAM (RTX 5080, RTX 4080, etc.), even with FP8 models and all optimizations applied.

Hardware Tested

  • GPU: RTX 5080 (16GB VRAM)
  • RAM: 32GB
  • Model: ltx-2-19b-dev-fp8.safetensors
  • Text Encoder: gemma-3-12b-it-qat-q4_0-unquantized

What I Tried

  1. ✅ Using FP8 quantized LTX-2 model
  2. ✅ Bypassing the LTXVGemmaEnhancePrompt node
  3. ✅ Reducing resolution to 512×512 with 41 frames
  4. ✅ Using the QAT version of Gemma 3 12B

Errors Encountered

Error 1: Device Mismatch (Partial Offloading)

loaded partially; 13433.80 MB usable, 13096.30 MB loaded, 450.12 MB offloaded
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
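
The partial-load line above can be read as simple arithmetic: the loaded and offloaded portions together give the model's total size, and comparing that with the usable figure shows how far short the card falls. A minimal sketch, assuming the log format shown above (the regular expression and the helper name `parse_partial_load` are illustrative, not part of any LTX-2 or ComfyUI API):

```python
import re

LOG_LINE = "loaded partially; 13433.80 MB usable, 13096.30 MB loaded, 450.12 MB offloaded"

def parse_partial_load(line: str) -> dict:
    """Extract the usable/loaded/offloaded figures (in MB) from the log line."""
    pairs = re.findall(r"([\d.]+) MB (usable|loaded|offloaded)", line)
    return {name: float(value) for value, name in pairs}

stats = parse_partial_load(LOG_LINE)
# Total model size is whatever landed on the GPU plus whatever was pushed to CPU.
total_mb = stats["loaded"] + stats["offloaded"]
# How much the model exceeds the VRAM the loader considered usable.
shortfall_mb = total_mb - stats["usable"]

print(f"model size: {total_mb:.2f} MB")
print(f"exceeds usable VRAM by: {shortfall_mb:.2f} MB")
```

On these numbers the model is only a little over 100 MB larger than the usable VRAM, yet that remainder is enough to leave some tensors on the CPU, which is where the device-mismatch error then appears.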

Error 2: Out of Memory

Allocated memory: 27598 MiB | Peak Usage: 29068 MiB
torch.OutOfMemoryError: Allocation on device

Even basic CLIP text encoding (without prompt enhancement) loads the full Gemma 3 12B model, which does not fit in 16GB of VRAM.
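
To put the peak usage from Error 2 in context, the sketch below compares it against a few nominal VRAM sizes. It assumes the advertised "GB" figure of a card is treated as GiB and is fully available to the process, which is optimistic; `shortfall_mib` is a hypothetical helper written for this illustration.

```python
MIB_PER_GIB = 1024

def shortfall_mib(peak_mib: int, vram_gib: int) -> int:
    """Return how many MiB the peak exceeds the card's nominal VRAM (0 if it fits)."""
    return max(0, peak_mib - vram_gib * MIB_PER_GIB)

PEAK_MIB = 29068  # "Peak Usage" reported in the out-of-memory error

for vram_gib in (16, 24, 32):
    gap = shortfall_mib(PEAK_MIB, vram_gib)
    status = "fits" if gap == 0 else f"short by {gap} MiB"
    print(f"{vram_gib} GiB card: {status}")
```

Under these assumptions the reported peak overshoots both 16 GiB and 24 GiB cards and only fits on a 32 GiB card.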

Feature Request

Please consider adding support for smaller text encoder alternatives:

Option 1: Gemma 3 4B Support

  • Google's Gemma 3 4B requires only ~8GB (BF16) or ~2.6GB (int4)
  • Would require fine-tuning/distillation but would open LTX-2 to millions of consumer GPU users

Option 2: Proper Int4/GGUF Quantization for Gemma 3 12B

  • Google's QAT models reduce Gemma 3 12B from 24GB to ~6.6GB (int4)
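  • The current loader may not be utilizing quantization efficiently; note that the checkpoint tested above is the "-unquantized" QAT variant, which as far as I can tell stores full-precision weights and only saves memory once it is actually quantized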
  • Reference: Gemma 3 QAT Models
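
The sizes quoted in Options 1 and 2 follow roughly from parameter count times bits per weight. A back-of-the-envelope sketch, assuming nominal parameter counts (4B and 12B) and a q4_0-style int4 layout of 4.5 bits per weight (32 four-bit weights plus one 16-bit scale per block); real checkpoints differ somewhat because of exact parameter counts and any tensors kept at higher precision:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring activations and caches."""
    return params_billion * bits_per_weight / 8

# Nominal parameter counts in billions; the published checkpoints are not exactly these sizes.
MODELS = {"Gemma 3 4B": 4.0, "Gemma 3 12B": 12.0}
# 4.5 bits/weight is an assumption based on a q4_0-style block layout.
FORMATS = {"BF16": 16.0, "int4 (q4_0-style)": 4.5}

for name, params in MODELS.items():
    for fmt, bits in FORMATS.items():
        print(f"{name} @ {fmt}: ~{weight_gb(params, bits):.1f} GB")
```

The BF16 estimates match the ~8GB and ~24GB figures above, and the int4 estimates land in the same range as the quoted ~2.6GB and ~6.6GB. Weights are only part of the picture: activations and framework overhead come on top, which is consistent with the higher peak seen in Error 2.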

Option 3: Alternative Text Encoder

  • T5-XXL (used in LTX-Video 1.0) works on lower VRAM
  • Other efficient encoders like T5Gemma-2-4B-4B

Impact

The current effective requirement of roughly 32GB of VRAM (peak usage in Error 2 was about 28.4 GiB) limits LTX-2 to:

  • Datacenter GPUs (A100, H100)
  • RTX 5090 / RTX PRO 6000
  • A small fraction of potential users (exact share not estimated)

Supporting 16GB GPUs would enable:

  • RTX 4080, 4080 Super, 5080
  • RTX 4090 Laptop (16GB), plus 24GB cards such as the RTX 3090
  • ~50%+ of AI enthusiasts
