Feature Request: Support for smaller text encoders (Gemma 3 4B / quantized Gemma) for <32GB VRAM GPUs

## Summary

LTX-2 currently requires Gemma 3 12B as the text encoder, which needs ~24-27GB VRAM to operate. This makes LTX-2 unusable on consumer GPUs with 16GB VRAM (RTX 5080, RTX 4080, etc.), even with FP8 models and all optimizations applied.

## Hardware Tested
- **GPU:** RTX 5080 (16GB VRAM)
- **RAM:** 32GB
- **Model:** ltx-2-19b-dev-fp8.safetensors
- **Text Encoder:** gemma-3-12b-it-qat-q4_0-unquantized

## What I Tried
1. ✅ Using the FP8-quantized LTX-2 model
2. ✅ Bypassing the `LTXVGemmaEnhancePrompt` node
3. ✅ Reducing resolution to 512×512 with 41 frames
4. ✅ Using the QAT version of Gemma 3 12B

## Errors Encountered

### Error 1: Device Mismatch (Partial Offloading)
loaded partially; 13433.80 MB usable, 13096.30 MB loaded, 450.12 MB offloaded
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
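
As a quick sanity check on the log line above, the three figures can be parsed and compared. This is a minimal sketch: it assumes that `loaded` plus `offloaded` approximates the total size of the model being placed (the log does not say so explicitly), and `parse_partial_load` is a hypothetical helper, not part of ComfyUI.

```python
import re

# Log line copied from Error 1 above.
LOG_LINE = "loaded partially; 13433.80 MB usable, 13096.30 MB loaded, 450.12 MB offloaded"


def parse_partial_load(line: str) -> dict:
    """Extract the usable / loaded / offloaded figures (in MB) from the log line."""
    pattern = r"([\d.]+) MB (usable|loaded|offloaded)"
    return {label: float(value) for value, label in re.findall(pattern, line)}


stats = parse_partial_load(LOG_LINE)
# Assumption: loaded + offloaded is the full size of the model being placed.
total_mb = stats["loaded"] + stats["offloaded"]
overflow_mb = total_mb - stats["usable"]
print(f"total {total_mb:.2f} MB, over the usable budget by {overflow_mb:.2f} MB")
```

Under that assumption the model misses the usable budget by a small margin, so part of it is placed on the CPU, which is consistent with the `cuda:0 and cpu` mismatch reported in the traceback.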

### Error 2: Out of Memory
Allocated memory: 27598 MiB | Peak Usage: 29068 MiB
torch.OutOfMemoryError: Allocation on device

Even basic CLIP text encoding (not enhancement) requires the full Gemma 3 12B model, which exceeds 16GB VRAM.
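
To put the peak figure from Error 2 in context, the sketch below compares it with a few nominal card sizes. It assumes a "16GB" card offers exactly 16 GiB and ignores memory reserved by the driver or desktop, so the real shortfall would be somewhat larger; `shortfall_mib` is a hypothetical helper.

```python
MIB_PER_GIB = 1024
PEAK_USAGE_MIB = 29068  # "Peak Usage" reported in Error 2 above


def shortfall_mib(peak_mib: int, card_gib: int) -> int:
    """MiB by which the peak exceeds a card of the given nominal size (0 if it fits)."""
    return max(0, peak_mib - card_gib * MIB_PER_GIB)


for card_gib in (16, 24, 32):
    print(f"{card_gib} GiB card: short by {shortfall_mib(PEAK_USAGE_MIB, card_gib)} MiB")
```

By this rough measure the reported peak does not fit on 16 GiB or 24 GiB cards and only fits once 32 GiB is available.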

## Feature Request

Please consider adding support for smaller text encoder alternatives:

### Option 1: Gemma 3 4B Support
- Google's Gemma 3 4B requires only ~8GB (BF16) or ~2.6GB (int4)
- Would require fine-tuning/distillation but would open LTX-2 to millions of consumer GPU users

### Option 2: Proper Int4/GGUF Quantization for Gemma 3 12B
- Google's QAT models reduce Gemma 3 12B from 24GB to ~6.6GB (int4)
- The current loader may not be utilizing quantization efficiently
- Reference: [Gemma 3 QAT Models](https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/)
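
The sizes quoted in Options 1 and 2 can be roughly cross-checked with a back-of-the-envelope estimate of weight storage: parameter count times bits per parameter. This is only a sketch. The parameter counts (12e9 and 4e9) are nominal values read off the model names, and the estimate covers weights only, not activations, KV cache, or quantization scales.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter counts taken from the model names (an assumption).
MODELS = {"Gemma 3 12B": 12e9, "Gemma 3 4B": 4e9}
PRECISIONS = {"bf16": 16, "int4": 4}


def footprint_table() -> dict:
    """Map (model, precision) to the estimated weight size in GB."""
    return {
        (name, precision): round(weight_footprint_gb(params, bits), 1)
        for name, params in MODELS.items()
        for precision, bits in PRECISIONS.items()
    }


for (name, precision), size_gb in footprint_table().items():
    print(f"{name:12s} {precision:5s} ~{size_gb:.1f} GB")
```

The BF16 estimates match the 24GB and ~8GB figures above. The int4 estimates come out a little below the quoted ~6.6GB and ~2.6GB, which is plausible if some tensors stay at higher precision, though that explanation is an assumption here.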

### Option 3: Alternative Text Encoder
- T5-XXL (used in LTX-Video 1.0) works on lower VRAM
- Other efficient encoders like T5Gemma-2-4B-4B

## Impact

The current 32GB+ VRAM requirement limits LTX-2 to:
- Datacenter GPUs (A100, H100)
- RTX 5090 / RTX PRO 6000
- ~X% of potential users

Supporting 16GB GPUs would enable:
- RTX 4080, 4080 Super, 5080
- RTX 3090 (24GB), RTX 4090 Laptop (16GB)
- An estimated 50% or more of AI enthusiasts

## References
- Related issue: #3 (GGUF or int4 request on LTX-2 repo)
- Related issue: #300 (Memory Management request)

Thank you for the amazing work on LTX-2!
