Add torch.compile support for 2x faster inference #64
devdaniel wants to merge 3 commits into HeartMuLa:main
Conversation
May also need these dependency package updates, and this addition.
I tried triton and it will not run. I am on a Windows 11 system. I tried 2.10 and 3.0 from https://huggingface.co/madbuda/triton-windows-builds. I do have a 5070 Ti, which sometimes causes problems with PyTorch and other installations.
For Windows, make sure you are using a triton version compatible with your PyTorch:

```
pip uninstall triton triton-windows -y
pip install "triton-windows>=3.2,<3.3"
```

Version compatibility is documented by the triton-windows project.
I've updated the warning and README with the recommended triton-windows version for Windows users.
Tried this out with AMD and WSL. It does reduce my memory usage significantly, and appears to speed it up to 11 it/s on a 7900 XTX.
Adds `--compile` and `--compile_mode` flags to use torch.compile.

This is a massive performance improvement (2x) to inference speed (16 it/s to 32 it/s), tested on RTX 4090, RTX PRO 6000, and A100, taking it from 1:1 real-time inference to 2x real-time.
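A minimal sketch of how these flags could wire into torch.compile — the flag names come from this PR's description, but the wrapper function and argument plumbing here are assumptions, not the PR's actual code:

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--compile", action="store_true",
                    help="wrap the model with torch.compile")
parser.add_argument("--compile_mode", default="default",
                    choices=["default", "reduce-overhead", "max-autotune"],
                    help="passed through as torch.compile(mode=...)")
args = parser.parse_args()

def maybe_compile(model):
    # No-op unless --compile is set; otherwise compile with the chosen mode.
    if not args.compile:
        return model
    return torch.compile(model, mode=args.compile_mode)
```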
This should auto-detect triton/inductor availability and fall back with a warning.
Windows users will need to install triton-windows separately to use this.
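The auto-detect-and-fall-back behavior described above might look roughly like this (the helper name and warning text are assumptions, not the PR's exact implementation):

```python
import warnings

import torch

def compile_if_supported(model, mode="default"):
    # torch.compile's default inductor backend needs triton for CUDA kernels;
    # fall back to the uncompiled model with a warning when it is missing.
    try:
        import triton  # noqa: F401
    except ImportError:
        warnings.warn(
            "triton not available; skipping torch.compile. "
            "Windows users can install the triton-windows package."
        )
        return model
    return torch.compile(model, mode=mode)
```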