
fix: fall back on older CUDA GPUs #13

Open
robinspt wants to merge 1 commit into opendatalab:main from robinspt:hf-runtime-compat

Conversation

@robinspt

Summary

  • disable FlashAttention automatically on pre-Ampere CUDA GPUs
  • fall back from bfloat16 to float16 when bf16 is unsupported
  • apply the compatibility fix to both the HF runner and end-to-end script

Why

On older NVIDIA GPUs, the current HF path can fail in two ways:

  • FlashAttention raises an error because it only supports Ampere GPUs or newer
  • bf16 inference hits runtime errors on devices without bfloat16 support

This change keeps the existing fast path on supported GPUs while allowing older CUDA GPUs to run through the SDPA + fp16 fallback path.
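The fallback decision described above can be sketched as a small pure function. This is an illustrative sketch, not the PR's actual code: the function name and structure are assumptions, and in the real runner the inputs would come from `torch.cuda.get_device_capability()` and `torch.cuda.is_bf16_supported()`.

```python
def pick_runtime_config(cc_major: int, bf16_supported: bool) -> dict:
    """Pick attention backend and dtype from GPU capability.

    cc_major: major compute capability (8 for Ampere, 7 for Volta/Turing),
    as reported by torch.cuda.get_device_capability().
    """
    # FlashAttention requires Ampere (compute capability 8.x) or newer;
    # older GPUs fall back to PyTorch's SDPA implementation.
    attn = "flash_attention_2" if cc_major >= 8 else "sdpa"
    # Fall back to float16 when the device lacks bfloat16 support.
    dtype = "bfloat16" if bf16_supported else "float16"
    return {"attn_implementation": attn, "torch_dtype": dtype}

# e.g. a Turing GPU (compute capability 7.5, no bf16) gets the fallback path:
# pick_runtime_config(7, False)
# → {"attn_implementation": "sdpa", "torch_dtype": "float16"}
```

The resulting dictionary would typically be splatted into a Hugging Face `from_pretrained(...)` call, so supported GPUs keep the FlashAttention + bf16 fast path unchanged.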

