
fix: fall back on older CUDA GPUs #13

Open
robinspt wants to merge 1 commit into opendatalab:main from robinspt:hf-runtime-compat

Conversation

@robinspt

Summary

  • disable FlashAttention automatically on pre-Ampere CUDA GPUs
  • fall back from bfloat16 to float16 when bf16 is unsupported
  • apply the compatibility fix to both the HF runner and end-to-end script

Why

On older NVIDIA GPUs, the current HF path can fail in two ways:

  • FlashAttention raises an error because it only supports Ampere GPUs or newer
  • bf16 inference hits runtime errors on devices without bfloat16 support

This change keeps the existing fast path on supported GPUs while allowing older CUDA GPUs to run through the SDPA + fp16 fallback path.
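The fallback decision described above can be sketched as a small pure function. This is an illustrative sketch, not the PR's actual code: the function name and structure are assumptions, and in the real runner the inputs would come from `torch.cuda.get_device_capability()` and `torch.cuda.is_bf16_supported()`.

```python
def pick_runtime_config(cc_major: int, bf16_supported: bool) -> dict:
    """Pick attention backend and dtype from GPU capability.

    cc_major: major compute capability (8 for Ampere, 7 for Volta/Turing),
    as reported by torch.cuda.get_device_capability().
    """
    # FlashAttention requires Ampere (compute capability 8.x) or newer;
    # older GPUs fall back to PyTorch's SDPA implementation.
    attn = "flash_attention_2" if cc_major >= 8 else "sdpa"
    # Fall back to float16 when the device lacks bfloat16 support.
    dtype = "bfloat16" if bf16_supported else "float16"
    return {"attn_implementation": attn, "torch_dtype": dtype}

# e.g. a Turing GPU (compute capability 7.5, no bf16) gets the fallback path:
# pick_runtime_config(7, False)
# → {"attn_implementation": "sdpa", "torch_dtype": "float16"}
```

The resulting dictionary would typically be splatted into a Hugging Face `from_pretrained(...)` call, so supported GPUs keep the FlashAttention + bf16 fast path unchanged.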

