
[fp8] Select SGLang FP8 block quant kernel to match inference #1182

Open
yueming-yuan wants to merge 1 commit into radixark:main from yueming-yuan:fp8-quant-kernel-selection

Conversation

Collaborator

@yueming-yuan yueming-yuan commented May 22, 2026

No description provided.

@yueming-yuan yueming-yuan force-pushed the fp8-quant-kernel-selection branch from 4aee1fc to b6ccb11 Compare May 22, 2026 21:26
@yueming-yuan yueming-yuan changed the title from "Select SGLang FP8 block quant kernel" to "[fp8] Select SGLang FP8 block quant kernel to match inference" May 22, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request integrates the `per_block_cast_to_fp8` function from the `sglang` library into the FP8 quantization workflow. It introduces a new internal helper, `_blockwise_cast_to_fp8`, which conditionally utilizes the `sglang` implementation when the weight block size is `(128, 128)` and falls back to the existing Triton-based kernel otherwise. Additionally, the `sglang` utility is safely imported with a fallback to `None` to ensure compatibility. I have no feedback to provide as there were no review comments to evaluate.
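
The dispatch described in the summary above can be sketched roughly as follows. This is not the PR's code: the diff is not shown here, so `select_block_quant_kernel`, the `Kernel` alias, and both stand-in kernels are hypothetical names. The sketch also assumes that a failed `sglang` import (the kernel being `None`) routes to the Triton path, which the summary implies but does not state.

```python
from typing import Callable, Optional, Tuple

# Hypothetical alias: a kernel takes a weight and returns (quantized, scales).
Kernel = Callable[[object], Tuple[object, object]]

# Block shape for which the summary says the sglang kernel is used.
SGLANG_BLOCK_SIZE = (128, 128)


def select_block_quant_kernel(
    weight_block_size,
    sglang_kernel: Optional[Kernel],
    triton_kernel: Kernel,
) -> Kernel:
    """Pick the sglang kernel for (128, 128) blocks, else the Triton fallback."""
    # Assumption: a missing sglang import (kernel is None) also falls back.
    if sglang_kernel is not None and tuple(weight_block_size) == SGLANG_BLOCK_SIZE:
        return sglang_kernel
    return triton_kernel


# Stand-in kernels so the sketch runs without sglang, Triton, or a GPU.
def fake_sglang_kernel(weight):
    return ("sglang", weight)


def fake_triton_kernel(weight):
    return ("triton", weight)


chosen = select_block_quant_kernel([128, 128], fake_sglang_kernel, fake_triton_kernel)
print(chosen("w")[0])
```

Passing the kernels in as arguments keeps the selection rule testable on its own; in the actual helper the two kernels would presumably be module-level names.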

