[Feature Request]: Support f16 via the half crate #95

@akern40

Hello! We are considering adding support for f16 (and bf16) via the half crate in ndarray (rust-ndarray/ndarray#1551), but we are seeing rather dismal matrix multiplication performance for the new types: f16 appears to be ~3 orders of magnitude slower than f32. After some debugging, I believe this is a testament to matrixmultiply's performance: the f16 code on my Apple M2 chip is hitting the f16 assembly instructions, so I think most of the difference comes from matrixmultiply's very fast sgemm rather than from slow f16 arithmetic.

In light of this, I was wondering what the appetite would be for supporting f16 here in the matrixmultiply crate.

cc: @swfsql, who has been the champion for f16 in ndarray.
