Hi all,
- It would be great to have some Python bindings for cuSparseLt, because it'll be a while until PyTorch supports this for all dtypes, especially low-precision ones such as int8.
- Using the C++ API, I'm only getting about a 30% speedup for sparse int8 x int8 compared to dense torch._int_mm. Is that expected? I would have expected more, given the hardware claim of roughly twice the throughput with 2:4 structured sparsity.
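For concreteness, here is a minimal sketch of the kind of dense-baseline timing I mean; the 4096-cubed shapes, iteration counts, and event-based timing are just an illustration, not my exact benchmark:

```python
# Minimal sketch: timing the dense int8 x int8 baseline with CUDA events.
# Shapes and iteration counts are illustrative only.
import torch

M = N = K = 4096
a = torch.randint(-128, 127, (M, K), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 127, (K, N), dtype=torch.int8, device="cuda")

# Warm up so kernel selection and caches don't skew the first timing.
for _ in range(10):
    torch._int_mm(a, b)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100
start.record()
for _ in range(iters):
    torch._int_mm(a, b)  # dense int8 x int8 -> int32
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end) / iters
tops = 2 * M * N * K / (ms * 1e-3) / 1e12
print(f"dense _int_mm: {ms:.3f} ms/iter, ~{tops:.1f} TOPS")
```

The sparse side goes through the cuSparseLt C++ calls on the same shapes, which is exactly why Python bindings would make this comparison so much easier to share.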
I'm pretty sure int8 x int8 matmul isn't bandwidth-limited on any modern GPU, is it?
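A back-of-envelope arithmetic-intensity check seems to confirm that; the 624 TOPS and 2 TB/s peaks below are rough A100-class numbers I'm assuming for illustration, not measured values:

```python
# Back-of-envelope arithmetic intensity for an int8 GEMM (int8 inputs, int32 output).
# Peak figures are rough A100-class assumptions, just for illustration.
M = N = K = 4096
ops = 2 * M * N * K                      # multiply-adds counted as 2 ops
bytes_moved = M * K + K * N + 4 * M * N  # 1 byte per int8 input, 4 bytes per int32 output
intensity = ops / bytes_moved            # ~1365 ops/byte at 4096^3

peak_int8_ops = 624e12   # assumed dense int8 tensor-core peak (ops/s)
peak_bw = 2.0e12         # assumed HBM bandwidth (bytes/s)
balance = peak_int8_ops / peak_bw        # ~312 ops/byte needed to be compute-bound

print(f"arithmetic intensity: {intensity:.0f} ops/byte, machine balance: {balance:.0f} ops/byte")
# intensity >> balance, so a GEMM at this size should be compute-bound, not bandwidth-bound.
```

So if the kernel really is compute-bound, I'd expect the 2:4 sparse path to get closer to the advertised 2x, which is why the ~30% I'm seeing surprises me.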