faster matmul by efoley · Pull Request #3 · efoley/eel

efoley · 2025-03-21T22:43:51Z

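Writing the matmul using AVX2 intrinsics shaves roughly 12% off the run time (total time from 5.98 s to 5.27 s, prompt time from 2.26 s to 1.98 s in the runs below). FMA doesn't speed things up relative to a separate AVX multiply and add, but I use it anyway for now.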
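For readers unfamiliar with the intrinsics, here is a minimal sketch of an AVX2 matrix-vector kernel in both forms the description compares: separate multiply and add, and fused multiply-add. It assumes a row-major float32 matrix A and a vector x (suggested by the note about loading A and x, not confirmed by the PR), and the function names are hypothetical, not eel's actual code. It targets x86-64 with GCC or Clang; the target attributes let it build without global -mavx2 -mfma flags, but the CPU must still support AVX2 and FMA.

```c
#include <immintrin.h>
#include <stddef.h>

/* Horizontal sum of the 8 float lanes of an AVX register. */
static inline __attribute__((target("avx2"))) float hsum256_ps(__m256 v) {
    __m128 lo = _mm256_castps256_ps128(v);   /* lanes 0..3 */
    __m128 hi = _mm256_extractf128_ps(v, 1); /* lanes 4..7 */
    __m128 s = _mm_add_ps(lo, hi);           /* 4 partial sums */
    s = _mm_hadd_ps(s, s);                   /* 2 partial sums */
    s = _mm_hadd_ps(s, s);                   /* 1 total */
    return _mm_cvtss_f32(s);
}

/* Scalar reference: y = A x, A is rows x cols, row-major. */
void matvec_ref(const float *A, const float *x, float *y,
                size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < cols; j++)
            sum += A[i * cols + j] * x[j];
        y[i] = sum;
    }
}

/* Hypothetical AVX2 kernel using a separate multiply and add. */
__attribute__((target("avx2")))
void matvec_avx2_muladd(const float *A, const float *x, float *y,
                        size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        const float *row = A + i * cols;
        __m256 acc = _mm256_setzero_ps();
        size_t j = 0;
        for (; j + 8 <= cols; j += 8) {
            __m256 a = _mm256_loadu_ps(row + j); /* 8 floats of A */
            __m256 b = _mm256_loadu_ps(x + j);   /* 8 floats of x */
            acc = _mm256_add_ps(acc, _mm256_mul_ps(a, b));
        }
        float sum = hsum256_ps(acc);
        for (; j < cols; j++) /* scalar tail when cols % 8 != 0 */
            sum += row[j] * x[j];
        y[i] = sum;
    }
}

/* Same kernel, but the multiply and add are fused into one FMA. */
__attribute__((target("avx2,fma")))
void matvec_avx2_fma(const float *A, const float *x, float *y,
                     size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        const float *row = A + i * cols;
        __m256 acc = _mm256_setzero_ps();
        size_t j = 0;
        for (; j + 8 <= cols; j += 8) {
            __m256 a = _mm256_loadu_ps(row + j);
            __m256 b = _mm256_loadu_ps(x + j);
            acc = _mm256_fmadd_ps(a, b, acc); /* acc = a * b + acc */
        }
        float sum = hsum256_ps(acc);
        for (; j < cols; j++)
            sum += row[j] * x[j];
        y[i] = sum;
    }
}
```

The only difference between the two loops is that _mm256_fmadd_ps does the multiply and add in one instruction (with a single rounding). Both variants still load eight floats of A and eight of x per step, so if the loop is limited by memory bandwidth, fusing the arithmetic would not be expected to change the run time.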

(eel) eric@zinnia:~/development/eel$ ./scripts/main.py data/Qwen2.5-0.5B-Instruct/
Prompt time: 1.9780s
Total time: 5.2664s

vs

(eel) eric@zinnia:~/development/eel$ ./scripts/main.py data/Qwen2.5-0.5B-Instruct/
Prompt time: 2.2554s
Total time: 5.9787s
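Assuming the first run is the AVX2 build and the second the previous version (the faster run is the one consistent with the claimed gain), the relative reduction in each timing works out as follows:

```latex
\text{total: } \frac{5.9787 - 5.2664}{5.9787} = \frac{0.7123}{5.9787} \approx 11.9\%,
\qquad
\text{prompt: } \frac{2.2554 - 1.9780}{2.2554} = \frac{0.2774}{2.2554} \approx 12.3\%
```

Expressed as a speedup on total time, that is 5.9787 / 5.2664 ≈ 1.14×.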


efoley added 3 commits March 21, 2025 16:43

feat: AVX2 matmul

cc992c5

feat: use FMA instead of multiply & add

02171b4

Using FMA doesn't speed things up here, though, probably because we are memory-bound loading A and x.

fix: load correct shared library based on the platform

d7dba25
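The memory-bound explanation in the FMA commit can be made concrete with a rough arithmetic-intensity estimate. This is a sketch assuming the hot operation is a matrix-vector product y = Ax with A of size m × n stored with s bytes per element (s = 4 for float32; the PR does not state the element type), and assuming x stays in cache so that A dominates memory traffic:

```latex
\text{FLOPs} = 2mn, \qquad
\text{bytes} \approx s\,(mn + n + m) \approx s\,mn \quad (m, n \gg 1), \qquad
I = \frac{\text{FLOPs}}{\text{bytes}} \approx \frac{2}{s}
\;\Rightarrow\; I_{\text{float32}} \approx 0.5\ \text{FLOP/byte}

P_{\text{attainable}} \le \min\left(P_{\text{peak}},\; I \cdot B_{\text{mem}}\right)
```

Each element of A is used for exactly one multiply-add, so at roughly half a FLOP per byte the bound I · B_mem sits far below what the vector units can sustain. FMA reduces the number of arithmetic instructions but not the number of bytes of A that must stream in from memory, which is consistent with it not changing the timings.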

efoley changed the title from "feat: AVX2 matmul" to "faster matmul" Mar 22, 2025

efoley merged commit 7e0dd5c into main Mar 22, 2025
4 checks passed

efoley deleted the avx2-matmul branch March 22, 2025 01:32

efoley merged 3 commits into main from avx2-matmul