faster matmul #3

Merged
efoley merged 3 commits into main from avx2-matmul on Mar 22, 2025

efoley (Owner) commented on Mar 21, 2025

Writing the matmul with AVX2 intrinsics shaves about 12% off the run time in the measurements below. FMA is no faster than a separate AVX multiply and add, but I use it anyway for now.

(eel) eric@zinnia:~/development/eel$ ./scripts/main.py data/Qwen2.5-0.5B-Instruct/
Prompt time: 1.9780s
Total time: 5.2664s

vs

(eel) eric@zinnia:~/development/eel$ ./scripts/main.py data/Qwen2.5-0.5B-Instruct/
Prompt time: 2.2554s
Total time: 5.9787s
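
For readers who want to see the shape of such a kernel, here is a minimal sketch of an AVX2 matrix-vector product. It is not the code from this PR: the function names (`matvec_scalar`, `matvec_avx2`, `matvec`), the row-major float32 layout, and the runtime dispatch are all assumptions made for illustration. It assumes GCC or Clang; on non-x86 targets it falls back to the scalar loop.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define MATVEC_X86 1
#endif

/* Scalar reference: y = A * x, where A is rows x cols, row-major (assumed layout). */
void matvec_scalar(const float *A, const float *x, float *y,
                   size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; ++i) {
        float sum = 0.0f;
        for (size_t j = 0; j < cols; ++j) sum += A[i * cols + j] * x[j];
        y[i] = sum;
    }
}

#ifdef MATVEC_X86
/* Hypothetical AVX2 + FMA kernel: processes 8 floats per iteration. */
__attribute__((target("avx2,fma")))
void matvec_avx2(const float *A, const float *x, float *y,
                 size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; ++i) {
        const float *row = A + i * cols;
        __m256 acc = _mm256_setzero_ps();
        size_t j = 0;
        for (; j + 8 <= cols; j += 8) {
            __m256 a = _mm256_loadu_ps(row + j);  /* unaligned loads: no alignment assumed */
            __m256 v = _mm256_loadu_ps(x + j);
            acc = _mm256_fmadd_ps(a, v, acc);     /* acc += a * v, fused */
            /* Non-FMA variant: acc = _mm256_add_ps(acc, _mm256_mul_ps(a, v)); */
        }
        /* Horizontal sum of the 8 lanes of acc. */
        __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                              _mm256_extractf128_ps(acc, 1));
        s = _mm_add_ps(s, _mm_movehl_ps(s, s));
        s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x55));
        float sum = _mm_cvtss_f32(s);
        /* Scalar tail for cols not divisible by 8. */
        for (; j < cols; ++j) sum += row[j] * x[j];
        y[i] = sum;
    }
}
#endif

/* Use the AVX2 kernel when the CPU supports it, otherwise the scalar loop. */
void matvec(const float *A, const float *x, float *y,
            size_t rows, size_t cols) {
#ifdef MATVEC_X86
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        matvec_avx2(A, x, y, rows, cols);
        return;
    }
#endif
    matvec_scalar(A, x, y, rows, cols);
}
```

Swapping the `_mm256_fmadd_ps` line for the commented-out multiply-and-add variant is the comparison described above; both issue the same two loads per iteration.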

efoley added 3 commits March 21, 2025 16:43
Using FMA doesn't speed things up here, though, probably because we are memory-bound loading A and x.
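
A back-of-envelope estimate shows why a memory-bound explanation is plausible. The sketch below is not from the PR: the helper name `matvec_arithmetic_intensity` is hypothetical, and it assumes float32 data with A streamed from memory once. Under those assumptions a matrix-vector product performs about 0.5 floating-point operations per byte loaded, so fusing the multiply and add reduces the instruction count but not the memory traffic.

```c
#include <stddef.h>

/* FLOPs per byte of memory traffic for y = A * x.
   Assumptions (not from the PR): 4-byte floats, A (rows x cols) read once,
   x read once, y written once. */
double matvec_arithmetic_intensity(size_t rows, size_t cols) {
    double r = (double)rows, c = (double)cols;
    double flops = 2.0 * r * c;            /* one multiply and one add per element of A */
    double bytes = 4.0 * (r * c + c + r);  /* A, x, and y as float32 */
    return flops / bytes;
}
```

For a 1024 x 1024 matrix this comes out just under 0.5 FLOPs per byte, and the ratio does not depend on whether the multiply and add are fused.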
efoley changed the title from "feat: AVX2 matmul" to "faster matmul" on Mar 22, 2025
efoley merged commit 7e0dd5c into main on Mar 22, 2025
4 checks passed
efoley deleted the avx2-matmul branch on March 22, 2025 01:32