This repository contains the complete code and resources for benchmarking matrix multiplication on both CPU and GPU, using various implementations and optimizations. The project demonstrates the performance benefits of GPU acceleration using CuPy with FP32 and TensorCore support, alongside CPU-based approaches with NumPy, Numba, and custom CUDA kernels.
CPU Implementations:
- Naïve Python multiplication
- NumPy-based matrix multiplication
- Numba parallel-accelerated multiplication
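The three CPU variants above can be sketched as follows (sizes and function names are illustrative, not the repository's actual API; the Numba path is guarded since it is an optional dependency):

```python
# Sketch of the three CPU variants on a small illustrative matrix.
import numpy as np

def matmul_naive(a, b):
    """Pure-Python triple loop: O(n^3), no vectorization."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c

n = 64
a = np.random.rand(n, n)
b = np.random.rand(n, n)

c_naive = np.array(matmul_naive(a.tolist(), b.tolist()))
c_numpy = a @ b  # BLAS-backed, typically orders of magnitude faster

# Numba variant: the same loop, JIT-compiled and parallelized.
try:
    from numba import njit, prange

    @njit(parallel=True)
    def matmul_numba(a, b):
        n, m, p = a.shape[0], a.shape[1], b.shape[1]
        c = np.zeros((n, p))
        for i in prange(n):
            for k in range(m):
                for j in range(p):
                    c[i, j] += a[i, k] * b[k, j]
        return c

    assert np.allclose(matmul_numba(a, b), c_numpy)
except ImportError:
    pass  # Numba not installed; the NumPy and naive paths still run

assert np.allclose(c_naive, c_numpy)
```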
GPU Implementations:
- CuPy FP32: Optimized with standard floating-point precision
- CuPy TensorCore: Faster, mixed-precision matrix multiplication
- Custom CUDA kernel for small matrices with shared memory optimization
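A hedged sketch of the two CuPy paths (matrix sizes are illustrative; the code falls back to NumPy so its structure is visible even without a GPU). On Volta-or-newer GPUs, FP16 inputs let cuBLAS dispatch the matmul to TensorCores:

```python
# Sketch: CuPy FP32 vs mixed-precision (TensorCore-eligible) matmul.
import numpy as np

try:
    import cupy as cp
    a = cp.random.rand(512, 512, dtype=cp.float32)
    b = cp.random.rand(512, 512, dtype=cp.float32)

    c_fp32 = a @ b  # cuBLAS SGEMM, standard FP32 precision
    # Mixed precision: FP16 inputs are TensorCore-eligible on Volta+
    # GPUs (exact accumulation behavior depends on the cuBLAS version).
    c_tc = a.astype(cp.float16) @ b.astype(cp.float16)

    cp.cuda.Stream.null.synchronize()  # GPU kernel launches are asynchronous
    shape = c_fp32.shape
except Exception:  # CuPy missing or no CUDA device: CPU fallback
    a = np.random.rand(512, 512).astype(np.float32)
    b = np.random.rand(512, 512).astype(np.float32)
    shape = (a @ b).shape

print(shape)
```

Note that timing GPU code requires synchronizing first, otherwise you measure only the (asynchronous) kernel launch.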
Performance Benchmarking:
- Execution time, throughput, and speedup measurements
- Comparison against NumPy baseline performance
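A minimal benchmarking harness capturing the three metrics above (function and variable names are illustrative; for a GPU candidate, the timed call must include a device synchronize):

```python
# Wall-clock time, GFLOP/s throughput, and speedup vs the NumPy baseline.
import time
import numpy as np

def bench(fn, *args, repeats=3):
    """Return best-of-repeats wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

n = 256
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

t_baseline = bench(np.matmul, a, b)
t_candidate = bench(np.matmul, a, b)  # replace with the implementation under test

flops = 2 * n**3                      # an n x n matmul does n^3 multiply-adds
gflops = flops / t_baseline / 1e9
speedup = t_baseline / t_candidate    # > 1 means the candidate is faster
print(f"NumPy: {t_baseline*1e3:.3f} ms, {gflops:.1f} GFLOP/s, speedup x{speedup:.2f}")
```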
Visualizations:
- Logarithmic scale performance graphs
- Speedup and time comparisons
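A sketch of the log-scale comparison plot (the timings below are placeholders, not measured results; the headless `Agg` backend is used so the script runs without a display):

```python
# Illustrative log-log plot of runtime vs matrix size.
import matplotlib
matplotlib.use("Agg")  # headless backend: write to file, no window needed
import matplotlib.pyplot as plt

sizes = [128, 256, 512, 1024]
numpy_ms = [0.1, 0.8, 6.0, 48.0]  # placeholder timings, not measurements
gpu_ms = [0.05, 0.1, 0.4, 2.0]

plt.figure()
plt.plot(sizes, numpy_ms, marker="o", label="NumPy (CPU)")
plt.plot(sizes, gpu_ms, marker="s", label="CuPy FP32 (GPU)")
plt.xscale("log", base=2)  # sizes double each step, so log2 spacing is even
plt.yscale("log")
plt.xlabel("Matrix size n (n x n)")
plt.ylabel("Time (ms)")
plt.legend()
plt.savefig("benchmark.png", dpi=120)
```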
- Install dependencies:
Ensure you have the required libraries installed:

    pip install numpy numba cupy matplotlib

(For prebuilt CuPy wheels, install the package matching your CUDA version, e.g. cupy-cuda12x.)
- Modify matrix sizes and block dimensions in the scripts for different benchmarks.
- Tune the CUDA kernel parameters to optimize performance for specific hardware.
- Experiment with different matrix sizes to observe scaling behavior.
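As a starting point for kernel tuning, here is a hedged sketch of a shared-memory tiled matmul with the tile size exposed as a tunable parameter (the kernel name, tile size, and matrix size are illustrative; the GPU path is guarded so the source remains inspectable without CUDA):

```python
# Tiled CUDA matmul via cupy.RawKernel; TILE is the tunable parameter.
TILE = 16  # try 8 / 16 / 32 to match your hardware's occupancy

kernel_src = f"""
extern "C" __global__ void matmul_tiled(const float* A, const float* B,
                                        float* C, int n) {{
    __shared__ float As[{TILE}][{TILE}];
    __shared__ float Bs[{TILE}][{TILE}];
    int row = blockIdx.y * {TILE} + threadIdx.y;
    int col = blockIdx.x * {TILE} + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / {TILE}; ++t) {{
        // Stage one tile of A and B into shared memory, then accumulate.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * {TILE} + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * {TILE} + threadIdx.y) * n + col];
        __syncthreads();
        for (int k = 0; k < {TILE}; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }}
    C[row * n + col] = acc;
}}
"""

try:
    import cupy as cp
    kernel = cp.RawKernel(kernel_src, "matmul_tiled")
    n = 64  # this simplified sketch assumes n is a multiple of TILE
    a = cp.random.rand(n, n, dtype=cp.float32)
    b = cp.random.rand(n, n, dtype=cp.float32)
    c = cp.zeros((n, n), dtype=cp.float32)
    kernel((n // TILE, n // TILE), (TILE, TILE), (a, b, c, n))
    ok = bool(cp.allclose(c, a @ b, atol=1e-3))
except Exception:
    ok = None  # CuPy missing or no CUDA device
```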
Feel free to contribute by suggesting further optimizations, adding new algorithms, or improving the visualizations. 🚀