CUDA-Custom-Kernels

My CUDA playground to learn by writing some custom kernels, time them, and plot the results on RTX 3060.

Layout

src/vec_add.cu – 1D vector add (one thread per element)
src/matmul.cu – matrix multiply
- matmul_naive (global memory only)
- matmul_tiled (shared memory tiling)
src/main.cu – simple harness with CUDA events; dumps scripts/results.csv
scripts/plot_perf.py – reads CSV and saves scripts/perf.png

Build (nvcc)

Tested on RTX 3060, CUDA 12.x:

nvcc -O3 -arch=sm_86 -o bin/kernels   src/main.cu src/matmul.cu src/vec_add.cu

(Adjust -arch to your GPU if needed.)

Run

./bin/kernels

This warms up, times each kernel, prints the best time, and appends to scripts/results.csv.

Set matrix sizes in src/main.cu.

Plot

Install once:

pip install pandas matplotlib

Then:

python3 scripts/plot_perf.py

Output goes to scripts/perf.png.

Example results (from my 3060)

scripts/results.csv (best time over a few runs):

name,M,N,K,time,GFLOPS
Naive,1024,256,512,0.348192,770.941
Tiled,1024,256,512,0.270336,992.97
Naive,1024,1024,1024,2.72998,786.629
Tiled,1024,1024,1024,2.0695,1037.68
Naive,4096,1024,2048,20.2281,849.307
Tiled,4096,1024,2048,15.4309,1113.34
Naive,4096,4096,4096,163.842,838.851
Tiled,4096,4096,4096,123.207,1115.51

Tiled > Naive because of shared memory and coalesced loads.

Profiling

Run this:

sudo /usr/local/cuda/bin/ncu --set full --section LaunchStats --section Occupancy ./bin/kernels

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
bin		bin
scripts		scripts
src		src
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CUDA-Custom-Kernels

Layout

Build (nvcc)

Run

Plot

Example results (from my 3060)

Profiling

About

Uh oh!

Releases

Packages

Languages

Nagharjun17/CUDA-Custom-Kernels

Folders and files

Latest commit

History

Repository files navigation

CUDA-Custom-Kernels

Layout

Build (nvcc)

Run

Plot

Example results (from my 3060)

Profiling

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages