
📚 API References and Learning Resources

Official API Documentation

CUDA (NVIDIA)

Primary Documentation

Best Practices Guides

GPU Architecture

OpenCL (Khronos Group)

Primary Documentation

Tutorials and Guides

  • Hands On OpenCL

  • OpenCL Programming Guide (Book)

    • By Aaftab Munshi, Benedict Gaster, et al.
    • ISBN: 978-0321749642

DirectCompute / HLSL (Microsoft)

Primary Documentation

DirectX 11 Programming


Books (Highly Recommended)

GPU Programming

  1. "Programming Massively Parallel Processors"

    • Authors: David Kirk, Wen-mei Hwu
    • ISBN: 978-0124159921
    • Best for: Understanding GPU architecture fundamentals
    • Used in this project: Shared-memory tiling for matrix multiplication
  2. "CUDA by Example"

    • Authors: Jason Sanders, Edward Kandrot
    • ISBN: 978-0131387683
    • Best for: Learning CUDA from scratch
    • Used in this project: Vector addition patterns
  3. "Professional CUDA C Programming"

    • Authors: John Cheng, Max Grossman, Ty McKercher
    • ISBN: 978-1118739327
    • Best for: Advanced optimization techniques
    • Used in this project: Warp shuffle primitives, bank conflict avoidance
  4. "Heterogeneous Computing with OpenCL 2.0"

    • Authors: David Kaeli, Perhaad Mistry, et al.
    • ISBN: 978-0128014141
    • Best for: Cross-platform GPU programming
    • Used in this project: OpenCL backend design
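The bank-conflict avoidance credited to "Professional CUDA C Programming" comes down to simple arithmetic. The sketch below is a host-side model, not GPU code. It assumes 32 shared-memory banks, each one 32-bit word wide (the usual configuration on recent NVIDIA GPUs). `bankConflictDegree` is a hypothetical helper and is not part of this project.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>

// Assumed shared-memory model: 32 banks, each one 32-bit word wide.
constexpr int kNumBanks = 32;
constexpr int kWarpSize = 32;

// Worst-case number of distinct 32-bit words that the threads of one warp
// request from the same bank when thread t reads word t * strideInWords.
// 1 means conflict-free; N means the access is serialized N ways.
int bankConflictDegree(int strideInWords) {
    assert(strideInWords >= 1);
    std::array<std::set<int>, kNumBanks> wordsPerBank;
    for (int t = 0; t < kWarpSize; ++t) {
        int word = t * strideInWords;
        // A set is used because several threads reading the SAME word is a
        // broadcast, not a conflict; only distinct words in one bank serialize.
        wordsPerBank[word % kNumBanks].insert(word);
    }
    int worst = 0;
    for (const auto& words : wordsPerBank) {
        worst = std::max(worst, static_cast<int>(words.size()));
    }
    return worst;
}
```

Reading one column of `__shared__ float tile[32][32]` is the stride-32 case, and `bankConflictDegree(32)` returns 32. Padding each row to 33 floats turns it into stride 33, which is conflict-free.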

C++ and Software Engineering

  1. "Effective Modern C++"

    • Author: Scott Meyers
    • ISBN: 978-1491903995
    • Best for: Modern C++ patterns (C++11/14)
    • Used in this project: Smart pointers, move semantics, RAII
  2. "Design Patterns"

    • Authors: Gang of Four (Gamma, Helm, Johnson, Vlissides)
    • ISBN: 978-0201633610
    • Best for: Software architecture patterns
    • Used in this project: Strategy, Factory, Singleton patterns
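The following minimal sketch shows how the Strategy and Factory patterns combine with `std::unique_ptr` in a multi-API backend layer. All class and function names (`ComputeBackend`, `makeBackend`, `describe`, and so on) are hypothetical and do not come from this project's source. A real backend interface would also declare buffer and kernel-launch operations.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Strategy interface: every GPU API backend exposes the same operations.
// (Hypothetical names, for illustration only.)
class ComputeBackend {
public:
    virtual ~ComputeBackend() = default;
    virtual std::string name() const = 0;
};

class CudaBackend final : public ComputeBackend {
public:
    std::string name() const override { return "CUDA"; }
};

class OpenCLBackend final : public ComputeBackend {
public:
    std::string name() const override { return "OpenCL"; }
};

class DirectComputeBackend final : public ComputeBackend {
public:
    std::string name() const override { return "DirectCompute"; }
};

// Factory: callers request a backend by name and receive sole ownership via
// std::unique_ptr, so the backend is destroyed automatically (RAII).
std::unique_ptr<ComputeBackend> makeBackend(const std::string& api) {
    if (api == "cuda") return std::make_unique<CudaBackend>();
    if (api == "opencl") return std::make_unique<OpenCLBackend>();
    if (api == "directcompute") return std::make_unique<DirectComputeBackend>();
    throw std::invalid_argument("unknown backend: " + api);
}

// Client code depends only on the interface, never on a concrete backend.
std::string describe(const ComputeBackend& backend) {
    return "Running on " + backend.name();
}
```

Because client code such as `describe` sees only the interface, adding a backend only requires changes to the factory.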

Online Courses and Tutorials

CUDA

OpenCL

DirectCompute/DirectX


Academic Papers (Advanced)

Performance Optimization

  1. "Optimizing Parallel Reduction in CUDA" (Mark Harris, 2007)

  2. "Roofline: An Insightful Visual Performance Model for Multicore Architectures" (Williams et al., 2009)

  3. "Benchmarking GPUs to Tune Dense Linear Algebra" (Volkov & Demmel, 2008)
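The Roofline model reduces to one formula: attainable throughput is the smaller of peak compute and (memory bandwidth × arithmetic intensity). Below is a minimal C++ sketch. The function names are hypothetical, and units are GFLOP/s, GB/s, and FLOP/byte.

```cpp
#include <algorithm>
#include <cassert>

// Roofline model: a kernel is capped either by peak compute or by how fast
// memory can feed it (bandwidth x arithmetic intensity).
//   peakGflops       : peak arithmetic throughput, GFLOP/s
//   peakBandwidthGBs : peak DRAM bandwidth, GB/s
//   intensity        : FLOPs performed per byte moved to/from DRAM
double rooflineGflops(double peakGflops, double peakBandwidthGBs, double intensity) {
    assert(peakGflops > 0.0 && peakBandwidthGBs > 0.0 && intensity > 0.0);
    return std::min(peakGflops, peakBandwidthGBs * intensity);
}

// Ridge point: the intensity above which a kernel stops being memory-bound.
double ridgeIntensity(double peakGflops, double peakBandwidthGBs) {
    assert(peakBandwidthGBs > 0.0);
    return peakGflops / peakBandwidthGBs;
}
```

Consider a hypothetical GPU rated at 10,000 GFLOP/s and 500 GB/s. Its ridge point is 20 FLOP/byte. A single-precision vector add performs 1 FLOP per 12 bytes moved (two 4-byte loads, one 4-byte store), so `rooflineGflops(10000, 500, 1.0 / 12)` is about 41.7 GFLOP/s, which is firmly memory-bound.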

GPU Architecture

  1. "NVIDIA GPU Architecture Whitepapers"

Tools and Utilities

Profilers and Debuggers

NVIDIA Nsight (CUDA)

CodeXL (OpenCL/AMD)

PIX (DirectX)

Performance Analysis


Community Resources

Forums and Q&A

GitHub Repositories


Blogs and Articles

NVIDIA Blogs

Performance Optimization

Industry Blogs


Standards and Specifications

OpenCL

HLSL

C++ Standards


Video Resources

YouTube Channels

Recommended Talks

  1. "Intro to CUDA" - NVIDIA

    • Basic CUDA programming concepts
  2. "Optimizing Parallel Reduction in CUDA" - Mark Harris

    • Our reduction kernel is based on this
  3. "GPU Performance Analysis and Optimization" - NVIDIA GTC

    • Profiling and optimization techniques
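The "sequential addressing" variant from Harris's reduction talk can be emulated on the host to show why it works. This is a sketch, not the project's kernel. Each outer-loop pass stands for one step between `__syncthreads()` barriers, the inner loop stands for the active threads, and the block size is assumed to be a power of two. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of one thread block's shared-memory tree reduction
// using sequential addressing. sdata plays the role of the __shared__ array.
float blockReduceSequential(std::vector<float> sdata) {
    const std::size_t blockSize = sdata.size();
    assert(blockSize > 0 && (blockSize & (blockSize - 1)) == 0);  // power of two
    for (std::size_t stride = blockSize / 2; stride > 0; stride >>= 1) {
        // Active "threads" are the contiguous range 0..stride-1, so whole warps
        // go idle together, and consecutive tids touch consecutive words
        // (no shared-memory bank conflicts). On the GPU, __syncthreads() ends each pass.
        for (std::size_t tid = 0; tid < stride; ++tid) {
            sdata[tid] += sdata[tid + stride];
        }
    }
    return sdata[0];
}

// Grid-level view: each block reduces its own tile to one partial sum, and a
// second pass (another launch, or the host) combines the partial sums.
float reduceAll(const std::vector<float>& input, std::size_t blockSize) {
    float total = 0.0f;
    for (std::size_t start = 0; start < input.size(); start += blockSize) {
        std::vector<float> tile(blockSize, 0.0f);  // zero-pad the last tile (identity for +)
        for (std::size_t i = 0; i < blockSize && start + i < input.size(); ++i) {
            tile[i] = input[start + i];
        }
        total += blockReduceSequential(tile);
    }
    return total;
}
```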

How We Used These Resources in This Project

During Initial Development

  1. CUDA C++ Programming Guide → Core backend architecture
  2. "CUDA by Example" → Vector addition implementation
  3. "Programming Massively Parallel Processors" → Matrix multiplication tiling
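The tiling idea from "Programming Massively Parallel Processors" can be demonstrated without a GPU. This host-side sketch uses a hypothetical `matmulTiled` function, square row-major matrices, and an assumed tile width of 16. It follows the same loop structure a tiled kernel uses: load one tile of A and one tile of B into fast storage, accumulate, then move to the next pair of tiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t TILE = 16;  // assumed tile width

// Host-side sketch of shared-memory tiling for C = A * B (n x n, row-major).
// Each (by, bx) pair plays one thread block; tileA/tileB stand in for the
// __shared__ arrays, and acc for the per-thread accumulator registers.
std::vector<float> matmulTiled(const std::vector<float>& A,
                               const std::vector<float>& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<float> C(n * n, 0.0f);
    float tileA[TILE][TILE], tileB[TILE][TILE];
    for (std::size_t by = 0; by < n; by += TILE) {
        for (std::size_t bx = 0; bx < n; bx += TILE) {
            float acc[TILE][TILE] = {};
            for (std::size_t k0 = 0; k0 < n; k0 += TILE) {
                // Phase 1: cooperatively load one tile of A and one of B,
                // zero-filling past the matrix edge (then __syncthreads() on GPU).
                for (std::size_t ty = 0; ty < TILE; ++ty)
                    for (std::size_t tx = 0; tx < TILE; ++tx) {
                        std::size_t row = by + ty, col = bx + tx;
                        tileA[ty][tx] = (row < n && k0 + tx < n) ? A[row * n + k0 + tx] : 0.0f;
                        tileB[ty][tx] = (k0 + ty < n && col < n) ? B[(k0 + ty) * n + col] : 0.0f;
                    }
                // Phase 2: each "thread" (ty, tx) accumulates a partial dot
                // product using only the tiles (then __syncthreads() on GPU).
                for (std::size_t ty = 0; ty < TILE; ++ty)
                    for (std::size_t tx = 0; tx < TILE; ++tx)
                        for (std::size_t k = 0; k < TILE; ++k)
                            acc[ty][tx] += tileA[ty][k] * tileB[k][tx];
            }
            // Write back only the elements that lie inside the matrix.
            for (std::size_t ty = 0; ty < TILE; ++ty)
                for (std::size_t tx = 0; tx < TILE; ++tx)
                    if (by + ty < n && bx + tx < n) C[(by + ty) * n + bx + tx] = acc[ty][tx];
        }
    }
    return C;
}
```

Each value fetched from global memory is reused `TILE` times from the tile, which is the main source of the speedup over a naive kernel. The zero-fill lets `n` be any size, not just a multiple of the tile width.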

For Optimization

  1. Mark Harris's Reduction Paper → Reduction kernel optimization
  2. NVIDIA Best Practices Guide → Memory coalescing patterns
  3. Roofline Model Paper → Performance analysis framework
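One way to reason about memory coalescing is to count how many 32-byte sectors a warp's 32 loads touch. The model below is simplified: it ignores caching and assumes naturally aligned 4- or 8-byte elements. The helper names are hypothetical. The goal is to make the Best Practices Guide's access-pattern examples concrete, not to predict exact hardware behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified coalescing model: a warp's load is served in 32-byte sectors,
// so its cost is roughly the number of distinct sectors its addresses touch.
constexpr std::size_t kWarp = 32;
constexpr std::size_t kSectorBytes = 32;

// Thread t reads the element at index offset + t * stride. Elements are
// assumed naturally aligned and no larger than a sector, so each element
// lies in exactly one sector.
std::size_t sectorsTouched(std::size_t offset, std::size_t stride, std::size_t elemBytes) {
    std::set<std::size_t> sectors;
    for (std::size_t t = 0; t < kWarp; ++t) {
        std::size_t byteAddr = (offset + t * stride) * elemBytes;
        sectors.insert(byteAddr / kSectorBytes);
    }
    return sectors.size();
}

// Fraction of fetched bytes the warp actually uses (1.0 = perfectly coalesced).
double loadEfficiency(std::size_t offset, std::size_t stride, std::size_t elemBytes) {
    double useful = static_cast<double>(kWarp * elemBytes);
    return useful / static_cast<double>(sectorsTouched(offset, stride, elemBytes) * kSectorBytes);
}
```

With 4-byte floats, unit stride touches 4 sectors (efficiency 1.0), and shifting the start by one element touches 5. A stride of 32 elements touches 32 sectors, so only 12.5% of the fetched bytes are used.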

For Architecture Design

  1. "Design Patterns" (GoF) → Strategy and Factory patterns
  2. "Effective Modern C++" → RAII and smart pointers
  3. OpenCL Spec → Cross-platform API design
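The RAII-plus-smart-pointer idiom from "Effective Modern C++" applies directly to GPU resources, which come as C-style acquire/release pairs. To keep the sketch runnable on any machine, `std::malloc`/`std::free` stand in for a device API pair such as `cudaMalloc`/`cudaFree`. `Buffer`, `FreeDeleter`, and `makeBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Custom deleter: the release half of the acquire/release pair.
// std::free stands in for a device call such as cudaFree (assumption).
struct FreeDeleter {
    void operator()(float* p) const noexcept { std::free(p); }
};

// Move-only owning handle: the release function runs exactly once.
using Buffer = std::unique_ptr<float[], FreeDeleter>;

// Acquire: std::malloc stands in for a device call such as cudaMalloc.
Buffer makeBuffer(std::size_t count) {
    assert(count > 0);
    void* raw = std::malloc(count * sizeof(float));
    if (raw == nullptr) throw std::bad_alloc();
    return Buffer(static_cast<float*>(raw));
}
```

Because `Buffer` is move-only, ownership can be transferred but never duplicated. The release call therefore runs exactly once, even if an exception unwinds the stack.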

For Documentation

  1. "Professional CUDA C Programming" → Technical explanations
  2. NVIDIA Documentation Style → Code comments format
  3. GitHub Best Practices → README structure

Recommended Learning Path

Beginner (0-3 months)

  1. Read "CUDA by Example"
  2. Complete Udacity CS344 ("Intro to Parallel Programming")
  3. Study our vector_add.cu kernel
  4. Modify and experiment
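Before opening a kernel like vector_add.cu, it helps to see the index arithmetic that one-thread-per-element kernels typically use. This host-side emulation is a sketch, not the project's kernel, and the function name is hypothetical. The two loops play the roles of `blockIdx.x` and `threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a one-thread-per-element CUDA vector-add launch.
// The outer loop stands for the grid (blockIdx.x), the inner for a block (threadIdx.x).
std::vector<float> vectorAddEmulated(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t threadsPerBlock) {
    assert(a.size() == b.size() && threadsPerBlock > 0);
    const std::size_t n = a.size();
    std::vector<float> c(n);
    // Round up so the last, partially filled block still covers the tail.
    const std::size_t blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    for (std::size_t blockIdx = 0; blockIdx < blocks; ++blockIdx) {
        for (std::size_t threadIdx = 0; threadIdx < threadsPerBlock; ++threadIdx) {
            std::size_t i = blockIdx * threadsPerBlock + threadIdx;  // global thread index
            if (i < n) {  // bounds check: surplus threads in the last block do nothing
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

With n = 1000 and 256 threads per block, the rounded-up grid has 4 blocks (1024 threads). The last 24 threads fail the bounds check and do nothing.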

Intermediate (3-6 months)

  1. Read "Programming Massively Parallel Processors"
  2. Study our matrix_mul.cu optimizations
  3. Profile with Nsight Compute
  4. Implement your own benchmark
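A benchmark depends above all on a repeatable timing method. The sketch below uses hypothetical names and runs on the host only. It does warm-up runs, times each run with `std::chrono::steady_clock`, and reports the median. It also converts bytes moved into effective bandwidth, the (bytes read + bytes written) / time metric described in NVIDIA's Best Practices Guide.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Warm up, time several runs with a monotonic clock, and return the median
// in seconds (robust to outliers). For GPU work, `work` must wait for the
// device to finish before returning, or only launch overhead is measured.
double medianSeconds(const std::function<void()>& work, int warmups, int runs) {
    assert(warmups >= 0 && runs > 0);
    for (int i = 0; i < warmups; ++i) work();
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper median when runs is even
}

// Effective bandwidth in GB/s: (bytes read + bytes written) / 1e9 / seconds.
double effectiveBandwidthGBs(std::size_t bytesRead, std::size_t bytesWritten, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytesRead + bytesWritten) / seconds / 1e9;
}
```

The median is used instead of the mean so that a single slow run (a context switch, a clock ramp-up) does not skew the reported result.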

Advanced (6-12 months)

  1. Read "Professional CUDA C Programming"
  2. Study advanced papers (Reduction, Roofline)
  3. Optimize for specific GPU architectures
  4. Contribute to this project!

Contributing Your Knowledge

Found a great resource? Add it here!

  1. Fork the repository
  2. Edit this file
  3. Submit a pull request
  4. Help others learn!

This is your roadmap to GPU programming mastery! 🎓🚀

Next: Apply this knowledge by reading our source code and documentation!


Curated by: Soham Dave
Date: January 2026
For: GPU Benchmark Suite v1.0
Purpose: Comprehensive learning resource collection