A comprehensive guide to using CUDA Graphs effectively with PyTorch, covering CUDA Graph fundamentals, PyTorch integration, Megatron-LM implementations, and practical troubleshooting.
View the documentation at https://docs.nvidia.com/dl-cuda-graph/
- CUDA Graph Basics: Fundamentals, constraints, and quantitative benefits
- PyTorch CUDA Graphs: Integration, Transformer Engine & Megatron-LM, best practices, and handling dynamic patterns
- Examples: Real-world implementations from MLPerf Training (Llama 3.1 405B, GPT-3 175B, Stable Diffusion v2, RNN-T)
- Troubleshooting: Capture failures, numerical errors, memory issues, and performance debugging
# Clone and serve locally (auto-installs dependencies)
git clone https://github.com/NVIDIA/dl-cuda-graph-doc.git
cd dl-cuda-graph-doc
./scripts/sphinx-serve.sh

Visit http://127.0.0.1:8000 to view the documentation.
Manual build:
uv sync --group docs
uv run --group docs sphinx-build docs docs/_build/html

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project uses dual licensing:
- Documentation (Markdown files): CC-BY-4.0 - Creative Commons Attribution 4.0 International
- Code (Python, shell scripts, configuration): MIT
See THIRD-PARTY-NOTICES.md for third-party dependencies and their licenses.