
Add challenge 86: Paged KV-Cache Attention (Medium)#225

Open
claude[bot] wants to merge 2 commits into main from add-challenge-86-paged-attention

Conversation

Contributor

claude[bot] commented Mar 24, 2026

Summary

  • Adds challenge 86: Paged KV-Cache Attention (Medium difficulty)
  • Models the decode-phase attention kernel used in vLLM and other LLM serving systems, where KV cache is stored in non-contiguous memory pages
  • Solvers must implement block-table indirection to gather K/V tokens from scattered physical blocks, then compute scaled dot-product attention with online softmax
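To make the block-table indirection concrete, here is a minimal pure-Python sketch. The names (`K_cache`, `block_table`, `block_size`, `head_dim`) follow the `solve` signature described in this PR, but the flat row-major layout is an assumption for illustration, not necessarily the challenge's actual cache layout:

```python
# Hypothetical sketch: gather the key vector for logical token `t` of one
# sequence from a paged KV cache stored as a flat array of physical blocks.
def gather_key(K_cache, block_table, t, block_size, head_dim):
    logical_block = t // block_size              # which logical page holds token t
    offset = t % block_size                      # position of token t inside that page
    physical_block = block_table[logical_block]  # indirection: logical -> physical
    base = (physical_block * block_size + offset) * head_dim
    return K_cache[base : base + head_dim]       # one head_dim-sized key vector

# Toy layout: 2 physical blocks, block_size=2, head_dim=1.
K_cache = [10.0, 11.0, 20.0, 21.0]   # physical block 0 holds keys 10, 11
block_table = [1, 0]                 # logical block 0 lives in physical block 1
print(gather_key(K_cache, block_table, 0, 2, 1))   # -> [20.0]
```

In a CUDA/Triton kernel the same arithmetic becomes pointer offsets, and keeping `offset` contiguous across threads is what yields the coalesced access the performance test rewards.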

What makes this interesting for GPU programmers

  • Non-contiguous memory access: tokens are fetched via a block_table that maps logical block indices to physical block IDs in a shared pool — requires careful pointer arithmetic and strided access patterns
  • Online softmax: to avoid materializing all scores, the numerically stable running-max trick must be applied as blocks are processed one at a time
  • Memory bandwidth bound: decode-phase attention is memory-bandwidth limited, rewarding coalesced access and shared memory reuse
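The online-softmax bullet above can be sketched in a few lines. This is a generic illustration of the running-max trick over scalar scores/values, not the challenge's reference implementation; in the real kernel `acc` is a head_dim-sized vector accumulated per query head:

```python
import math

# Streaming softmax: process scores one at a time, keeping a running max `m`,
# running denominator `l`, and running weighted sum `acc`, rescaling the
# accumulators whenever the max grows. math.exp(-inf) == 0.0 handles the
# first iteration cleanly.
def online_softmax_attn(scores, values):
    m = -math.inf     # running maximum of scores seen so far
    l = 0.0           # running softmax denominator
    acc = 0.0         # running numerator: sum of exp(score - m) * value
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)          # rescale old state to the new max
        l = l * scale + math.exp(s - m_new)
        acc = acc * scale + math.exp(s - m_new) * v
        m = m_new
    return acc / l

# Agrees with the ordinary two-pass softmax computed over all scores at once:
scores = [0.5, 2.0, -1.0]
values = [1.0, 2.0, 3.0]
denom = sum(math.exp(s - max(scores)) for s in scores)
ref = sum(math.exp(s - max(scores)) * v for s, v in zip(scores, values)) / denom
print(abs(online_softmax_attn(scores, values) - ref) < 1e-12)   # -> True
```

Because each KV block is visited exactly once, this is what lets the kernel stay memory-bandwidth bound instead of materializing a full score row per query.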

Files

  • challenge.py: reference implementation, 10 functional test cases (edge cases, power-of-2, non-power-of-2, variable-length batch, realistic sizes), performance test at LLaMA-3 scale (batch=8, heads=32, head_dim=128, block_size=16, ctx_len=2,048)
  • challenge.html: full problem description with SVG block-table visualization, worked example, and constraints
  • 6 starter files: CUDA, PyTorch, Triton, JAX, CuTe, Mojo

Test plan

  • Reference implementation verified against manual calculation for example test
  • Validation run (--action run) passed on NVIDIA TESLA T4
  • pre-commit run --all-files passes (black, isort, flake8, clang-format)
  • Challenge number 86 does not conflict with any merged challenge or open PR
  • All checklist items in CLAUDE.md verified

🤖 Generated with Claude Code

Implements decode-phase attention over a non-contiguous paged KV cache,
modeled on the vLLM paged attention architecture. Teaches block-table
indirection, online softmax across scattered memory pages, and the
memory access patterns central to LLM serving workloads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Redesign SVG: block_table as a proper table with column headers,
cache pool as horizontal memory strip with color-coded blocks and
sequence labels. Convert example and computation steps from HTML
entities to LaTeX math notation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shxjames
Contributor

[Three screenshots of the rendered challenge page, taken 2026-03-26]


<h2>Implementation Requirements</h2>
<p>
Implement the function <code>solve(Q, K_cache, V_cache, block_table, context_lens, output, batch_size, num_heads, head_dim, block_size, max_blocks_per_seq)</code>

We don't really have to say this in the implementation requirements. @claude change this to match the format of other challenges' implementation requirements.

