Add challenge 86: Paged KV-Cache Attention (Medium) #225
Open
claude[bot] wants to merge 2 commits into main
Implements decode-phase attention over a non-contiguous paged KV cache, modeled on the vLLM paged attention architecture. Teaches block-table indirection, online softmax across scattered memory pages, and the memory access patterns central to LLM serving workloads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
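As a rough illustration of the online softmax across scattered pages described in this commit, here is a minimal pure-Python sketch for a single sequence and a single head. The function name `paged_decode_attention`, the nested-list cache layout (`cache[physical_block][slot]` holding one vector), and the requirement that `context_len >= 1` are assumptions made for the example; they are not taken from `challenge.py`.

```python
import math


def paged_decode_attention(q, k_cache, v_cache, block_row, context_len, block_size):
    """Attention for one query vector over a paged KV cache (hypothetical helper).

    q          : list of head_dim floats
    k_cache    : k_cache[physical_block][slot] -> list of head_dim floats (assumed layout)
    v_cache    : same layout as k_cache
    block_row  : block_row[logical_block] -> physical block ID for this sequence
    context_len: number of cached tokens, assumed to be >= 1
    """
    head_dim = len(q)
    scale = 1.0 / math.sqrt(head_dim)

    running_max = -math.inf      # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = [0.0] * head_dim       # unnormalized weighted sum of values

    num_blocks = (context_len + block_size - 1) // block_size
    for logical in range(num_blocks):
        physical = block_row[logical]  # block-table indirection
        # The last block may be only partially filled.
        valid = min(block_size, context_len - logical * block_size)
        for slot in range(valid):
            k = k_cache[physical][slot]
            v = v_cache[physical][slot]
            score = scale * sum(qi * ki for qi, ki in zip(q, k))

            # Online softmax: rescale earlier partial results to the new maximum.
            new_max = max(running_max, score)
            correction = math.exp(running_max - new_max)  # 0.0 on the first token
            weight = math.exp(score - new_max)
            running_sum = running_sum * correction + weight
            acc = [a * correction + weight * vi for a, vi in zip(acc, v)]
            running_max = new_max

    return [a / running_sum for a in acc]
```

A GPU kernel would typically process a whole block of scores per step rather than one token at a time, but the rescaling by `correction` is the same idea.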
Redesign SVG: block_table as a proper table with column headers, cache pool as horizontal memory strip with color-coded blocks and sequence labels. Convert example and computation steps from HTML entities to LaTeX math notation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
<h2>Implementation Requirements</h2>
<p>
Implement the function <code>solve(Q, K_cache, V_cache, block_table, context_lens, output, batch_size, num_heads, head_dim, block_size, max_blocks_per_seq)</code>
Contributor
Don't really have to say this in the implementation requirements. @claude change this to match the format of the other challenges' implementation requirements.
Summary
What makes this interesting for GPU programmers
block_table that maps logical block indices to physical block IDs in a shared pool; requires careful pointer arithmetic and strided access patterns

Files
challenge.py: reference implementation, 10 functional test cases (edge cases, power-of-2, non-power-of-2, variable-length batch, realistic sizes), performance test at LLaMA-3 scale (batch=8, heads=32, head_dim=128, block_size=16, ctx_len=2048)
challenge.html: full problem description with SVG block-table visualization, worked example, and constraints

Test plan
--action run) passed on NVIDIA Tesla T4
pre-commit run --all-files passes (black, isort, flake8, clang-format)

🤖 Generated with Claude Code
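
The block_table indirection and pointer arithmetic mentioned in the summary can be sketched roughly as follows. This assumes a cache stored as a flat array with layout `[num_blocks, block_size, num_heads, head_dim]`; that layout and the helper names `blocks_needed` and `physical_offset` are hypothetical, chosen for illustration, and may differ from what `challenge.py` actually uses.

```python
def blocks_needed(context_len, block_size):
    """Number of cache blocks a sequence of context_len tokens occupies (ceiling division)."""
    return (context_len + block_size - 1) // block_size


def physical_offset(token_pos, head, block_row, block_size, num_heads, head_dim):
    """Flat element offset of the vector for (token_pos, head).

    Assumed layout: [num_blocks, block_size, num_heads, head_dim], row-major.
    block_row is one sequence's row of the block table:
    block_row[logical_block] -> physical block ID in the shared pool.
    """
    # Split the token position into a logical block index and a slot within that block.
    logical_block, slot = divmod(token_pos, block_size)
    # The block table redirects the logical block to wherever it lives in the pool.
    physical_block = block_row[logical_block]
    return ((physical_block * block_size + slot) * num_heads + head) * head_dim


# At the performance-test sizes listed above, each sequence spans 2048 / 16 blocks.
print(blocks_needed(2048, 16))  # 128
```

Because consecutive logical blocks can map to arbitrary physical blocks, the offsets for neighboring tokens are only contiguous within a block, which is where the strided access pattern comes from.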