feat: wire GPU LZ77 match-finding into streaming compressor#121
Merged
ChrisLundquist merged 1 commit intomasterfrom Mar 11, 2026
Merged
feat: wire GPU LZ77 match-finding into streaming compressor#121ChrisLundquist merged 1 commit intomasterfrom
ChrisLundquist merged 1 commit intomasterfrom
Conversation
Merge the GPU coordinator into compress_stream_parallel using try_send + CPU fallback with adaptive backpressure, matching the in-memory scheduler's pattern from parallel.rs. Key changes: - GPU coordinator thread batches blocks for find_matches_batched, demuxes matches, and entropy-encodes via compress_block_from_demux - Workers use CPU-only options (backend: Cpu, webgpu_engine: None) to prevent accidental GPU routing through compress_and_demux - Adaptive backpressure (AtomicUsize: +2 on Full, -1 on Ok) limits GPU blocks to an initial burst, then routes everything to CPU - GPU coordinator only spawns for LZ-demux pipelines (Pipeline::uses_lz_demux); BWT/SortLz pass through their own GPU paths in compress_block - Mark two slow optimal-parse tests as #[ignore] (>60s in debug) Before: pz -c -p lzseqr -g -t4 mozilla took 25.9s (workers accidentally routed all blocks through GPU via compress_and_demux). After: 0.94s — on par with CPU-only and in-memory GPU paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
compress_stream_parallel), matching the architecture fromparallel.rscompress_and_demux→lzseq_encode_gpu(), causing all 196 blocks to contend for GPU (25s → 0.9s)Pipeline::uses_lz_demux()) — BWT/SortLz handle their own GPU pathscompress_block_from_demuxfor entropy-only encoding of pre-computed GPU match resultsKey results (LzSeqR, silesia/mozilla, 4 threads)
Test plan
cargo clippy --all-targets -- -D warnings— cleancargo test— all tests passpz -c -p lzseqr -g -t 4 samples/silesia/mozilla > /dev/null— should be ~1spz -c -p lzseqr -g -t 4 mozilla | pz -d > /tmp/rt && diff mozilla /tmp/rt🤖 Generated with Claude Code