feat: wire GPU LZ77 match-finding into streaming compressor by ChrisLundquist · Pull Request #121 · ChrisLundquist/libpz

ChrisLundquist · 2026-03-11T07:37:31Z

Summary

Wires the GPU LZ77 match-finding coordinator into the streaming compressor (compress_stream_parallel), matching the architecture from parallel.rs
Adds adaptive backpressure (shared AtomicUsize: +2 on channel Full, -1 on Ok) so workers stop trying GPU after a few Full signals — prevents slow GPU hardware from becoming a bottleneck
Fixes critical bug where "CPU fallback" workers still routed through GPU via compress_and_demux → lzseq_encode_gpu(), causing all 196 blocks to contend for GPU (25s → 0.9s)
Gates GPU coordinator to LZ-demux pipelines only (Pipeline::uses_lz_demux()) — BWT/SortLz handle their own GPU paths
Adds compress_block_from_demux for entropy-only encoding of pre-computed GPU match results
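
The adaptive backpressure rule from the summary (+2 on channel Full, -1 on Ok) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `offer_to_gpu`, `PRESSURE_LIMIT`, and the use of `std::sync::mpsc` are assumptions made for the example, and the real threshold and channel type in libpz may differ.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical threshold: once the shared counter reaches this value,
/// workers stop offering blocks to the GPU queue.
pub const PRESSURE_LIMIT: usize = 4;

/// Offer a block to the GPU coordinator without blocking.
///
/// Returns `Ok(())` if the GPU queue accepted the block, or `Err(block)`
/// to hand the block back so the caller can compress it on the CPU.
pub fn offer_to_gpu(
    gpu_tx: &SyncSender<Vec<u8>>,
    pressure: &AtomicUsize,
    block: Vec<u8>,
) -> Result<(), Vec<u8>> {
    // After enough Full signals, skip the GPU entirely.
    if pressure.load(Ordering::Relaxed) >= PRESSURE_LIMIT {
        return Err(block);
    }
    match gpu_tx.try_send(block) {
        Ok(()) => {
            // -1 on Ok, saturating at zero so the counter never wraps.
            let _ = pressure.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |p| {
                Some(p.saturating_sub(1))
            });
            Ok(())
        }
        Err(TrySendError::Full(block)) => {
            // +2 on Full: the queue is backed up, so penalise the GPU path.
            pressure.fetch_add(2, Ordering::Relaxed);
            Err(block)
        }
        // Coordinator is gone: fall back to CPU without touching the counter.
        Err(TrySendError::Disconnected(block)) => Err(block),
    }
}
```

Because Full adds more than Ok subtracts, a GPU that cannot keep up pushes the counter over the limit after a few rejected sends. In this sketch the counter then stays there, since workers no longer attempt a send that could lower it, which matches the "initial burst, then CPU" behaviour described in the commit message.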

Key results (LzSeqR, silesia/mozilla, 4 threads)

Path	Wall time
Streaming GPU (before fix)	25.4s
Streaming GPU (after fix)	0.94s
Streaming CPU	0.98s

Test plan

cargo clippy --all-targets -- -D warnings — clean
cargo test — all tests pass
pz -c -p lzseqr -g -t 4 samples/silesia/mozilla > /dev/null — should be ~1s
Round-trip: pz -c -p lzseqr -g -t 4 mozilla | pz -d > /tmp/rt && diff mozilla /tmp/rt

🤖 Generated with Claude Code

Merge the GPU coordinator into compress_stream_parallel using try_send + CPU fallback with adaptive backpressure, matching the in-memory scheduler's pattern from parallel.rs.

Key changes:
- GPU coordinator thread batches blocks for find_matches_batched, demuxes matches, and entropy-encodes via compress_block_from_demux
- Workers use CPU-only options (backend: Cpu, webgpu_engine: None) to prevent accidental GPU routing through compress_and_demux
- Adaptive backpressure (AtomicUsize: +2 on Full, -1 on Ok) limits GPU blocks to an initial burst, then routes everything to CPU
- GPU coordinator only spawns for LZ-demux pipelines (Pipeline::uses_lz_demux); BWT/SortLz pass through their own GPU paths in compress_block
- Mark two slow optimal-parse tests as #[ignore] (>60s in debug)

Before: pz -c -p lzseqr -g -t4 mozilla took 25.9s (workers accidentally routed all blocks through GPU via compress_and_demux). After: 0.94s, on par with CPU-only and in-memory GPU paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
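
As an illustration of the gating and the CPU-only worker options described in the commit message, here is a minimal sketch. Only `Pipeline::uses_lz_demux`, `backend: Cpu`, and `webgpu_engine: None` come from the text above; the type definitions, the enum variants, and the helpers `should_spawn_gpu_coordinator` and `worker_options` are hypothetical stand-ins, not the actual libpz API.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a GPU engine handle.
#[derive(Debug)]
pub struct WebGpuEngine;

/// Hypothetical backend selector.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    WebGpu,
}

/// Hypothetical subset of pipelines, named after those mentioned in the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Pipeline {
    LzSeqR,
    Bwt,
    SortLz,
}

impl Pipeline {
    /// Assumption: in this sketch only LzSeqR counts as an LZ-demux pipeline.
    pub fn uses_lz_demux(self) -> bool {
        matches!(self, Pipeline::LzSeqR)
    }
}

/// Hypothetical options struct with the two fields named in the commit message.
#[derive(Clone, Debug)]
pub struct CompressOptions {
    pub backend: Backend,
    pub webgpu_engine: Option<Arc<WebGpuEngine>>,
}

/// Spawn the GPU coordinator only when a GPU engine is present and the
/// pipeline uses LZ demux.
pub fn should_spawn_gpu_coordinator(pipeline: Pipeline, opts: &CompressOptions) -> bool {
    pipeline.uses_lz_demux() && opts.webgpu_engine.is_some()
}

/// Options handed to worker threads. When the coordinator owns the GPU,
/// workers get CPU-only options so a "CPU fallback" cannot reach the GPU.
/// Other pipelines keep their options and use their own GPU paths.
pub fn worker_options(pipeline: Pipeline, opts: &CompressOptions) -> CompressOptions {
    if should_spawn_gpu_coordinator(pipeline, opts) {
        CompressOptions {
            backend: Backend::Cpu,
            webgpu_engine: None,
        }
    } else {
        opts.clone()
    }
}
```

Stripping the engine handle from the worker options, rather than relying on a flag checked deeper in the call chain, is one way to make the accidental routing described above impossible: a worker with no engine has nothing to dispatch to.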

ChrisLundquist merged commit d33a953 into master Mar 11, 2026
4 checks passed

ChrisLundquist merged 1 commit into master from claude/determined-mcnulty
