fix: resolve LzSeqR parallel encode routing bug #120

Merged
ChrisLundquist merged 1 commit into master from claude/investigate-todos on Mar 11, 2026

@ChrisLundquist
Owner

Summary

  • Fix LzSeqR encode routing bug: The parallel scheduler routed LzSeqR entropy (stage 1) to stage_rans_encode_webgpu, which uses a chunked-payload wire format that is incompatible with the standard rANS decoder. The single-block and single-thread paths correctly used stage_rans_encode_with_options (CPU rANS). The mismatch caused InvalidInput errors when compressing with LzSeqR + WebGpu backend + threads > 1.
  • Remove dead stage_rans_encode_webgpu: No callers remain after the fix. GPU rANS entropy is known to be slower than CPU (0.54-0.77x), so re-enabling it is low priority.
  • Add 6-stream regression test: test_gpu_rans_interleaved_decode_lzseqr_6stream verifies that LzSeqR round-trips correctly with the GPU backend and multi-threading.
  • Update TODO docs: Mark the GPU rANS bug as resolved and record the root cause analysis. Add Criterion benchmark data to the Lzfi vs LzssR comparison (Lzfi dominates: 543 vs 333 MB/s compress).
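
To illustrate the routing change described in the summary, here is a minimal sketch. Every name in it (Backend, EntropyEncoder, route_before_fix, route_after_fix, standard_decoder_accepts) is hypothetical and is not taken from the crate, and treating the "parallel" condition as threads > 1 and blocks > 1 is an assumption based on the wording above.

```rust
/// Hypothetical stand-ins for illustration; these are not the crate's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    WebGpu,
}

/// Which entropy encoder a path selects, named by the wire format it emits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EntropyEncoder {
    /// Standard rANS wire format, readable by the standard decoder.
    CpuRans,
    /// Chunked-payload wire format, not readable by the standard decoder.
    GpuRansChunked,
}

/// Assumed shape of the buggy routing: only the parallel path with the
/// WebGpu backend picked the chunked GPU encoder.
pub fn route_before_fix(backend: Backend, threads: usize, blocks: usize) -> EntropyEncoder {
    let parallel = threads > 1 && blocks > 1;
    if parallel && backend == Backend::WebGpu {
        EntropyEncoder::GpuRansChunked
    } else {
        EntropyEncoder::CpuRans
    }
}

/// Routing after the fix as described: LzSeqR entropy always uses CPU rANS,
/// so every path emits the same wire format.
pub fn route_after_fix(_backend: Backend, _threads: usize, _blocks: usize) -> EntropyEncoder {
    EntropyEncoder::CpuRans
}

/// The standard decoder only understands the standard wire format.
pub fn standard_decoder_accepts(encoder: EntropyEncoder) -> bool {
    encoder == EntropyEncoder::CpuRans
}
```

In this sketch, the fix makes the encoder choice independent of thread count and backend, so the decoder never needs to know which scheduling path produced a block.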

Test plan

  • 696 tests pass, clippy clean, zero warnings
  • New test test_gpu_rans_interleaved_decode_lzseqr_6stream passes
  • Existing test_gpu_rans_interleaved_decode_round_trip (LzssR) still passes
  • Criterion benchmarks show no regressions
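
For readers unfamiliar with this class of bug, the toy below shows why a round-trip test catches it. It is not the crate's rANS code: the two wire formats, the tag bytes, and the function names are invented for illustration, and std::io::ErrorKind::InvalidInput is used only as a stand-in for the InvalidInput error mentioned in the summary.

```rust
use std::io;

/// Toy "standard" format: [0x01 tag][u32 LE length][payload bytes].
pub fn encode_standard(data: &[u8]) -> Vec<u8> {
    let mut out = vec![0x01];
    out.extend_from_slice(&(data.len() as u32).to_le_bytes());
    out.extend_from_slice(data);
    out
}

/// Toy "chunked" format: [0x02 tag][u32 LE chunk count], then for each
/// chunk [u32 LE length][chunk bytes].
pub fn encode_chunked(data: &[u8], chunk_size: usize) -> Vec<u8> {
    // Guard against a zero chunk size, which would make `chunks` panic.
    let chunks: Vec<&[u8]> = data.chunks(chunk_size.max(1)).collect();
    let mut out = vec![0x02];
    out.extend_from_slice(&(chunks.len() as u32).to_le_bytes());
    for chunk in chunks {
        out.extend_from_slice(&(chunk.len() as u32).to_le_bytes());
        out.extend_from_slice(chunk);
    }
    out
}

/// Decoder that only understands the toy standard format and rejects
/// anything else with InvalidInput.
pub fn decode_standard(wire: &[u8]) -> io::Result<Vec<u8>> {
    let invalid = |msg: &'static str| io::Error::new(io::ErrorKind::InvalidInput, msg);
    if wire.len() < 5 || wire[0] != 0x01 {
        return Err(invalid("not a standard payload"));
    }
    let len = u32::from_le_bytes([wire[1], wire[2], wire[3], wire[4]]) as usize;
    let body = &wire[5..];
    if body.len() != len {
        return Err(invalid("length mismatch"));
    }
    Ok(body.to_vec())
}
```

Feeding encode_chunked output to decode_standard fails, while encode_standard output round-trips. A regression test that compresses and decompresses through each backend and thread configuration exercises exactly this pairing.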

🤖 Generated with Claude Code

The parallel scheduler routed LzSeqR stage 1 (entropy) to
stage_rans_encode_webgpu, which uses a chunked-payload wire format
incompatible with the standard rANS decoder. Single-block and
single-thread paths used stage_rans_encode_with_options (CPU rANS)
correctly. Fixed by removing the GPU routing for LzSeqR entropy,
so the parallel path now uses the same CPU path as the others.
Removed the now-dead stage_rans_encode_webgpu function.

Added test_gpu_rans_interleaved_decode_lzseqr_6stream to catch
regressions. Updated TODO docs with investigation findings and
Criterion benchmark data for Lzfi vs LzssR comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ChrisLundquist merged commit 7593373 into master on Mar 11, 2026
4 checks passed
ChrisLundquist added a commit that referenced this pull request Mar 12, 2026
Add architecture section documenting the unified token pipeline (PR #118),
active/removed pipelines table, and Silesia corpus benchmark data. Update
project layout to reflect lz_token.rs and removed modules. Update dead ends
with streaming path bottleneck finding and LzSeqR routing bug (PR #120).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ChrisLundquist added a commit that referenced this pull request Mar 12, 2026
* bench: enable parallel, large, and webgpu benchmarks for Lzfi

Lzfi was only benchmarked on the small Canterbury corpus with no
parallel, large-file, or WebGPU variants. Enable all modes to match
the LzSeqR and Lzf benchmark coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update CLAUDE.md with architecture overview and Silesia benchmarks

Add architecture section documenting the unified token pipeline (PR #118),
active/removed pipelines table, and Silesia corpus benchmark data. Update
project layout to reflect lz_token.rs and removed modules. Update dead ends
with streaming path bottleneck finding and LzSeqR routing bug (PR #120).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>