refactor: post-wire-encoder cleanup and documentation #119

Merged
ChrisLundquist merged 4 commits into master from claude/post-refactor-cleanup
Mar 10, 2026
Conversation

@ChrisLundquist (Owner)

Summary

Follow-up to the pluggable wire encoder refactor (#118). Eliminates dead code paths, unifies LzSeq compression strategies, and adds comprehensive documentation.

  • Eliminate double-conversion in SortLZ paths: Replace the matches_to_lz77_greedy/lazy → matches_to_tokens intermediate conversion with a direct parse_matches → demux_tokens path. Delete ~165 lines of dead conversion code (matches_to_lz77_greedy, matches_to_lz77_lazy, encode_match_sequence)
  • Unify LzSeq Optimal+SortLz through tokenize(): Route the Optimal and SortLz strategies through the shared tokenize() → encode_from_tokens() entry point. Keep the tuned encode_with_config() for the default lazy/greedy CPU path
  • Update ARCHITECTURE.md: Rewrite multi-stream section for TokenEncoder architecture, add "Active architecture" section documenting GPU vs CPU design decisions
  • Add wire-formats.md: Comprehensive wire format reference (container V2, per-block framing, entropy coders, pipeline ID table)
  • Add investigation TODOs: GPU rANS 6-stream bug, Lzfi vs LzssR consolidation candidate
  • Archive 4 stale exec plans: Move closed/parked plans to completed/
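To make the first bullet concrete, here is a minimal sketch of the direct token path. The `LzToken` type and the `parse_matches`/`demux_tokens` signatures below are hypothetical stand-ins for illustration only; the real definitions live in this repo and differ.

```rust
// Hypothetical sketch of the direct token path: raw match candidates are
// parsed straight into tokens, then demuxed into per-stream vectors,
// skipping any intermediate Vec<lz77::Match> representation.

#[derive(Debug, Clone, Copy, PartialEq)]
enum LzToken {
    Literal(u8),
    Match { len: u32, dist: u32 },
}

/// Stand-in for parse_matches: greedily turn (pos, len, dist) match
/// candidates plus the input into a flat token sequence.
fn parse_matches(input: &[u8], matches: &[(usize, u32, u32)]) -> Vec<LzToken> {
    let mut tokens = Vec::new();
    let (mut pos, mut mi) = (0, 0);
    while pos < input.len() {
        // Skip candidates that start before the current position.
        while mi < matches.len() && matches[mi].0 < pos {
            mi += 1;
        }
        match matches.get(mi) {
            Some(&(mpos, len, dist)) if mpos == pos => {
                tokens.push(LzToken::Match { len, dist });
                pos += len as usize;
            }
            _ => {
                tokens.push(LzToken::Literal(input[pos]));
                pos += 1;
            }
        }
    }
    tokens
}

/// Stand-in for demux_tokens: split tokens into per-stream vectors
/// (literals, match lengths, match distances) for independent entropy coding.
fn demux_tokens(tokens: &[LzToken]) -> (Vec<u8>, Vec<u32>, Vec<u32>) {
    let (mut lits, mut lens, mut dists) = (Vec::new(), Vec::new(), Vec::new());
    for t in tokens {
        match *t {
            LzToken::Literal(b) => lits.push(b),
            LzToken::Match { len, dist } => {
                lens.push(len);
                dists.push(dist);
            }
        }
    }
    (lits, lens, dists)
}
```

The point of the refactor is that the demux step consumes `&[LzToken]` directly, so the old token → `lz77::Match` → token round trip disappears.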

Test plan

  • ./scripts/test.sh passes (695 tests, 0 failures)
  • Clippy clean, zero warnings
  • Review wire-formats.md for accuracy against source code
  • Spot-check ARCHITECTURE.md updates

🤖 Generated with Claude Code

ChrisLundquist and others added 4 commits March 10, 2026 02:29
- Add `demux_tokens()` taking `&[LzToken]` directly, skip intermediate
  `Vec<lz77::Match>` in GPU SortLZ coordinator
- Migrate LzSeq SortLZ CPU path from `encode_match_sequence` to
  `parse_matches` → `encode_from_tokens`
- Migrate `encode_optimal` to use `matches_to_tokens` → `encode_from_tokens`
- Delete `matches_to_lz77_greedy`, `matches_to_lz77_lazy` (~110 lines)
- Delete `encode_match_sequence` (~55 lines)
- Replace sortlz LZ77 roundtrip tests with token-based equivalents
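A token-based roundtrip check, in the spirit of the replaced tests, might look like the sketch below. The `LzToken` type and the `detokenize` helper are hypothetical; the repo's actual test helpers differ.

```rust
// Hypothetical token-based roundtrip helper: expand a token sequence back
// to bytes using LZ77 copy-back semantics, so tokenization can be verified
// without going through an lz77::Match roundtrip.

#[derive(Debug, Clone, Copy)]
enum LzToken {
    Literal(u8),
    Match { len: u32, dist: u32 },
}

fn detokenize(tokens: &[LzToken]) -> Vec<u8> {
    let mut out = Vec::new();
    for t in tokens {
        match *t {
            LzToken::Literal(b) => out.push(b),
            LzToken::Match { len, dist } => {
                let start = out.len() - dist as usize;
                // Byte-by-byte copy handles overlapping matches (dist < len).
                for i in 0..len as usize {
                    let b = out[start + i];
                    out.push(b);
                }
            }
        }
    }
    out
}
```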

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route SortLz and Optimal parse strategies through the shared tokenize()
entry point for LzSeq pipelines. This gives LzSeqR/LzSeqH access to GPU
match finding and unified parse strategy dispatch for these modes.

Keep encode_with_config() for the default lazy/greedy CPU path (tuned
adaptive chain depth, hash4 prefix, repeat-offset-aware matching).
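The resulting dispatch might be shaped roughly as below. The `ParseStrategy` variant names and the string labels are illustrative assumptions; only the split between the two paths comes from the commit message.

```rust
// Hypothetical shape of the parse-strategy dispatch: the default CPU path
// keeps its tuned encoder, while Optimal and SortLz share tokenize().

#[derive(Debug, Clone, Copy)]
enum ParseStrategy {
    Greedy,
    Lazy,
    Optimal,
    SortLz,
}

fn entry_point(strategy: ParseStrategy) -> &'static str {
    match strategy {
        // Tuned CPU path: adaptive chain depth, hash4 prefix,
        // repeat-offset-aware matching.
        ParseStrategy::Greedy | ParseStrategy::Lazy => "encode_with_config",
        // Shared path: unified dispatch plus GPU match finding.
        ParseStrategy::Optimal | ParseStrategy::SortLz => {
            "tokenize -> encode_from_tokens"
        }
    }
}
```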

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update multi-stream section to reflect TokenEncoder architecture
(LzSeqEncoder 6-stream, LzssEncoder 4-stream). Add "Active architecture"
section documenting GPU vs CPU design decisions. Add comprehensive
wire format reference covering container V2, per-block framing, entropy
coders, and pipeline ID table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TODO-gpu-rans-6stream-bug.md (GPU rANS fails with 6-stream LzSeqR)
and TODO-benchmark-lzfi-vs-lzssr.md (consolidation candidate). Move 4
stale/closed plans to completed/ and reorganize index with Investigation
TODOs section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ChrisLundquist ChrisLundquist merged commit 5779fba into master Mar 10, 2026
4 checks passed
