refactor: post-wire-encoder cleanup and documentation #119
Merged
ChrisLundquist merged 4 commits into master on Mar 10, 2026
- Add `demux_tokens()` taking `&[LzToken]` directly, skip intermediate `Vec<lz77::Match>` in GPU SortLZ coordinator
- Migrate LzSeq SortLZ CPU path from `encode_match_sequence` to `parse_matches` → `encode_from_tokens`
- Migrate `encode_optimal` to use `matches_to_tokens` → `encode_from_tokens`
- Delete `matches_to_lz77_greedy`, `matches_to_lz77_lazy` (~110 lines)
- Delete `encode_match_sequence` (~55 lines)
- Replace sortlz LZ77 roundtrip tests with token-based equivalents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
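For readers unfamiliar with the token-based path, the idea of demultiplexing a token slice straight into per-field streams can be sketched roughly as follows. This is only an illustration: the `LzToken` variants, the `Streams` struct, and the four-stream layout below are assumptions made for the example, not the definitions used in this repository.

```rust
/// Hypothetical token type for illustration; the real `LzToken` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LzToken {
    Literal(u8),
    Match { offset: u32, length: u32 },
}

/// Hypothetical output layout: one vector per logical stream.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Streams {
    /// One entry per token: 0 = literal, 1 = match.
    pub flags: Vec<u8>,
    pub literals: Vec<u8>,
    pub offsets: Vec<u32>,
    pub lengths: Vec<u32>,
}

/// Split a token slice into separate streams in a single pass,
/// without first converting the tokens into another match representation.
pub fn demux_tokens(tokens: &[LzToken]) -> Streams {
    let mut out = Streams::default();
    for &tok in tokens {
        match tok {
            LzToken::Literal(byte) => {
                out.flags.push(0);
                out.literals.push(byte);
            }
            LzToken::Match { offset, length } => {
                out.flags.push(1);
                out.offsets.push(offset);
                out.lengths.push(length);
            }
        }
    }
    out
}
```

Taking a borrowed slice means the caller keeps ownership of the tokens and no intermediate vector is allocated, which is the general motivation stated in the commit message above.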
Route SortLz and Optimal parse strategies through the shared `tokenize()` entry point for LzSeq pipelines. This gives LzSeqR/LzSeqH access to GPU match finding and unified parse strategy dispatch for these modes. Keep `encode_with_config()` for the default lazy/greedy CPU path (tuned adaptive chain depth, hash4 prefix, repeat-offset-aware matching).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
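The routing described in this commit can be pictured with a small dispatch sketch. The enum names and the `select_path` function are hypothetical and exist only for this example; the sketch shows which strategies go to the shared entry point and which stay on the tuned CPU path, not how the repository implements the dispatch.

```rust
/// Hypothetical parse strategies, named after the modes in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParseStrategy {
    Greedy,
    Lazy,
    SortLz,
    Optimal,
}

/// Hypothetical labels for the two encode paths described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncodePath {
    /// Stands in for the tuned `encode_with_config()` CPU path.
    TunedCpu,
    /// Stands in for the shared `tokenize()` -> `encode_from_tokens()` path.
    SharedTokenize,
}

/// Choose an encode path for a given parse strategy.
pub fn select_path(strategy: ParseStrategy) -> EncodePath {
    match strategy {
        // SortLz and Optimal are routed through the shared entry point.
        ParseStrategy::SortLz | ParseStrategy::Optimal => EncodePath::SharedTokenize,
        // The default lazy/greedy modes keep the tuned CPU encoder.
        ParseStrategy::Greedy | ParseStrategy::Lazy => EncodePath::TunedCpu,
    }
}
```

An exhaustive `match` without a wildcard arm means that adding a new strategy variant would fail to compile until its path is chosen explicitly.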
Update multi-stream section to reflect TokenEncoder architecture (LzSeqEncoder 6-stream, LzssEncoder 4-stream). Add "Active architecture" section documenting GPU vs CPU design decisions. Add comprehensive wire format reference covering container V2, per-block framing, entropy coders, and pipeline ID table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add `TODO-gpu-rans-6stream-bug.md` (GPU rANS fails with 6-stream LzSeqR) and `TODO-benchmark-lzfi-vs-lzssr.md` (consolidation candidate). Move 4 stale/closed plans to `completed/` and reorganize index with Investigation TODOs section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Follow-up to the pluggable wire encoder refactor (#118). Eliminates dead code paths, unifies LzSeq compression strategies, and adds comprehensive documentation.
- Replace the `matches_to_lz77_greedy/lazy` → `matches_to_tokens` → intermediate conversion with a direct `parse_matches` → `demux_tokens` path. Delete ~165 lines of dead conversion code (`matches_to_lz77_greedy`, `matches_to_lz77_lazy`, `encode_match_sequence`).
- Route SortLz and Optimal parse strategies through the shared `tokenize()` → `encode_from_tokens()` entry point. Keep tuned `encode_with_config()` for the default lazy/greedy CPU path.

Test plan
- `./scripts/test.sh` passes (695 tests, 0 failures)

🤖 Generated with Claude Code