refactor: post-wire-encoder cleanup and documentation #119

Merged
ChrisLundquist merged 4 commits into master from claude/post-refactor-cleanup
Mar 10, 2026
Conversation

@ChrisLundquist (Owner)

Summary

Follow-up to the pluggable wire encoder refactor (#118). Eliminates dead code paths, unifies LzSeq compression strategies, and adds comprehensive documentation.

  • Eliminate double-conversion in SortLZ paths: Replace the matches_to_lz77_greedy/lazy → matches_to_tokens intermediate conversion with a direct parse_matches → demux_tokens path. Delete ~165 lines of dead conversion code (matches_to_lz77_greedy, matches_to_lz77_lazy, encode_match_sequence)
  • Unify LzSeq Optimal+SortLz through tokenize(): Route the Optimal and SortLz strategies through the shared tokenize() → encode_from_tokens() entry point. Keep the tuned encode_with_config() for the default lazy/greedy CPU path
  • Update ARCHITECTURE.md: Rewrite multi-stream section for TokenEncoder architecture, add "Active architecture" section documenting GPU vs CPU design decisions
  • Add wire-formats.md: Comprehensive wire format reference (container V2, per-block framing, entropy coders, pipeline ID table)
  • Add investigation TODOs: GPU rANS 6-stream bug, Lzfi vs LzssR consolidation candidate
  • Archive 4 stale exec plans: Move closed/parked plans to completed/
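To make the first bullet concrete, here is a minimal sketch of the direct token path. The `LzToken` type and the `parse_matches`/`demux_tokens` signatures below are hypothetical stand-ins for illustration only; the real definitions live in this repo and differ.

```rust
// Hypothetical sketch of the direct token path: raw match candidates are
// parsed straight into tokens, then demuxed into per-stream vectors,
// skipping any intermediate Vec<lz77::Match> representation.

#[derive(Debug, Clone, Copy, PartialEq)]
enum LzToken {
    Literal(u8),
    Match { len: u32, dist: u32 },
}

/// Stand-in for parse_matches: greedily turn (pos, len, dist) match
/// candidates plus the input into a flat token sequence.
fn parse_matches(input: &[u8], matches: &[(usize, u32, u32)]) -> Vec<LzToken> {
    let mut tokens = Vec::new();
    let (mut pos, mut mi) = (0, 0);
    while pos < input.len() {
        // Skip candidates that start before the current position.
        while mi < matches.len() && matches[mi].0 < pos {
            mi += 1;
        }
        match matches.get(mi) {
            Some(&(mpos, len, dist)) if mpos == pos => {
                tokens.push(LzToken::Match { len, dist });
                pos += len as usize;
            }
            _ => {
                tokens.push(LzToken::Literal(input[pos]));
                pos += 1;
            }
        }
    }
    tokens
}

/// Stand-in for demux_tokens: split tokens into per-stream vectors
/// (literals, match lengths, match distances) for independent entropy coding.
fn demux_tokens(tokens: &[LzToken]) -> (Vec<u8>, Vec<u32>, Vec<u32>) {
    let (mut lits, mut lens, mut dists) = (Vec::new(), Vec::new(), Vec::new());
    for t in tokens {
        match *t {
            LzToken::Literal(b) => lits.push(b),
            LzToken::Match { len, dist } => {
                lens.push(len);
                dists.push(dist);
            }
        }
    }
    (lits, lens, dists)
}
```

The point of the refactor is that the demux step consumes `&[LzToken]` directly, so the old token → `lz77::Match` → token round trip disappears.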

Test plan

  • ./scripts/test.sh passes (695 tests, 0 failures)
  • Clippy clean, zero warnings
  • Review wire-formats.md for accuracy against source code
  • Spot-check ARCHITECTURE.md updates

🤖 Generated with Claude Code

ChrisLundquist and others added 4 commits March 10, 2026 02:29
- Add `demux_tokens()` taking `&[LzToken]` directly, skip intermediate
  `Vec<lz77::Match>` in GPU SortLZ coordinator
- Migrate LzSeq SortLZ CPU path from `encode_match_sequence` to
  `parse_matches` → `encode_from_tokens`
- Migrate `encode_optimal` to use `matches_to_tokens` → `encode_from_tokens`
- Delete `matches_to_lz77_greedy`, `matches_to_lz77_lazy` (~110 lines)
- Delete `encode_match_sequence` (~55 lines)
- Replace sortlz LZ77 roundtrip tests with token-based equivalents
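A token-based roundtrip check, in the spirit of the replaced tests, might look like the sketch below. The `LzToken` type and the `detokenize` helper are hypothetical; the repo's actual test helpers differ.

```rust
// Hypothetical token-based roundtrip helper: expand a token sequence back
// to bytes using LZ77 copy-back semantics, so tokenization can be verified
// without going through an lz77::Match roundtrip.

#[derive(Debug, Clone, Copy)]
enum LzToken {
    Literal(u8),
    Match { len: u32, dist: u32 },
}

fn detokenize(tokens: &[LzToken]) -> Vec<u8> {
    let mut out = Vec::new();
    for t in tokens {
        match *t {
            LzToken::Literal(b) => out.push(b),
            LzToken::Match { len, dist } => {
                let start = out.len() - dist as usize;
                // Byte-by-byte copy handles overlapping matches (dist < len).
                for i in 0..len as usize {
                    let b = out[start + i];
                    out.push(b);
                }
            }
        }
    }
    out
}
```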

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route SortLz and Optimal parse strategies through the shared tokenize()
entry point for LzSeq pipelines. This gives LzSeqR/LzSeqH access to GPU
match finding and unified parse strategy dispatch for these modes.

Keep encode_with_config() for the default lazy/greedy CPU path (tuned
adaptive chain depth, hash4 prefix, repeat-offset-aware matching).
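The resulting dispatch might be shaped roughly as below. The `ParseStrategy` variant names and the string labels are illustrative assumptions; only the split between the two paths comes from the commit message.

```rust
// Hypothetical shape of the parse-strategy dispatch: the default CPU path
// keeps its tuned encoder, while Optimal and SortLz share tokenize().

#[derive(Debug, Clone, Copy)]
enum ParseStrategy {
    Greedy,
    Lazy,
    Optimal,
    SortLz,
}

fn entry_point(strategy: ParseStrategy) -> &'static str {
    match strategy {
        // Tuned CPU path: adaptive chain depth, hash4 prefix,
        // repeat-offset-aware matching.
        ParseStrategy::Greedy | ParseStrategy::Lazy => "encode_with_config",
        // Shared path: unified dispatch plus GPU match finding.
        ParseStrategy::Optimal | ParseStrategy::SortLz => {
            "tokenize -> encode_from_tokens"
        }
    }
}
```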

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update multi-stream section to reflect TokenEncoder architecture
(LzSeqEncoder 6-stream, LzssEncoder 4-stream). Add "Active architecture"
section documenting GPU vs CPU design decisions. Add comprehensive
wire format reference covering container V2, per-block framing, entropy
coders, and pipeline ID table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TODO-gpu-rans-6stream-bug.md (GPU rANS fails with 6-stream LzSeqR)
and TODO-benchmark-lzfi-vs-lzssr.md (consolidation candidate). Move 4
stale/closed plans to completed/ and reorganize index with Investigation
TODOs section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ChrisLundquist ChrisLundquist merged commit 5779fba into master Mar 10, 2026
4 checks passed
