refactor(message_pack_format): split portable/native, drop wrapper module and legacy msgpack shims#46
Merged
Merged
Conversation
Move PR #36–#45 wire-format-aligned variants out of src/sketches/ into a new src/wrapper/ module so the in-process algorithms and the sketchlib-go-aligned producer types live in separate trees. Add a src/message_pack_format.rs placeholder noting that asap_sketchlib (Rust) and sketchlib-go each maintain their own MessagePack representation, byte-compatible at the envelope level. Files moved: - src/wrapper/{ddsketch,countminsketch,countsketch,hll,kll,countminsketch_topk}.rs carry the wire-format types appended after the per-file banner in PR #36+. - src/wrapper/{hydra_kll,set_aggregator,delta_set_aggregator}.rs are whole-file moves (they were entirely wire-format). CountL2HH stays in src/sketches/countsketch_topk.rs — it's an in-process top-k algorithm used by sketch_framework, not a wire-format type. No semantic changes; cargo test --lib --all-features still 447 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
The `[`CANONICAL_HASH_SEED_TABLE`]` and `[`CANONICAL_HASH_SEED`]` shortcut links in the module-level `//!` docs failed to resolve under `cargo doc --all-features`, breaking the Rust Docs CI workflow with `-D warnings`. Both items are still defined in this module; the doc text simply uses inline-code formatting now instead of intra-doc links so docgen succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… trait Move all wire-format DTOs (WireFormat, CmsData, CountMinSketchWithHeapSerialized, HydraKllSketchData, KllSketchData, StringSet, DeltaResult) out of individual wrappers and into message_pack_format::dto as the single source of truth. Add a unified Error enum and a MessagePackCodec trait (static-dispatch, zero cost) that every wire-format-aligned wrapper implements. The legacy inherent serialize_msgpack / deserialize_msgpack methods are preserved with their original signatures as thin shims over the trait, so callers and FFI bindings need no changes. KllSketchData and DeltaResult remain pub via re-exports from their wrapper modules. Add tests/msgpack_compat.rs with round-trip and DTO-shape coverage for every type, plus #[ignore]-d slots reserved for cross-language golden-bytes fixtures from sketchlib-go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`[`Error`]`, `[`MessagePackCodec`]`, `[`dto`]` shortcut links inside the module-level `//!` block failed to resolve under `cargo doc --all-features` even though the items are defined in the same module — likely a rustdoc quirk around `pub use` shadowing the shortcut-link namespace. Switch to inline-code formatting so docgen succeeds; the items are still discoverable via the usual rustdoc sidebar. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n restructure The module restructure in 746e335 re-added `[`Error`]` / `[`MessagePackCodec`]` shortcut links in the `//!` block that fail to resolve under `cargo doc --all-features` (same rustdoc quirk fixed earlier in 64b6989). Switch back to inline-code formatting so docgen passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docs predate the wrapper / message_pack_format split and didn't cover either module. Add dedicated pages for each and link them from index, project_overview, and library_map. Entry pages stay lean — details live on the dedicated pages so readers who don't need them aren't forced through them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reorganize the ser/de module along an audience boundary so reviewers can tell at a glance whether a change touches the cross-language wire contract or only Rust-internal persistence. - portable/: cross-language wire format shared with sketchlib-go. The 9 existing per-algorithm files move here unchanged (only super:: imports rewritten to crate::message_pack_format::). Touching this directory is a protocol change. - native/: thin MessagePackCodec trait shims over the existing serialize_to_bytes / deserialize_from_bytes methods on the pure-Rust generic sketches in src/sketches/. Byte format is Rust-internal; Go never reads it. sketches/ and wrapper/ logic is unchanged. Existing public API methods (CountMin::serialize_to_bytes, CountMinSketch::serialize_msgpack, etc.) are preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…able split Drop the outer `///` doc on `pub mod message_pack_format;` in lib.rs — it was merged with mod.rs's inner `//!` doc and forced intra-doc link resolution into the crate root, breaking `[`native`]`, `[`portable`]`, and `[`Error`]`. Also fix a stale `hydra_kll` path under `portable::` and pick up `cargo fmt` reorderings. Rewrite docs/message_pack_format.md to cover the new native/ vs portable/ layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapper sketch types each carried inherent serialize_msgpack / deserialize_msgpack methods that just forwarded to MessagePackCodec:: to_msgpack / from_msgpack with extra error-type adaptation. Now that MessagePackCodec is the canonical entry point, the shims add a second name for the same operation and fragment the error surface (RmpEncodeError vs Box<dyn Error> vs MsgPackError) for no benefit. Removes the inherent shims from CountMinSketch, CountMinSketchWithHeap, CountSketch, DdSketch, HllSketch, KllSketch, HydraKllSketch, and SetAggregator. Internal callers (aggregate_* helpers, tests) now use to_msgpack / from_msgpack directly. The delta_set_aggregator free functions are unchanged — they are not type-method shims and keep their pre-trait signature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`wrapper::CountSketch::update`/`estimate` and `wrapper::CountMinSketch::
update`/`estimate` were each open-coding a Go-compatible hash → (col,
sign) derivation via `common::hashspec` self-functions. The same packed
bit-slicing math already lives in `MatrixHashType::Packed64` /
`sketches::CountMin<.., FastPath>::insert_many`, and produces
bit-identical envelope bytes against the existing Go golden fixtures.
Changes:
- `wrapper::CountSketch::{update,estimate}` use `MatrixHashType::Packed64
+ DefaultXxHasher::hash64_seeded(0,..)` directly. No behavior change.
- `wrapper::CountMinSketch` switches its backend from `RegularPath` to
`FastPath` and routes update/estimate through `backend.insert_many` /
`backend.estimate` — the wrapper no longer bypasses the backend.
New parity probe in `tests/sketches_go_parity_probe.rs` locks in that
`sketches::Count<Vector2D<i64>, FastPath, DefaultXxHasher>` and
`sketches::CountMin<Vector2D<f64>, FastPath, DefaultXxHasher>` emit
byte-identical SketchEnvelope encodings to the captured sketchlib-go
goldens (1577-byte CountSketch, 8275-byte CMS Frequency-Only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `wrapper/` module was a parallel set of "wire-format-aligned" sketch types added when the runtime sketches in `sketches/` were believed not to produce Go-byte-compatible output. The parity probes added in this branch (`tests/sketches_go_parity_probe.rs`) confirm `sketches::Count<Vector2D<i64>, FastPath>`, `sketches::CountMin<Vector2D<f64>, FastPath>`, `sketches::HyperLogLogImpl<ErtlMLE, P14>`, and `sketches::DDSketch` all emit byte-identical envelopes to `sketchlib-go` for the same goldens, so the parallel layer is redundant. Changes: - Move every type formerly under `src/wrapper/*.rs` into the corresponding `src/message_pack_format/portable/*.rs` file, keeping the wire DTO and runtime ops together. Internal tests come along. - Drop the `crate::wrapper` module entirely. - Re-export the runtime + wire DTO types at the crate root from their new home in `message_pack_format::portable::*`. - Add small public accessors to `sketches::DDSketch` (`alpha`/`sum`/`store_counts`/`store_offset`) and to `sketches::HyperLogLogImpl::registers_as_slice` so the parity probe can build wire envelopes directly from `sketches::*` without reaching into private fields. - Extend `tests/sketches_go_parity_probe.rs` with HLL + DDSketch probes (CountSketch + CountMin already there); all 4 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The compactor capacity `k` is private inside `sketches::KLL<T>` but
needed by downstream consumers that build the Go-compatible
`KllSketchData { k, sketch_bytes }` wire envelope around a directly-
held `sketches::KLL<f64>`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI was red on fmt and cargo doc: - `cargo fmt --all -- --check` flagged unrelated formatting drift across lib.rs, kll.rs, countminsketch.rs, countsketch.rs, and msgpack_compat.rs — applied rustfmt. - `cargo doc -D warnings` failed on broken intra-doc links to `crate::wrapper` (the module was deleted in bfe8f37) and an unqualified `MatrixHashType::Packed64` link. Repointed the surviving rustdoc to `crate::message_pack_format::portable` / `crate::MatrixHashType`. Also removed the now-dead `Error::into_encode` helper that clippy/rustc warned about (it was the last caller of the serialize_msgpack shim deleted in f847b18). Doc refresh under docs/: - wrapper.md now describes the wire-format-aligned types in their new home under message_pack_format/portable/ instead of the removed src/wrapper/ tree. - index.md, library_map.md, project_overview.md, and message_pack_format.md updated to drop src/wrapper/ references and reflect that DTO + runtime ops are co-located per-algorithm. - Test file headers (msgpack_compat.rs, sketches_go_parity_probe.rs) no longer reference the deleted wrapper module. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapper.md page documented a module that no longer exists. The only content not already in message_pack_format.md was the in-process (generic sketches) vs wire-aligned (portable) selection guidance and the per-file type listing — both folded into the `portable/` section of message_pack_format.md. Updated index.md, project_overview.md, and library_map.md to drop their wrapper.md links. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zzylol
approved these changes
May 12, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Net effect of this branch (the wrapper module was introduced mid-branch and later removed; final state below):
wrappermodule. Wire-format-aligned sketch DTOs live directly undermessage_pack_format::portable::*; the crate root re-exports them (CountMinSketch,CountSketch,KllSketch,HllSketch,DdSketch,HydraKllSketch,CountMinSketchWithHeap,SetAggregator,DeltaResult).message_pack_formatis now concrete: per-sketch DTOs, a unifiedMessagePackCodectrait, and a singleErrortype. Split into:portable/— cross-language wire format shared withsketchlib-go(filenames mirror the Go side; protocol-locked, golden-byte tests catch drift).native/— thin trait shims over the in-processsketches::*serde paths; bytes are internal to Rust.serialize_msgpack/deserialize_msgpackshims are removed —MessagePackCodec::{to,from}_msgpackis the only API.sketchesFastPath math directly.sketches::KLL::k()accessor exposed.docs/message_pack_format.md(wrapper.md removed; useful bits folded in). Unused imports cleaned, fmt + clippy across moved files.Breaking changes
Downstream callers using the inherent shims must migrate:
plus
use asap_sketchlib::message_pack_format::MessagePackCodec;at the call site. Wire format and byte output are unchanged.The
delta_set_aggregatorfree functions are unchanged.Test plan
cargo fmt --all -- --checkcargo clippy --workspace --all-targets --all-features --locked -- -D warningscargo test --all-features --locked(447 lib + 12 integration + 1 cross-language + 19 doctests, all green)🤖 Generated with Claude Code