Skip to content

refactor(message_pack_format): split portable/native, drop wrapper module and legacy msgpack shims#46

Merged
zzylol merged 16 commits into
mainfrom
refactor/module-restructure
May 12, 2026
Merged

refactor(message_pack_format): split portable/native, drop wrapper module and legacy msgpack shims#46
zzylol merged 16 commits into
mainfrom
refactor/module-restructure

Conversation

@GordonYuanyc
Copy link
Copy Markdown
Collaborator

@GordonYuanyc GordonYuanyc commented May 8, 2026

Summary

Net effect of this branch (the wrapper module was introduced mid-branch and later removed; final state below):

  • No wrapper module. Wire-format-aligned sketch DTOs live directly under message_pack_format::portable::*; the crate root re-exports them (CountMinSketch, CountSketch, KllSketch, HllSketch, DdSketch, HydraKllSketch, CountMinSketchWithHeap, SetAggregator, DeltaResult).
  • message_pack_format is now concrete: per-sketch DTOs, a unified MessagePackCodec trait, and a single Error type. Split into:
    • portable/ — cross-language wire format shared with sketchlib-go (filenames mirror the Go side; protocol-locked, golden-byte tests catch drift).
    • native/ — thin trait shims over the in-process sketches::* serde paths; bytes are internal to Rust.
  • Single codec entry point. Legacy inherent serialize_msgpack / deserialize_msgpack shims are removed — MessagePackCodec::{to,from}_msgpack is the only API.
  • Hashspec bypass dropped from former wrapper types — they reuse the sketches FastPath math directly.
  • sketches::KLL::k() accessor exposed.
  • Docs refreshed under docs/message_pack_format.md (wrapper.md removed; useful bits folded in). Unused imports cleaned, fmt + clippy across moved files.

Breaking changes

Downstream callers using the inherent shims must migrate:

sketch.serialize_msgpack()        // →  sketch.to_msgpack()
Sketch::deserialize_msgpack(b)    // →  Sketch::from_msgpack(b)

plus use asap_sketchlib::message_pack_format::MessagePackCodec; at the call site. Wire format and byte output are unchanged.

The delta_set_aggregator free functions are unchanged.

Test plan

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
  • cargo test --all-features --locked (447 lib + 12 integration + 1 cross-language + 19 doctests, all green)

🤖 Generated with Claude Code

GordonYuanyc and others added 2 commits May 8, 2026 19:11
Move PR #36#45 wire-format-aligned variants out of src/sketches/ into a
new src/wrapper/ module so the in-process algorithms and the
sketchlib-go-aligned producer types live in separate trees. Add a
src/message_pack_format.rs placeholder noting that asap_sketchlib (Rust)
and sketchlib-go each maintain their own MessagePack representation,
byte-compatible at the envelope level.

Files moved:
- src/wrapper/{ddsketch,countminsketch,countsketch,hll,kll,countminsketch_topk}.rs
  carry the wire-format types appended after the per-file banner in PR #36+.
- src/wrapper/{hydra_kll,set_aggregator,delta_set_aggregator}.rs are
  whole-file moves (they were entirely wire-format).

CountL2HH stays in src/sketches/countsketch_topk.rs — it's an in-process
top-k algorithm used by sketch_framework, not a wire-format type.

No semantic changes; cargo test --lib --all-features still 447 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `[`CANONICAL_HASH_SEED_TABLE`]` and `[`CANONICAL_HASH_SEED`]`
shortcut links in the module-level `//!` docs failed to resolve under
`cargo doc --all-features`, breaking the Rust Docs CI workflow with
`-D warnings`. Both items are still defined in this module; the doc
text simply uses inline-code formatting now instead of intra-doc
links so docgen succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GordonYuanyc GordonYuanyc requested a review from zzylol May 8, 2026 20:59
@GordonYuanyc GordonYuanyc marked this pull request as draft May 9, 2026 19:58
GordonYuanyc and others added 8 commits May 10, 2026 21:03
… trait

Move all wire-format DTOs (WireFormat, CmsData, CountMinSketchWithHeapSerialized,
HydraKllSketchData, KllSketchData, StringSet, DeltaResult) out of individual
wrappers and into message_pack_format::dto as the single source of truth.
Add a unified Error enum and a MessagePackCodec trait (static-dispatch, zero
cost) that every wire-format-aligned wrapper implements.

The legacy inherent serialize_msgpack / deserialize_msgpack methods are
preserved with their original signatures as thin shims over the trait, so
callers and FFI bindings need no changes. KllSketchData and DeltaResult
remain pub via re-exports from their wrapper modules.

Add tests/msgpack_compat.rs with round-trip and DTO-shape coverage for every
type, plus #[ignore]-d slots reserved for cross-language golden-bytes
fixtures from sketchlib-go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`[`Error`]`, `[`MessagePackCodec`]`, `[`dto`]` shortcut links inside
the module-level `//!` block failed to resolve under
`cargo doc --all-features` even though the items are defined in the
same module — likely a rustdoc quirk around `pub use` shadowing the
shortcut-link namespace. Switch to inline-code formatting so docgen
succeeds; the items are still discoverable via the usual rustdoc
sidebar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n restructure

The module restructure in 746e335 re-added `[`Error`]` / `[`MessagePackCodec`]`
shortcut links in the `//!` block that fail to resolve under
`cargo doc --all-features` (same rustdoc quirk fixed earlier in 64b6989).
Switch back to inline-code formatting so docgen passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docs predate the wrapper / message_pack_format split and didn't
cover either module. Add dedicated pages for each and link them from
index, project_overview, and library_map. Entry pages stay lean —
details live on the dedicated pages so readers who don't need them
aren't forced through them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reorganize the ser/de module along an audience boundary so reviewers
can tell at a glance whether a change touches the cross-language wire
contract or only Rust-internal persistence.

- portable/: cross-language wire format shared with sketchlib-go.
  The 9 existing per-algorithm files move here unchanged (only
  super:: imports rewritten to crate::message_pack_format::).
  Touching this directory is a protocol change.
- native/: thin MessagePackCodec trait shims over the existing
  serialize_to_bytes / deserialize_from_bytes methods on the
  pure-Rust generic sketches in src/sketches/. Byte format is
  Rust-internal; Go never reads it.

sketches/ and wrapper/ logic is unchanged. Existing public API
methods (CountMin::serialize_to_bytes, CountMinSketch::serialize_msgpack,
etc.) are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…able split

Drop the outer `///` doc on `pub mod message_pack_format;` in lib.rs — it
was merged with mod.rs's inner `//!` doc and forced intra-doc link
resolution into the crate root, breaking `[`native`]`, `[`portable`]`,
and `[`Error`]`. Also fix a stale `hydra_kll` path under `portable::` and
pick up `cargo fmt` reorderings. Rewrite docs/message_pack_format.md to
cover the new native/ vs portable/ layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapper sketch types each carried inherent serialize_msgpack /
deserialize_msgpack methods that just forwarded to MessagePackCodec::
to_msgpack / from_msgpack with extra error-type adaptation. Now that
MessagePackCodec is the canonical entry point, the shims add a second
name for the same operation and fragment the error surface
(RmpEncodeError vs Box<dyn Error> vs MsgPackError) for no benefit.

Removes the inherent shims from CountMinSketch, CountMinSketchWithHeap,
CountSketch, DdSketch, HllSketch, KllSketch, HydraKllSketch, and
SetAggregator. Internal callers (aggregate_* helpers, tests) now use
to_msgpack / from_msgpack directly. The delta_set_aggregator free
functions are unchanged — they are not type-method shims and keep their
pre-trait signature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GordonYuanyc GordonYuanyc changed the title refactor: split top-level modules into wrapper + message_pack_format refactor: introduce wrapper + message_pack_format modules with unified codec May 11, 2026
@GordonYuanyc GordonYuanyc marked this pull request as ready for review May 11, 2026 05:46
GordonYuanyc and others added 5 commits May 11, 2026 18:00
`wrapper::CountSketch::update`/`estimate` and `wrapper::CountMinSketch::
update`/`estimate` were each open-coding a Go-compatible hash → (col,
sign) derivation via `common::hashspec` self-functions. The same packed
bit-slicing math already lives in `MatrixHashType::Packed64` /
`sketches::CountMin<.., FastPath>::insert_many`, and produces
bit-identical envelope bytes against the existing Go golden fixtures.

Changes:
- `wrapper::CountSketch::{update,estimate}` use `MatrixHashType::Packed64
  + DefaultXxHasher::hash64_seeded(0,..)` directly. No behavior change.
- `wrapper::CountMinSketch` switches its backend from `RegularPath` to
  `FastPath` and routes update/estimate through `backend.insert_many` /
  `backend.estimate` — the wrapper no longer bypasses the backend.

New parity probe in `tests/sketches_go_parity_probe.rs` locks in that
`sketches::Count<Vector2D<i64>, FastPath, DefaultXxHasher>` and
`sketches::CountMin<Vector2D<f64>, FastPath, DefaultXxHasher>` emit
byte-identical SketchEnvelope encodings to the captured sketchlib-go
goldens (1577-byte CountSketch, 8275-byte CMS Frequency-Only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `wrapper/` module was a parallel set of "wire-format-aligned"
sketch types added when the runtime sketches in `sketches/` were
believed not to produce Go-byte-compatible output. The parity probes
added in this branch (`tests/sketches_go_parity_probe.rs`) confirm
`sketches::Count<Vector2D<i64>, FastPath>`,
`sketches::CountMin<Vector2D<f64>, FastPath>`,
`sketches::HyperLogLogImpl<ErtlMLE, P14>`, and `sketches::DDSketch`
all emit byte-identical envelopes to `sketchlib-go` for the same
goldens, so the parallel layer is redundant.

Changes:
- Move every type formerly under `src/wrapper/*.rs` into the
  corresponding `src/message_pack_format/portable/*.rs` file, keeping
  the wire DTO and runtime ops together. Internal tests come along.
- Drop the `crate::wrapper` module entirely.
- Re-export the runtime + wire DTO types at the crate root from their
  new home in `message_pack_format::portable::*`.
- Add small public accessors to `sketches::DDSketch`
  (`alpha`/`sum`/`store_counts`/`store_offset`) and to
  `sketches::HyperLogLogImpl::registers_as_slice` so the parity probe
  can build wire envelopes directly from `sketches::*` without
  reaching into private fields.
- Extend `tests/sketches_go_parity_probe.rs` with HLL + DDSketch
  probes (CountSketch + CountMin already there); all 4 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The compactor capacity `k` is private inside `sketches::KLL<T>` but
needed by downstream consumers that build the Go-compatible
`KllSketchData { k, sketch_bytes }` wire envelope around a directly-
held `sketches::KLL<f64>`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI was red on fmt and cargo doc:
- `cargo fmt --all -- --check` flagged unrelated formatting drift
  across lib.rs, kll.rs, countminsketch.rs, countsketch.rs, and
  msgpack_compat.rs — applied rustfmt.
- `cargo doc -D warnings` failed on broken intra-doc links to
  `crate::wrapper` (the module was deleted in bfe8f37) and an
  unqualified `MatrixHashType::Packed64` link. Repointed the
  surviving rustdoc to `crate::message_pack_format::portable` /
  `crate::MatrixHashType`.

Also removed the now-dead `Error::into_encode` helper that
clippy/rustc warned about (it was the last caller of the
serialize_msgpack shim deleted in f847b18).

Doc refresh under docs/:
- wrapper.md now describes the wire-format-aligned types in their
  new home under message_pack_format/portable/ instead of the
  removed src/wrapper/ tree.
- index.md, library_map.md, project_overview.md, and
  message_pack_format.md updated to drop src/wrapper/ references
  and reflect that DTO + runtime ops are co-located per-algorithm.
- Test file headers (msgpack_compat.rs, sketches_go_parity_probe.rs)
  no longer reference the deleted wrapper module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapper.md page documented a module that no longer exists. The
only content not already in message_pack_format.md was the in-process
(generic sketches) vs wire-aligned (portable) selection guidance and
the per-file type listing — both folded into the `portable/` section
of message_pack_format.md.

Updated index.md, project_overview.md, and library_map.md to drop
their wrapper.md links.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GordonYuanyc GordonYuanyc changed the title refactor: introduce wrapper + message_pack_format modules with unified codec refactor(message_pack_format): split portable/native, drop wrapper module and legacy msgpack shims May 12, 2026
@zzylol zzylol merged commit cd1f10d into main May 12, 2026
2 checks passed
@zzylol zzylol deleted the refactor/module-restructure branch May 12, 2026 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants