fix(countsketch): restore hh_keys field + apply_delta topk-rebuild #42
Merged
PR #36 silently reverted PR #37's hh_keys field on CountSketchDelta when its merge landed. Restore on top of the post-#39 module names (src/sketches/countsketch.rs). Mirrors sketchlib-go's Delta.HHKeys / apply-delta heavy-hitter rebuild path.

Unblocks ASAPQuery-backend's PR-#74 consumer to pin asap_sketchlib to main, which in turn unblocks ASAPCollector Phase 3 step 3 (backend consumes asap-precompute-rs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
PR #36 silently reverted PR #37's `hh_keys: Vec<String>` field on `CountSketchDelta` (and the corresponding heavy-hitter rebuild branch in `apply_delta`) when its merge landed. PRs #38 (KLL seedable RNG) and #39 (module rename `count -> countsketch`) were branched off the post-#36 state, so neither restored the missing field. Result: `main` has been missing `hh_keys` ever since.

This PR restores the field on top of today's post-#39 module names (`src/sketches/countsketch.rs`), mirroring sketchlib-go's `Delta.HHKeys` heavy-hitter rebuild path. Unblocks ASAPQuery-backend's PR-#74 consumer to pin `asap_sketchlib` to `main`, which in turn unblocks ASAPCollector Phase 3 step 3 (backend consumes `asap-precompute-rs`).

Changes
- `CountSketchDelta`: add `pub hh_keys: Vec<String>` (additive; defaults via `#[derive(Default)]`).
- `CountSketch`: add `pub topk: Vec<(String, f64)>` with `#[serde(default)]` so legacy msgpack payloads still deserialize. Add `topk_update` helper that maintains a max-by-count heap bounded by `COUNT_SKETCH_TOPK_CAPACITY = 100` (mirrors Go's `TOPK_SIZE`).
- `CountSketch::apply_delta`: after applying cell deltas, re-estimate every key in `delta.hh_keys` against the post-update matrix and feed it into `self.topk` via `topk_update` (mirrors Go's `Delta.HHKeys` loop in `sketchlib-go/sketches/CountSketch/delta.go`).
- `pub const COUNT_SKETCH_TOPK_CAPACITY` re-exported from `sketches::mod`.

`CountSketchDelta` is a plain Rust struct (not proto-generated); no proto file regeneration was needed. The wire-format proto for `CountSketchState` already carries `TopKState` separately.

Out-of-range cell handling on `apply_delta` is left as fail-fast (existing main behavior); the silent-skip variant on `refactor/wire-format-align-go` is out of scope for this unblocker.

Test plan
- `test_apply_delta_rebuilds_topk_from_hh_keys`: builds a sketch with two known weighted keys, sends a delta carrying only `hh_keys`, asserts `topk` is rebuilt with re-estimated counts and that the heavier key out-ranks the lighter one.
- `test_apply_delta_hh_keys_topk_capacity`: sends `COUNT_SKETCH_TOPK_CAPACITY + 5` keys via `hh_keys` and asserts the heap stays bounded at the cap.
- `hh_keys: vec![]` in delta literals; 384 lib tests + 1 integration + 18 doctests pass.
- `cargo fmt --all -- --check` clean.
- `cargo clippy --all-targets -- -D warnings` clean.
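
For reviewers who want the rebuild path from the Changes list in one place, here is a minimal, std-only sketch of the idea. It is not the code in `src/sketches/countsketch.rs`: only `hh_keys`, `topk`, `topk_update`, `apply_delta`, and `COUNT_SKETCH_TOPK_CAPACITY` are names from this PR. The `cells` field, the row-major `matrix`, the `slot` hash, `insert`, and `estimate` are hypothetical stand-ins, the bounded top-k is a plain sorted `Vec` rather than a heap, and the serde attributes are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Capacity named in the PR; everything else here is illustrative.
pub const COUNT_SKETCH_TOPK_CAPACITY: usize = 100;

/// Hypothetical delta layout: only `hh_keys` is taken from the PR.
#[derive(Default)]
pub struct CountSketchDelta {
    pub cells: Vec<(usize, usize, f64)>, // (row, col, increment), assumed shape
    pub hh_keys: Vec<String>,
}

pub struct CountSketch {
    rows: usize, // assumed to be > 0
    cols: usize, // assumed to be > 0
    matrix: Vec<f64>, // row-major counters
    pub topk: Vec<(String, f64)>,
}

impl CountSketch {
    pub fn new(rows: usize, cols: usize) -> Self {
        Self { rows, cols, matrix: vec![0.0; rows * cols], topk: Vec::new() }
    }

    // Toy per-row hash returning (column, sign); not the library's hash.
    fn slot(&self, row: usize, key: &str) -> (usize, f64) {
        let mut h = DefaultHasher::new();
        (row as u64).hash(&mut h);
        key.hash(&mut h);
        let v = h.finish();
        let sign = if v & 1 == 0 { 1.0 } else { -1.0 };
        ((v >> 1) as usize % self.cols, sign)
    }

    pub fn insert(&mut self, key: &str, weight: f64) {
        for r in 0..self.rows {
            let (c, s) = self.slot(r, key);
            self.matrix[r * self.cols + c] += s * weight;
        }
    }

    /// Median of the signed per-row counters (upper median for even row counts).
    pub fn estimate(&self, key: &str) -> f64 {
        let mut v: Vec<f64> = (0..self.rows)
            .map(|r| {
                let (c, s) = self.slot(r, key);
                s * self.matrix[r * self.cols + c]
            })
            .collect();
        v.sort_by(|a, b| a.total_cmp(b));
        v[v.len() / 2]
    }

    /// Bounded top-k: refresh a known key, else push, else replace the
    /// smallest entry if the new count beats it. Kept sorted, largest first.
    pub fn topk_update(&mut self, key: &str, count: f64) {
        if let Some(e) = self.topk.iter_mut().find(|e| e.0 == key) {
            e.1 = count;
        } else if self.topk.len() < COUNT_SKETCH_TOPK_CAPACITY {
            self.topk.push((key.to_string(), count));
        } else if let Some(min) = self.topk.iter_mut().min_by(|a, b| a.1.total_cmp(&b.1)) {
            if count > min.1 {
                *min = (key.to_string(), count);
            }
        }
        self.topk.sort_by(|a, b| b.1.total_cmp(&a.1));
    }

    /// Fail fast on any out-of-range cell, apply the cell deltas, then
    /// re-estimate every key in `hh_keys` against the updated matrix.
    pub fn apply_delta(&mut self, delta: &CountSketchDelta) -> Result<(), String> {
        if let Some(bad) = delta.cells.iter().find(|c| c.0 >= self.rows || c.1 >= self.cols) {
            return Err(format!("cell ({}, {}) out of range", bad.0, bad.1));
        }
        for &(r, c, d) in &delta.cells {
            self.matrix[r * self.cols + c] += d;
        }
        for key in &delta.hh_keys {
            let est = self.estimate(key);
            self.topk_update(key, est);
        }
        Ok(())
    }
}
```

The re-estimation loop runs after the cell deltas on purpose: a delta that carries only `hh_keys` still rebuilds `topk` from whatever the matrix already holds, which is the case the first test in the plan exercises.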