Skip to content

fix(countsketch): restore hh_keys field + apply_delta topk-rebuild#42

Merged
zzylol merged 1 commit into
mainfrom
fix/restore-countsketch-hh-keys
May 5, 2026
Merged

fix(countsketch): restore hh_keys field + apply_delta topk-rebuild#42
zzylol merged 1 commit into
mainfrom
fix/restore-countsketch-hh-keys

Conversation

@zzylol
Copy link
Copy Markdown
Contributor

@zzylol zzylol commented May 5, 2026

Summary

PR #36 silently reverted PR #37's hh_keys: Vec<String> field on CountSketchDelta (and the corresponding heavy-hitter rebuild branch in apply_delta) when its merge landed. PRs #38 (KLL seedable RNG) and #39 (module rename count -> countsketch) were branched off the post-#36 state, so neither restored the missing field. Result: main has been missing hh_keys ever since.

This PR restores the field on top of today's post-#39 module names (src/sketches/countsketch.rs), mirroring sketchlib-go's Delta.HHKeys heavy-hitter rebuild path. Unblocks ASAPQuery-backend's PR-#74 consumer to pin asap_sketchlib to main, which in turn unblocks ASAPCollector Phase 3 step 3 (backend consumes asap-precompute-rs).

Changes

  • CountSketchDelta: add pub hh_keys: Vec<String> (additive; defaults via #[derive(Default)]).
  • CountSketch: add pub topk: Vec<(String, f64)> with #[serde(default)] so legacy msgpack payloads still deserialize. Add topk_update helper that maintains a max-by-count heap bounded by COUNT_SKETCH_TOPK_CAPACITY = 100 (mirrors Go's TOPK_SIZE).
  • CountSketch::apply_delta: after applying cell deltas, re-estimate every key in delta.hh_keys against the post-update matrix and feed it into self.topk via topk_update (mirrors Go's Delta.HHKeys loop in sketchlib-go/sketches/CountSketch/delta.go).
  • New pub const COUNT_SKETCH_TOPK_CAPACITY re-exported from sketches::mod.

CountSketchDelta is a plain Rust struct (not proto-generated); no proto file regeneration was needed. The wire-format proto for CountSketchState already carries TopKState separately.

Out-of-range cell handling on apply_delta is left as fail-fast (existing main behavior); the silent-skip variant on refactor/wire-format-align-go is out of scope for this unblocker.

Test plan

  • New unit test test_apply_delta_rebuilds_topk_from_hh_keys: builds a sketch with two known weighted keys, sends a delta carrying only hh_keys, asserts topk is rebuilt with re-estimated counts and that the heavier key out-ranks the lighter one.
  • New unit test test_apply_delta_hh_keys_topk_capacity: sends COUNT_SKETCH_TOPK_CAPACITY + 5 keys via hh_keys and asserts the heap stays bounded at the cap.
  • All existing tests updated to include hh_keys: vec![] in delta literals; 384 lib tests + 1 integration + 18 doctests pass.
  • cargo fmt --all -- --check clean.
  • cargo clippy --all-targets -- -D warnings clean.

PR #36 silently reverted PR #37's hh_keys field on CountSketchDelta
when its merge landed. Restore on top of the post-#39 module names
(src/sketches/countsketch.rs). Mirrors sketchlib-go's
Delta.HHKeys / apply-delta heavy-hitter rebuild path.

Unblocks ASAPQuery-backend's PR-#74 consumer to pin asap_sketchlib
to main, which in turn unblocks ASAPCollector Phase 3 step 3
(backend consumes asap-precompute-rs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zzylol zzylol merged commit 0b32c81 into main May 5, 2026
2 checks passed
@zzylol zzylol deleted the fix/restore-countsketch-hh-keys branch May 5, 2026 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant