fix(query): prune runtime inlists with block bloom #19516

Merged
zhang2014 merged 2 commits into databendlabs:main from SkyFan2002:fix/runtime-filter-wait-left-semi on Mar 8, 2026

SkyFan2002 (Member) commented on Mar 6, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Enable block bloom-index pruning for small runtime IN-list filters
  • Fix runtime filter packet generation in the LEFT SEMI path of the new hash join so that probe-side scans receive the built filter
  • Add coverage for the bloom-only pruning case in unit tests and sqllogictests

Changes

  • Add inlist_runtime_bloom_prune_threshold to gate the optimization by IN-list cardinality
  • Record inlist_value_count when building runtime IN-list filters and carry it through runtime filter entries
  • Extend FuseBlockPartInfo with bloom index location/size so scan-side runtime pruning can read block bloom indexes
  • Add scan-side runtime pruning that consults the block bloom index when min/max statistics cannot prune a block, and skips the block when the bloom index shows that every IN-list value is absent
  • Fix SemiLeftHashJoin to accumulate and merge runtime filter packets, and always register runtime-filter waiters for probe targets
  • Add a logic test where range pruning cannot eliminate blocks but runtime bloom pruning removes them
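
The two-stage decision described in the list above can be sketched as follows. This is a minimal illustration, not the code from this PR: `ToyBloom`, `BlockMeta`, `Decision`, and `prune_block` are hypothetical names, the bloom filter is a deterministic toy rather than Databend's on-disk bloom index format, and the `threshold` parameter only stands in for the role of `inlist_runtime_bloom_prune_threshold`.

```rust
/// Toy bloom filter standing in for a block bloom index.
/// Hypothetical: this is not Databend's bloom index format.
pub struct ToyBloom {
    bits: Vec<bool>,
    hashes: u64,
}

impl ToyBloom {
    pub fn new(num_bits: usize, hashes: u64) -> Self {
        assert!(num_bits > 0, "a bloom filter needs at least one bit");
        Self { bits: vec![false; num_bits], hashes }
    }

    /// Simple deterministic hash family, chosen for readability only.
    fn slot(&self, value: i64, seed: u64) -> usize {
        let mixed = (value as u64).wrapping_mul(2 * seed + 1).wrapping_add(seed);
        (mixed % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, value: i64) {
        for seed in 0..self.hashes {
            let i = self.slot(value, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    pub fn may_contain(&self, value: i64) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.slot(value, seed)])
    }
}

/// Per-block metadata assumed to be available at scan time (hypothetical shape).
pub struct BlockMeta {
    pub min: i64,
    pub max: i64,
    pub bloom: Option<ToyBloom>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Keep,
    PrunedByRange,
    PrunedByBloom,
}

/// Decide whether a block can be skipped for a runtime IN-list filter.
pub fn prune_block(block: &BlockMeta, inlist: &[i64], threshold: usize) -> Decision {
    // Stage 1: min/max statistics. Prune if no IN-list value falls in range.
    if !inlist.iter().any(|v| *v >= block.min && *v <= block.max) {
        return Decision::PrunedByRange;
    }
    // Stage 2: only small IN-lists are worth probing against the bloom index.
    if inlist.len() > threshold {
        return Decision::Keep;
    }
    match &block.bloom {
        // Prune only when every IN-list value is definitely absent.
        Some(bloom) if inlist.iter().all(|v| !bloom.may_contain(*v)) => Decision::PrunedByBloom,
        _ => Decision::Keep,
    }
}
```

A block holding only the values 1 and 1000 has a min/max range wide enough to cover an IN-list of `[500]`, so range pruning keeps it; the bloom probe is what lets the scan skip it, which is the situation the new logic test targets.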

Implementation

  1. Thread bloom index metadata through fuse block parts and expose the reader operator needed to load bloom indexes at scan time.
  2. Persist IN-list cardinality in runtime filter conversion and use it to guard bloom pruning with inlist_runtime_bloom_prune_threshold.
  3. Add a bloom-index branch to ExprRuntimePruner for runtime IN-list filters when statistics folding keeps the block.
  4. Fix LEFT SEMI new-hash-join runtime filter packet merging and remove the unnecessary cluster guard around runtime-filter readiness registration.
  5. Add regression coverage with a wide-min/max data layout so EXPLAIN ANALYZE shows runtime bloom pruning while range pruning remains ineffective.
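
Step 4 mentions accumulating and merging runtime filter packets. A rough sketch of what such a merge could look like is given below. Everything here is an assumption made for illustration: `FilterPacket`, `merge_packets`, `inlist_value_count`, and the `max_inlist` cap are hypothetical, and the merge rules (union the IN-lists, widen the min/max range, drop a component when any input lacks it) are one plausible choice rather than the behavior confirmed by this PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical runtime filter packet produced from one chunk of build-side data.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FilterPacket {
    /// Distinct build-side key values; `None` if the IN-list was not built.
    pub inlist: Option<BTreeSet<i64>>,
    /// Inclusive (min, max) of the build-side keys; `None` if unknown.
    pub min_max: Option<(i64, i64)>,
}

/// Merge accumulated packets into one. Returns `None` when there is nothing to merge.
pub fn merge_packets(packets: Vec<FilterPacket>, max_inlist: usize) -> Option<FilterPacket> {
    let mut iter = packets.into_iter();
    let mut merged = iter.next()?;
    for packet in iter {
        // An IN-list is only valid if every packet contributed its values.
        merged.inlist = match (merged.inlist.take(), packet.inlist) {
            (Some(mut acc), Some(values)) => {
                acc.extend(values);
                Some(acc)
            }
            _ => None,
        };
        // The merged range must cover the ranges of all packets.
        merged.min_max = match (merged.min_max, packet.min_max) {
            (Some((lo1, hi1)), Some((lo2, hi2))) => Some((lo1.min(lo2), hi1.max(hi2))),
            _ => None,
        };
    }
    // Drop an IN-list that grew past the cap; the min/max range is still usable.
    if merged.inlist.as_ref().map_or(false, |set| set.len() > max_inlist) {
        merged.inlist = None;
    }
    Some(merged)
}

/// Cardinality carried alongside the filter so the scan side can gate bloom pruning.
pub fn inlist_value_count(packet: &FilterPacket) -> usize {
    packet.inlist.as_ref().map_or(0, |set| set.len())
}
```

Carrying the value count with the merged filter is what allows the scan side to compare it against a threshold without re-inspecting the filter expression.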

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Pair with the reviewer to explain why

Type of change

  • Bug fix (non-breaking change which fixes an issue)

github-actions bot added the pr-bugfix label (this PR patches a bug in codebase) on Mar 6, 2026
zhang2014 merged commit 107e2a1 into databendlabs:main on Mar 8, 2026
89 checks passed