@Shekharrajak commented Dec 28, 2025

Handle Spark's full serialization format (12-byte header + bits) in merge_filter() to support mixed plans where Spark runs the partial aggregate and Comet runs the final one. The fix automatically detects the format and extracts the bits data accordingly.

Fixes #2889: Bloom filter intermediate aggregate buffers are not compatible between Spark and Comet

Rationale for this change

- Spark's serialize() returns the full format: a 12-byte header (version + numHashFunctions + numWords) followed by the bits data.
- Comet's state_as_bytes() returns the bits data only.
- When a Spark partial aggregate sends the full format, Comet's merge_filter() expects bits-only input, causing a mismatch.

Ref https://github.com/apache/spark/blob/master/common/sketch/src/main/java/org/apache/spark/util/sketch/BitArray.java#L99

Ref https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L219

Spark format: BloomFilterImpl.writeTo() writes the version and numHashFunctions (4 + 4 bytes), then BitArray.writeTo() writes numWords (4 bytes) followed by the bits.
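
For reference, a minimal Rust sketch of that layout. The function name is hypothetical and not part of Comet; on the Spark side these fields are written through a big-endian DataOutputStream.

```rust
// Hypothetical sketch of the byte layout Spark produces, for illustration only.
// BloomFilterImpl.writeTo() emits version and numHashFunctions, then
// BitArray.writeTo() emits numWords and the long words, all big-endian.
fn spark_serialized_bloom_filter(
    version: i32,
    num_hash_functions: i32,
    words: &[i64],
) -> Vec<u8> {
    let mut buf = Vec::with_capacity(12 + words.len() * 8);
    buf.extend_from_slice(&version.to_be_bytes()); // 4 bytes: version
    buf.extend_from_slice(&num_hash_functions.to_be_bytes()); // 4 bytes: numHashFunctions
    buf.extend_from_slice(&(words.len() as i32).to_be_bytes()); // 4 bytes: numWords
    for w in words {
        buf.extend_from_slice(&w.to_be_bytes()); // 8 bytes per word: the bits data
    }
    buf
}
```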

What changes are included in this PR?

- Detects the Spark format (buffer size = 12 + expected_bits_size)
- Extracts the bits data by skipping the 12-byte header when the Spark format is detected
- Returns the bits as-is when the buffer is already in the Comet format (see the sketch below)
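
A minimal sketch of that detection logic, assuming a hypothetical extract_bits helper rather than Comet's actual merge_filter() signature:

```rust
// Hypothetical helper illustrating the format detection described above.
const SPARK_HEADER_LEN: usize = 12; // version (4) + numHashFunctions (4) + numWords (4)

fn extract_bits(buf: &[u8], expected_bits_size: usize) -> &[u8] {
    if buf.len() == SPARK_HEADER_LEN + expected_bits_size {
        // Spark full format: skip the 12-byte header to reach the bits data.
        &buf[SPARK_HEADER_LEN..]
    } else {
        // Comet format: the buffer already contains only the bits data.
        buf
    }
}
```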

How are these changes tested?

Spark SQL test
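
For illustration, a hedged Rust round-trip sketch using the two hypothetical helpers above; the PR's actual coverage is the Spark SQL test, not this unit test.

```rust
// Round-trip sketch: a Spark-format buffer has its header skipped, while a
// Comet-format (bits-only) buffer passes through unchanged.
#[test]
fn spark_header_is_skipped() {
    let words = vec![0i64, i64::MAX];
    let spark_buf = spark_serialized_bloom_filter(1, 3, &words);
    let bits = extract_bits(&spark_buf, words.len() * 8);
    assert_eq!(bits.len(), words.len() * 8);

    let comet_buf = bits.to_vec();
    assert_eq!(extract_bits(&comet_buf, comet_buf.len()), &comet_buf[..]);
}
```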
