Add implementation for bloom filters (predicate pushdown) #652

Open
kiloOhm wants to merge 5 commits into aloneguid:master from kiloOhm:feat/bloom-filters

kiloOhm (Contributor) commented Sep 22, 2025

This PR implements bloom filters as described in the Parquet specification.

With the changes in this PR you can now add column-specific settings for bloom filters:

new ParquetOptions {
    BloomFilterOptionsByColumn = new Dictionary<string, ParquetOptions.BloomFilterOptions>() {
        { field.Name, new ParquetOptions.BloomFilterOptions { 
            EnableBloomFilters = true,
            BloomFilterFpp = 0.01f,
        } }
    }
},
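For intuition about what `BloomFilterFpp` trades off, here is a sizing approximation commonly used for Parquet's split-block bloom filters, where each value sets 8 bits. It is an assumption that this PR sizes its filters the same way; the symbols $m$ (filter size in bits), $n$ (number of distinct values in the column chunk) and $p$ (target false-positive probability) are introduced here for illustration only.

$$
m \approx \frac{-8\,n}{\ln\!\left(1 - p^{1/8}\right)}
$$

With $p = 0.01$ we get $p^{1/8} \approx 0.562$ and $\ln(1 - 0.562) \approx -0.826$, so $m / n \approx 9.7$ bits per distinct value, or roughly 1.2 MB for one million distinct values. Lowering the false-positive probability makes the filter larger; raising it makes the filter smaller but less selective.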

During writing, the bloom filters are constructed and written to the data column header.

When reading, you can then check whether a value is definitely absent from, or possibly present in, a row group's column:

var reader = new DataColumnReader(field, ms, chunk, stats, footer, parquetOptions: new ParquetOptions {
    BloomFilterOptionsByColumn = new Dictionary<string, ParquetOptions.BloomFilterOptions>() {
        { field.Name, new ParquetOptions.BloomFilterOptions { EnableBloomFilters = true } }
    }
});

Assert.True(reader.MightMatchEquals("contained"));
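To illustrate why the answer is only ever "definitely not" or "maybe", here is a minimal toy bloom filter. It is a sketch for intuition, not the code in this PR: `ToyBloomFilter`, `MightContain` and the FNV-1a hashing are hypothetical choices made for the example, and the layout is a plain bit array rather than the split-block format that Parquet specifies.

```csharp
using System;
using System.Collections;
using System.Text;

// Toy bloom filter for illustration only (hypothetical type, not part of Parquet.Net).
public sealed class ToyBloomFilter
{
    private readonly BitArray _bits;
    private readonly int _hashCount;

    public ToyBloomFilter(int bitCount, int hashCount)
    {
        _bits = new BitArray(bitCount);
        _hashCount = hashCount;
    }

    // FNV-1a over the UTF-8 bytes, with a seed mixed into the offset basis
    // so that each of the hashCount probes lands on a different bit.
    private static uint Fnv1a(string value, uint seed)
    {
        unchecked
        {
            uint hash = 2166136261u ^ seed;
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;
                hash *= 16777619u;
            }
            return hash;
        }
    }

    private int Index(string value, int probe)
    {
        unchecked
        {
            uint seed = (uint)probe * 0x9E3779B9u;
            return (int)(Fnv1a(value, seed) % (uint)_bits.Length);
        }
    }

    // Sets one bit per probe for the given value.
    public void Add(string value)
    {
        for (int i = 0; i < _hashCount; i++)
            _bits[Index(value, i)] = true;
    }

    // false: the value was definitely never added.
    // true:  the value was possibly added (false positives can occur).
    public bool MightContain(string value)
    {
        for (int i = 0; i < _hashCount; i++)
            if (!_bits[Index(value, i)])
                return false;
        return true;
    }
}
```

After `filter.Add("contained")`, `filter.MightContain("contained")` always returns `true`, because every bit set by `Add` is still set. A `false` result is therefore a safe reason to skip a row group, while a `true` result still requires reading the data to confirm the match.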
