[ENHANCEMENT]: Add count_if and retrieve_if APIs to static_multiset #800

@PointKernel

Is your feature request related to a problem? Please describe.

static_multiset currently has insert_if and contains_if with stencil/predicate support, but the count, count_outer, retrieve, and retrieve_outer APIs lack corresponding _if variants.

In cuDF's hash join, we want to use a bloom filter to pre-filter probe rows before counting/retrieving matches. The bloom filter produces a per-row boolean predicate. With count_if / retrieve_if, we could skip probe rows that the bloom filter rejects, avoiding unnecessary hash table lookups.
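To make the pre-filtering idea concrete, here is a minimal host-side sketch of how a per-row boolean stencil could be derived from a bloom filter. Everything in it is an assumption for illustration: `toy_bloom_filter` and `make_stencil` are hypothetical names, the hash mixing is arbitrary, and none of this is the cuDF or cuco implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy bloom filter with two hash probes per key.
// Assumes num_bits > 0. Not the cuDF or cuco bloom filter.
class toy_bloom_filter {
 public:
  explicit toy_bloom_filter(std::size_t num_bits) : bits_(num_bits, false) {}

  // Mark both bit positions for this key.
  void add(int key)
  {
    bits_[slot(key, 0)] = true;
    bits_[slot(key, 1)] = true;
  }

  // May return true for a key that was never added (false positive),
  // but never returns false for a key that was added.
  bool maybe_contains(int key) const
  {
    return bits_[slot(key, 0)] && bits_[slot(key, 1)];
  }

 private:
  // Arbitrary 64-bit mixing function, seeded so the two probes differ.
  std::size_t slot(int key, std::uint64_t seed) const
  {
    std::uint64_t x = static_cast<std::uint64_t>(static_cast<std::uint32_t>(key)) +
                      (seed + 1) * 0x9E3779B97F4A7C15ULL;
    x ^= x >> 30;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27;
    x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return static_cast<std::size_t>(x % bits_.size());
  }

  std::vector<bool> bits_;
};

// Build the per-row stencil: true means "the row may have a match and must
// be probed", false means "the row certainly has no match and can be skipped".
std::vector<bool> make_stencil(toy_bloom_filter const& filter,
                               std::vector<int> const& probe_keys)
{
  std::vector<bool> stencil;
  stencil.reserve(probe_keys.size());
  for (int key : probe_keys) { stencil.push_back(filter.maybe_contains(key)); }
  return stencil;
}
```

Because a bloom filter has no false negatives, every row whose stencil entry is false is guaranteed to have no match in the build side, so skipping its hash table lookup does not change the join result.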

Describe the solution you'd like

Proposed API (following the existing insert_if / contains_if pattern):

  // Count matches only for probe keys where pred(*(stencil + i)) is true.
  // Keys where the predicate is false contribute 0 to the count (inner)
  // or 1 (outer, for left/full join semantics).
  size_type count_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
  size_type count_outer_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);

  // Retrieve matches only for probe keys where pred(*(stencil + i)) is true.
  retrieve_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
  retrieve_outer_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);

Describe alternatives you've considered

No response

Additional context

No response
