Skip to content

Drop the binned INTERSECTS join path in favor of the naive overlap predicate #167

Description

@conradbzura

Description

Replace the binned equi-join with the naive column-to-column overlap predicate as the sole GenericTarget INTERSECTS join implementation for both combining (pair-producing) shapes — inner join and outer (LEFT/RIGHT/FULL) join — in the operator-expander registry, and remove the binned path entirely (the IntersectsBinnedJoinTransformer and the intersects_bin_size parameter).

For canonical (0-based half-open) coordinates the predicate is a.chrom = b.chrom AND a.start < b.end AND b.start < a.end. Operator selection per INTERSECTS/CONTAINS/WITHIN is already encoded in expanders/intersects.py (_range_predicate / _column_join), which currently builds this as the residual column-to-column predicate; this issue promotes it from residual fallback to the primary and only path.

Why replace binning with the naive predicate outright:

Today the column-to-column INTERSECTS join rewrites live as pre-pass transformers in src/giql/transformer.py (IntersectsBinnedJoinTransformer, IntersectsDuckDBIEJoinTransformer), selected up front in src/giql/transpile.py (~156, 214-235) from a global capability rather than through the registry. The end state is a registered (GenericTarget, Intersects) default expander that emits the naive predicate for inner and outer joins (wired through ExpansionContext); the IntersectsBinnedJoinTransformer and intersects_bin_size are removed; and a target-specific (target, Intersects) override (e.g. the DuckDB IEJoin expander) supersedes the default via ExpanderRegistry.has_override (transpile.py:181).

This is scoped to the combining join shapes. The filtering SEMI/ANTI shapes (scalar coverage-window sweep + masked-aggregate fusion) are a separate expander and out of scope here.

Motivation

Continue the operator-expander registry migration (CLUSTER/MERGE #144, DISJOIN #143, INTERSECTS predicates #141) so INTERSECTS join strategy is registry-driven and target-overridable. Making the naive overlap predicate the sole generic default keeps the lowest-common-denominator target as simple, correct standard SQL, delegates range-join optimization to the engine, and removes a hand-rolled optimization (binning) whose only purpose was engines without a range-join operator. It gives every engine correct inner and outer INTERSECTS (outer for free), provides one universal fallback more capable than binned, and establishes the default that the DuckDB IEJoin override defers to on the shapes it cannot express.

Expected Outcome

  • Column-to-column inner and outer INTERSECTS on GenericTarget emit the naive overlap predicate via a registered expander (promoting the existing _column_join from residual to primary).
  • IntersectsBinnedJoinTransformer and the binned pre-pass path are removed.
  • The intersects_bin_size parameter is removed from the public transpile() API (breaking change — decide deprecate-with-warning vs hard-remove).
  • Outer join (LEFT/RIGHT/FULL) INTERSECTS produces correct unmatched-row semantics, verified against a real engine (oracle/integration test).
  • Shapes previously declined by the IEJoin transformer fall back to the naive predicate (not binned); binned-specific limitations such as IntersectsBinnedJoinTransformer drops non-INTERSECTS JOIN ON predicates when INTERSECTS lives in WHERE #94 are moot.
  • A target-specific (target, Intersects) override still supersedes the default via has_override.

Risks / verify

  • Dropping binned removes the fallback for engines without a range-join optimizer, so DataFusion's range/inequality-join behavior (>=52.3.0) becomes load-bearing. If it degrades to nested loops (cf. Range/inequality joins are slow apache/datafusion#8393), the remedy is upstream or a different DataFusion-specific plan — no binned escape hatch remains. Verify before removing binned.
  • Confirm DuckDB still auto-selects IEJoin for the emitted predicate through GIQL's canonical-coordinate column wrapping. If so, IntersectsDuckDBIEJoinTransformer may be simplifiable or removable (track via the IEJoin-expander migration issue).

Out of scope (tracked separately)

  • SEMI/ANTI (filtering) INTERSECTS coverage-sweep + masked-aggregate fusion expander — separate issue.
  • Promote the DuckDB IEJoin implementation (IntersectsDuckDBIEJoinTransformer) to a registered (DuckDBTarget, Intersects) override expander, including the outer-join bail-out that defers to this default — separate issue (to be filed; no existing issue covers this migration).

Metadata

Metadata

Assignees

No one assigned

    Labels

    refactorCode restructuring without behavior change

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions