Description
Replace the binned equi-join with the naive column-to-column overlap predicate as the sole GenericTarget INTERSECTS join implementation for both combining (pair-producing) shapes, inner join and outer (LEFT/RIGHT/FULL) join, in the operator-expander registry, and remove the binned path entirely (the IntersectsBinnedJoinTransformer and the intersects_bin_size parameter).
For canonical (0-based half-open) coordinates the predicate is a.chrom = b.chrom AND a.start < b.end AND b.start < a.end. Operator selection per INTERSECTS/CONTAINS/WITHIN is already encoded in expanders/intersects.py (_range_predicate / _column_join), which currently builds this as the residual column-to-column predicate; this issue promotes it from residual fallback to the primary and only path.
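To make the half-open semantics concrete, here is a minimal Python sketch of the same three-part test. The Interval type and the overlaps function are illustrative names, not part of GIQL. The point is only that adjacent intervals, where one ends exactly where the other starts, do not overlap under 0-based half-open coordinates.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Illustrative 0-based half-open interval [start, end); not a GIQL type."""

    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Mirrors: a.chrom = b.chrom AND a.start < b.end AND b.start < a.end
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


a = Interval("chr1", 100, 200)
shares_bases = overlaps(a, Interval("chr1", 150, 250))  # both cover [150, 200)
adjacent = overlaps(a, Interval("chr1", 200, 300))      # touch at 200, share no base
other_chrom = overlaps(a, Interval("chr2", 150, 250))   # same range, different chrom
```

Because both comparisons are strict, the adjacent case is rejected without any off-by-one adjustment.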
Why replace binning with the naive predicate outright:
Outer joins become trivial. The naive predicate is a plain ON condition, so LEFT/RIGHT/FULL JOIN ... ON <predicate> has correct outer semantics with no special handling. That removes the reason _has_outer_join_intersects (transformer.py:82) must decline and fall back today: binning restructures the query into bin CTEs and equi-joins, which loses the unmatched-row guarantee.
DuckDB gets IEJoin for free. Given the two-inequality predicate, DuckDB's optimizer selects IEJoin itself, potentially making the hand-rolled IntersectsDuckDBIEJoinTransformer redundant (to verify — see Risks).
No row inflation. Removes binning's long-interval row-inflation weakness along with the code.
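The outer-join point above can be checked with a small sketch. It uses SQLite from the Python standard library purely as a stand-in engine (an assumption for illustration; it is not one of the targets named in this issue), and the table names, row labels, and helper function are hypothetical. The start and end columns are quoted because END is a keyword in SQLite.

```python
import sqlite3

# The naive overlap predicate from this issue, with start/end quoted for SQLite.
NAIVE_OVERLAP = 'a.chrom = b.chrom AND a."start" < b."end" AND b."start" < a."end"'


def left_join_overlaps(a_rows, b_rows):
    """Run a LEFT JOIN on the naive predicate and return (a.name, b.name) pairs."""
    con = sqlite3.connect(":memory:")
    for table in ("a", "b"):
        con.execute(
            f'CREATE TABLE {table} '
            '(name TEXT, chrom TEXT, "start" INTEGER, "end" INTEGER)'
        )
    con.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", a_rows)
    con.executemany("INSERT INTO b VALUES (?, ?, ?, ?)", b_rows)
    sql = (
        "SELECT a.name, b.name FROM a "
        f"LEFT JOIN b ON {NAIVE_OVERLAP} "
        "ORDER BY a.name, b.name"
    )
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


A = [("a1", "chr1", 100, 200), ("a2", "chr1", 500, 600), ("a3", "chr2", 100, 200)]
B = [("b1", "chr1", 150, 250), ("b2", "chr1", 200, 300)]
result = left_join_overlaps(A, B)
# a1 pairs with b1 only (b2 is adjacent); a2 and a3 survive with a NULL partner.
```

The unmatched rows come back with None on the right side without any rewrite of the query, which is the behavior the binned restructuring could not guarantee.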
Today the column-to-column INTERSECTS join rewrites live as pre-pass transformers in src/giql/transformer.py (IntersectsBinnedJoinTransformer, IntersectsDuckDBIEJoinTransformer), selected up front in src/giql/transpile.py (~156, 214-235) from a global capability rather than through the registry. The end state is a registered (GenericTarget, Intersects) default expander that emits the naive predicate for inner and outer joins (wired through ExpansionContext); the IntersectsBinnedJoinTransformer and intersects_bin_size are removed; and a target-specific (target, Intersects) override (e.g. the DuckDB IEJoin expander) supersedes the default via ExpanderRegistry.has_override (transpile.py:181).
This is scoped to the combining join shapes. The filtering SEMI/ANTI shapes (scalar coverage-window sweep + masked-aggregate fusion) are a separate expander and out of scope here.
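The default-plus-override resolution described above can be pictured with a toy registry. This is a sketch under stated assumptions: ToyExpanderRegistry, its register and resolve methods, the marker classes, and the expander call signature are all hypothetical, and only the (target, operator) keying and the has_override name are taken from this issue. The real ExpanderRegistry may differ.

```python
from typing import Callable, Dict, Tuple


# Stand-in marker classes; the real GIQL targets and operators are richer.
class GenericTarget: ...
class DuckDBTarget(GenericTarget): ...
class DataFusionTarget(GenericTarget): ...
class Intersects: ...


Expander = Callable[[str, str], str]  # hypothetical signature: (left, right) -> SQL


class ToyExpanderRegistry:
    """Toy model: expanders keyed by (target, operator) with a generic default."""

    def __init__(self) -> None:
        self._expanders: Dict[Tuple[type, type], Expander] = {}

    def register(self, target: type, operator: type, expander: Expander) -> None:
        self._expanders[(target, operator)] = expander

    def has_override(self, target: type, operator: type) -> bool:
        # An override is a registration for a specific, non-generic target.
        return target is not GenericTarget and (target, operator) in self._expanders

    def resolve(self, target: type, operator: type) -> Expander:
        if self.has_override(target, operator):
            return self._expanders[(target, operator)]
        return self._expanders[(GenericTarget, operator)]


def naive_overlap(left: str, right: str) -> str:
    return (
        f"{left}.chrom = {right}.chrom AND {left}.start < {right}.end "
        f"AND {right}.start < {left}.end"
    )


def duckdb_placeholder(left: str, right: str) -> str:
    # Placeholder only; the actual IEJoin expander is out of scope here.
    return f"/* engine-specific plan for {left}, {right} */"


registry = ToyExpanderRegistry()
registry.register(GenericTarget, Intersects, naive_overlap)
registry.register(DuckDBTarget, Intersects, duckdb_placeholder)

generic_sql = registry.resolve(DataFusionTarget, Intersects)("a", "b")
duckdb_sql = registry.resolve(DuckDBTarget, Intersects)("a", "b")
```

In this toy model a target with no registration of its own falls through to the generic default, while a registered target-specific expander wins.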
Motivation
Continue the operator-expander registry migration (CLUSTER/MERGE #144, DISJOIN #143, INTERSECTS predicates #141) so INTERSECTS join strategy is registry-driven and target-overridable. Making the naive overlap predicate the sole generic default keeps the lowest-common-denominator target as simple, correct standard SQL, delegates range-join optimization to the engine, and removes a hand-rolled optimization (binning) whose only purpose was to serve engines without a range-join operator. It gives every engine correct inner and outer INTERSECTS (outer for free), provides one universal fallback that is more capable than the binned path, and establishes the default that the DuckDB IEJoin override defers to on the shapes it cannot express.
Expected Outcome
Column-to-column inner and outer INTERSECTS on GenericTarget emit the naive overlap predicate via a registered expander (promoting the existing _column_join from residual to primary).
IntersectsBinnedJoinTransformer and the binned pre-pass path are removed.
The intersects_bin_size parameter is removed from the public transpile() API (breaking change — decide deprecate-with-warning vs hard-remove).
Outer join (LEFT/RIGHT/FULL) INTERSECTS produces correct unmatched-row semantics, verified against a real engine (oracle/integration test).
A target-specific (target, Intersects) override still supersedes the default via has_override.
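For the deprecate-with-warning option mentioned in the list above, one possible shape is sketched below. The function transpile_sketch is a hypothetical stand-in, since the real transpile() signature is not shown in this issue. The sentinel default is an assumption chosen so that any explicitly passed value, including None, triggers the warning.

```python
import warnings

_UNSET = object()  # sentinel so an explicitly passed None is still detected


def transpile_sketch(query: str, intersects_bin_size=_UNSET) -> str:
    """Hypothetical stand-in for transpile(); the real signature may differ."""
    if intersects_bin_size is not _UNSET:
        warnings.warn(
            "intersects_bin_size is deprecated and ignored: INTERSECTS joins "
            "no longer use binning",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
    return query  # placeholder: a real implementation would return transpiled SQL


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    transpile_sketch("SELECT 1", intersects_bin_size=10_000)  # warns
    transpile_sketch("SELECT 1")                              # silent
```

Hard removal would instead drop the parameter, so existing callers fail with a TypeError; the warning route trades a longer cleanup tail for a softer upgrade.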
Risks / verify
Dropping the binned path removes the fallback for engines without a range-join optimizer, so DataFusion's range/inequality-join behavior (>=52.3.0) becomes load-bearing. If it degrades to nested loops (cf. apache/datafusion#8393, "Range/inequality joins are slow"), the remedy is upstream or a different DataFusion-specific plan; no binned escape hatch remains. Verify before removing the binned path.
Confirm DuckDB still auto-selects IEJoin for the emitted predicate through GIQL's canonical-coordinate column wrapping. If so, IntersectsDuckDBIEJoinTransformer may be simplifiable or removable (track via the IEJoin-expander migration issue).
Out of scope (tracked separately)
Promote the DuckDB IEJoin implementation (IntersectsDuckDBIEJoinTransformer) to a registered (DuckDBTarget, Intersects) override expander, including the outer-join bail-out that defers to this default. This is a separate issue (to be filed; no existing issue covers this migration).