Migrate INTERSECTS, CONTAINS, WITHIN, and set predicates to registered expanders (Closes #141) #157
Merged
conradbzura added a commit that referenced this pull request Jun 28, 2026
Remove unreachable dispatch branches in the predicate expanders, add ExpanderRegistry.has_override and route the join-deferral gate through it, guard intersects_bin_size under a target override, and preserve tracebacks on the parse-error wrap. Add direct expander tests, binned-target deferral coverage, and error-message characterization. Make the registry docstrings mechanistic and node-local, restore the registry in place, harden auto-discovery, and key the opt-out control on a dynamically derived migrated operator.
… expanders Move INTERSECTS, CONTAINS, WITHIN, and the ANY/ALL set predicates off the legacy *_sql emitters onto the operator-expander registry (epic #137 wave 3). Each predicate expands to standard boolean AST built from the pass-1 resolved-column metadata, byte-identical to the deleted emitters. Capability-gate the binned and IEJoin join transformers on range_join_strategy and add a registry deferral (ExpanderRegistry.has_override) so a target-specific INTERSECTS override supersedes the built-in join rewrite, removing the IEJoin early-return's pipeline skip. Flip GIQL_EXPAND on the four predicate classes and delete their emitters and helpers. Squashed rebase onto main (post-#156) incorporating both review rounds: remove unreachable dispatch branches, add has_override with deferral-gate tests, guard intersects_bin_size under an override, relocate the direct expander tests to tests/expanders/test_intersects.py with CONTAINS/WITHIN column-join coverage, add error-message characterization, and reconcile the shared registry seam and docstrings.
47212f6 to
5565263
conradbzura added a commit that referenced this pull request Jun 29, 2026
…fallback Move NEAREST off the legacy giqlnearest_sql emitter onto the operator-expander registry (epic #137 wave 3). Lateral-capable targets get the portable correlated LATERAL subquery; on DataFusion (no correlated-LATERAL physical plan) NEAREST expands to a decorrelated ROW_NUMBER() window fallback that returns identical rows, with a deterministic (start, end) tiebreaker and a synthesized subquery alias so an unaliased correlated NEAREST also runs there (and under python -O). Flip GIQL_EXPAND on GIQLNearest, delete giqlnearest_sql, SUPPORTS_LATERAL, and _nearest_resolution, and make the shared distance/nearest helpers staticmethods. Squashed rebase onto main (post-#156/#157) incorporating both review rounds: cross-target oracle coverage (k>1 ties, opposite-strand co-located rows, max_distance survivors), the SELECT * column-leak claim narrowed and xfail-pinned (tracked by #160), annotated helpers, reserved names derived from EXPAND_ALIAS_PREFIX, and the shared registry-seam reconciliation.
conradbzura added a commit that referenced this pull request Jun 29, 2026
Move DISJOIN off the legacy giqldisjoin_sql emitter onto the operator-expander registry (epic #137 wave 3), the last operator migration. The expander assembles the __giql_dj_* WITH-CTE subquery as AST and selects the full-row passthrough by capability: SELECT * REPLACE where supports_star_replace holds (DuckDB), the portable * EXCEPT projection otherwise (DataFusion family). Flip GIQL_EXPAND on GIQLDisjoin and delete giqldisjoin_sql and its DISJOIN-only helpers from the generator. Also fix the duplicate-column bug: alias all four columns in every __giql_dj_cuts UNION branch so the de-canonicalized end column no longer collides with the end-cut under one output name, which DataFusion rejected — promoting the previously pending cross-target DISJOIN case to a real three-target identity test. Squashed rebase onto main (post-#156/#157/#158) incorporating both review rounds: non-canonical * EXCEPT oracle coverage, an engine-free cuts-CTE alias regression, the DJ_PREFIX constant shared with the resolver, parse_one over maybe_parse, typed expander node, refreshed comments, and the shared registry-seam reconciliation.
Summary
Migrate the spatial predicates (INTERSECTS / CONTAINS / WITHIN) and the quantified set predicates (ANY / ALL) off the legacy *_sql emitters on BaseGIQLGenerator and onto the generic operator-expander registry (epic #137, wave 3). Each predicate now expands to standard boolean sqlglot AST in the new giql.expanders package during pass 3 (ExpandOperators), built from the pass-1 ResolvedColumn metadata already canonicalized to 0-based half-open by pass 2, so the emitted predicate SQL is byte-identical to what the deleted emitters produced.
Restructure transpile.py so the join-strategy rewrites are capability-gated on the target's range_join_strategy and defer to the registry. The binned equi-join and DuckDB IEJoin transformers run as pre-pass transformers that consume a column-to-column INTERSECTS join before expansion; a literal-range or residual column-to-column INTERSECTS predicate survives to pass 3 and is rendered by the new expander exactly as the legacy emitter rendered it. Add a registry-deferral seam: a target-specific (target, Intersects) registry entry overrides the built-in join strategy entirely, letting the INTERSECTS node flow untouched into ExpandOperators. This removes the old dialect="duckdb" IEJoin early-return that skipped the expansion pipeline.
Deferred: folding the whole-query IEJoin string emitter itself into a node-level expander. Per the issue's design note, IntersectsDuckDBIEJoinTransformer.transform_to_sql emits a whole-query SET VARIABLE …; SELECT … string and rewrites the top-level statement, which a node-replacing OperatorExpander.expand(node, ctx) -> exp.Expression contract cannot express. The join-strategy rewrites therefore stay as capability-gated pre-pass transformers rather than (target, op) registry entries; only the registry-deferral hook is added so a future target-specific override can supersede them.
This is epic #137 wave 3; it carries the shared ExpanderRegistry.snapshot() / restore() seam that sibling wave-3 PRs also introduce (dedupe on merge).
Closes #141
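As a rough illustration of the 0-based half-open convention the summary relies on, the sketch below shows the standard overlap test for two half-open intervals and one way such a test could be rendered as a parenthesized boolean SQL fragment. The names Interval, intersects, and render_intersects are hypothetical and are not taken from the giql code base, and the rendered string is only illustrative; it is not claimed to match the expander's actual output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval in 0-based half-open form: [start, end)."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def intersects(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap when each one starts before the
    # other ends. Adjacent intervals (a.end == b.start) do not overlap.
    return a.chrom == b.chrom and a.start < b.end and a.end > b.start


def render_intersects(chrom_col: str, start_col: str, end_col: str,
                      rng: Interval) -> str:
    # Hypothetical renderer: builds a parenthesized boolean from column
    # names and an already-canonicalized literal range.
    return (
        f"({chrom_col} = '{rng.chrom}' "
        f"AND {start_col} < {rng.end} "
        f"AND {end_col} > {rng.start})"
    )


if __name__ == "__main__":
    query = Interval("chr1", 1000, 2000)
    print(intersects(Interval("chr1", 1500, 2500), query))
    print(render_intersects("chrom", "start", "end", query))
```

The strict inequalities are what make adjacent intervals non-overlapping under the half-open convention; a closed-interval convention would need non-strict comparisons instead.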
Proposed changes
ExpanderRegistry save/restore seam (src/giql/expander.py)
Add snapshot() and restore() to ExpanderRegistry. snapshot() returns a shallow copy of the current (target, operator) -> expander registrations; restore() drops every current entry and re-installs exactly the snapshot contents. This is the public seam an isolating test fixture (or a plugin) uses to mutate the process-wide REGISTRY around a body and return it to a captured baseline, so the built-in expanders registered at import survive a fixture that would otherwise clear() them permanently. Shared with sibling wave-3 PRs.
giql.expanders package and predicate expanders (src/giql/expanders/__init__.py, src/giql/expanders/intersects.py)
Add the giql.expanders package. Importing it registers every built-in expander as a side effect: __init__.py uses pkgutil.iter_modules to import each submodule, which decorates its expanders with @register(...) at import time, so new operator modules are picked up by dropping a file in without editing the package.
Add giql.expanders.intersects with four GenericTarget expanders: expand_intersects, expand_contains, expand_within, and expand_spatial_set (ANY / ALL). Each turns one predicate node into a parenthesized boolean built from ResolvedColumn fragments parsed through the GIQL dialect, reproducing the deleted emitter helpers as AST: _range_predicate (literal-range form, including the point-query special case for CONTAINS), _column_join (column-to-column residual form), and the dispatch-on-right-operand logic of the old _generate_spatial_op. The literal-range path reproduces the legacy parse-and-wrap-error behavior verbatim (the historical "Could not parse genomic range" message). Only generic expanders are registered, since spatial-predicate emission is portable SQL-92 and does not vary by engine.
Capability-gated join transformers and registry-deferral (src/giql/transpile.py)
Import giql.expanders once so the registry is populated before the first transpile. Compute target_overrides_intersects, which is true only for an exact non-generic (target, Intersects) entry, deliberately excluding the built-in (GenericTarget, Intersects) predicate expander so it does not disable the join rewrite. Gate both the IEJoin path (if uses_iejoin and not target_overrides_intersects) and the binned-join transformer on this flag: when a target-specific override is registered, the join rewrite is skipped and the INTERSECTS node flows into ExpandOperators. Remove the dialect="duckdb" early-return's pipeline-skip warning block. The IEJoin transformer still short-circuits with a whole-query string when it produces output (safe, since an IEJoin-eligible query carries exactly one INTERSECTS and leaves no residual predicate), but the registry is now consulted on the deferral path it used to preclude.
GIQL_EXPAND flips and emitter deletion (src/giql/expressions.py, src/giql/generators/base.py)
Flip GIQL_EXPAND from the shared inert default to True on Intersects, Contains, Within, and SpatialSetPredicate so the four predicates opt into pass 3. Delete the intersects_sql / contains_sql / within_sql / spatialsetpredicate_sql emitters from BaseGIQLGenerator and their _generate_spatial_op / _generate_spatial_set / _generate_range_predicate / _generate_column_join / _predicate_operand helpers, plus the now-unused imports.
Test updates (tests/test_expander.py, tests/generators/test_base.py)
Rework the registry/flag leak guards to compare against a captured baseline (REGISTRY.snapshot()) rather than asserting emptiness, since the registry now ships built-in expanders at import; clean_registry saves and restores that baseline through the new seam. Add _SHIPPED_EXPAND_FLAGS and derive _MIGRATED_OPERATORS / _UNMIGRATED_OPERATORS dynamically so the flag-leak guard restores each operator to its shipped default and the opt-out parametrization stays merge-stable across wave-3 branches. Add an _opted_out context manager (complement of _opted_in) for control tests that need a migrated operator to behave as unflagged. Replace the old strict-xfail TestIEJoinEarlyReturnSkipsExpansion with TestIEJoinRegistryDeferral, add snapshot / restore coverage, and route the generator-level spatial tests through pass 3 via the updated _generate_through_passes helper (now runs passes 1-3).
Test cases
TestExpanderRegistryFallbackGaps: snapshot() is a copy, not a live view
TestExpanderRegistryFallbackGaps: restore() replaces entries with snapshot contents
TestIEJoinRegistryDeferral: (DuckDBTarget, Intersects) override registered; dialect='duckdb'; SET VARIABLE IEJoin SQL is emitted
TestIEJoinRegistryDeferral: dialect='duckdb'; SET VARIABLE SQL is emitted
TestOperatorOptOut: GIQL_EXPAND class attribute; False
TestOperatorOptOut: GIQL_EXPAND class attribute; True
TestExpandOperatorsPass: (GenericTarget, GIQLDisjoin) but the operator's flag held off via _opted_out; GIQL_EXPAND gate isolates dispatch
TestNoOpWhenInert
TestExpandOperatorsWalk
TestBaseGIQLGenerator: ValueError matching "Could not parse genomic range" is raised
TestBaseGIQLGenerator: 'chr:a-b') in INTERSECTS; ValueError is raised
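The snapshot / restore and override-deferral behaviors exercised by these tests can be sketched with a toy registry. This is a minimal stand-in written only from the description in this PR, not the actual ExpanderRegistry: ToyRegistry, isolated, and the placeholder target and operator classes are hypothetical, and the real method signatures may differ.

```python
from contextlib import contextmanager


# Placeholder classes standing in for the real targets and operator (assumed).
class GenericTarget: ...
class DuckDBTarget: ...
class Intersects: ...


class ToyRegistry:
    """Maps (target, operator) -> expander callable."""

    def __init__(self):
        self._entries = {}

    def register(self, target, operator, expander):
        self._entries[(target, operator)] = expander

    def snapshot(self):
        # Shallow copy: later registrations do not leak into the snapshot.
        return dict(self._entries)

    def restore(self, snap):
        # Drop every current entry and re-install the snapshot, in place.
        self._entries.clear()
        self._entries.update(snap)

    def has_override(self, target, operator):
        # True only for an exact non-generic entry, so a built-in generic
        # expander never counts as an override.
        return target is not GenericTarget and (target, operator) in self._entries


@contextmanager
def isolated(registry):
    """Let a test body mutate the registry, then return it to the baseline."""
    baseline = registry.snapshot()
    try:
        yield registry
    finally:
        registry.restore(baseline)


if __name__ == "__main__":
    reg = ToyRegistry()
    reg.register(GenericTarget, Intersects, lambda node: node)
    with isolated(reg):
        reg.register(DuckDBTarget, Intersects, lambda node: node)
        print(reg.has_override(DuckDBTarget, Intersects))
    print(reg.has_override(DuckDBTarget, Intersects))
```

Restoring in place, rather than rebinding to a new dict, keeps any other reference to the registry's storage valid, which matters for a process-wide object that modules import once.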