Migrate DISJOIN to a registered expander and solve the star-REPLACE portability limitation - Closes #143, #153 #159
Merged
conradbzura added a commit that referenced this pull request Jun 28, 2026
Add a non-canonical cross-target oracle case so the portable star-EXCEPT passthrough executes on DataFusion, plus an engine-free regression pinning the per-branch cuts-CTE aliases. Document the REPLACE-vs-EXCEPT column-order divergence, centralize the DISJOIN prefix in a constants module, parse with parse_one, type the expander node, and restore the dropped rationale comments. Apply the shared registry-docstring, restore-in-place, and auto-discovery fixes.
Move DISJOIN off the legacy giqldisjoin_sql emitter onto the operator-expander registry (epic #137 wave 3), the last operator migration. The expander assembles the __giql_dj_* WITH-CTE subquery as AST and selects the full-row passthrough by capability: SELECT * REPLACE where supports_star_replace holds (DuckDB), the portable * EXCEPT projection otherwise (DataFusion family). Flip GIQL_EXPAND on GIQLDisjoin and delete giqldisjoin_sql and its DISJOIN-only helpers from the generator. Also fix the duplicate-column bug: alias all four columns in every __giql_dj_cuts UNION branch so the de-canonicalized end column no longer collides with the end-cut under one output name, which DataFusion rejected — promoting the previously pending cross-target DISJOIN case to a real three-target identity test. Squashed rebase onto main (post-#156/#157/#158) incorporating both review rounds: non-canonical * EXCEPT oracle coverage, an engine-free cuts-CTE alias regression, the DJ_PREFIX constant shared with the resolver, parse_one over maybe_parse, typed expander node, refreshed comments, and the shared registry-seam reconciliation.
451a277 to 1aecfa3
Summary
Migrate DISJOIN from the emit-time `BaseGIQLGenerator.giqldisjoin_sql` string special-case (left by epic #114) to a generic registered AST expander, and make the full-row passthrough capability-driven so non-canonical encodings produce portable SQL on engines that lack `SELECT * REPLACE`.

Register one `expand_disjoin` against `GenericTarget`, so every target resolves to it through the registry's generic chain. The expander assembles the same `__giql_dj_*` WITH-CTE subquery, parses it back into a sqlglot expression, and returns that node for the active target's serializer to render, dissolving the emit-time string special-case. A single capability branch on `ctx.capabilities.supports_star_replace` selects the passthrough projection form: emit `t.* REPLACE (...)` on a target that supports it (DuckDB), and the portable `t.* EXCEPT (start, end), <recomputed start>, <recomputed end>` form otherwise, which every `* EXCEPT`-capable engine plans. The portable branch is what adds DataFusion support for non-canonical DISJOIN passthrough. Input canonicalization stays owned by `CanonicalizeCoordinates` (pass 2, #122); the expander consumes already-canonical 0-based half-open columns and only round-trips the output back into the target's declared encoding.

Fix #153 by aliasing all four projected columns (`kc`/`ks`/`ke`/`pos`) in every `__giql_dj_cuts` UNION branch. Previously the bare de-canonicalized `end` column and the end-cut expression collided under one output name in the default 0-based half-open identity case; DuckDB tolerated the duplicate, but DataFusion rejected it as a non-unique projection name. With every branch aliased, the projection is internally unique on strict engines and behaviour-preserving on DuckDB. As a result, the cross-target oracle's previously-pinned `_pending_153` expected-failure is promoted to a real three-target identity test.

Part of epic #137 wave 3; carries the shared `ExpanderRegistry.snapshot()`/`restore()` seam that sibling wave-3 PRs also have (dedupe on merge).

Closes #143, #153
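The capability branch described in the summary can be sketched roughly as follows. This is a minimal illustration, not the expander's actual code: the `Capabilities` dataclass, the `passthrough_projection` helper, its parameters, and the identifier quoting are all hypothetical, and the real expander works on the parsed AST rather than only on strings. Only the `supports_star_replace` flag and the two projection shapes come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical stand-in for ctx.capabilities; only this one flag is modelled.
    supports_star_replace: bool


def passthrough_projection(caps, start_sql, end_sql, identity=False):
    """Return the full-row passthrough projection as a SQL fragment.

    start_sql / end_sql are the recomputed (de-canonicalized) expressions.
    identity=True models the 0-based half-open fast path, where nothing
    needs recomputing and a plain star is enough.
    """
    if identity:
        return "t.*"
    if caps.supports_star_replace:
        # REPLACE swaps the two columns in place, keeping their positions.
        return f't.* REPLACE ({start_sql} AS "start", {end_sql} AS "end")'
    # Portable form: drop the two columns, then re-project them at the end.
    return (
        f't.* EXCEPT ("start", "end"), '
        f'{start_sql} AS "start", {end_sql} AS "end"'
    )


# Illustrative encoding only: 0-based half-open [s, e) rendered as 1-based closed.
duckdb_like = Capabilities(supports_star_replace=True)
portable = Capabilities(supports_star_replace=False)
print(passthrough_projection(duckdb_like, 't."start" + 1', 't."end"'))
print(passthrough_projection(portable, 't."start" + 1', 't."end"'))
```

Note that the two forms are not column-order equivalent: `REPLACE` leaves the interval columns where they were, while the `EXCEPT` form appends them after the remaining columns. Consumers that address columns by position would see the difference.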
Proposed changes
Registry save/restore seam (`src/giql/expander.py`)

Add the public `ExpanderRegistry.snapshot()` / `ExpanderRegistry.restore()` methods, first introduced for these fixtures. `snapshot()` returns a fresh shallow copy of the `(target, operator) → expander` registrations; `restore()` drops all current entries and re-installs exactly the snapshot contents. This lets an isolating test fixture (or a plugin) capture the import-time baseline, mutate the process-wide `REGISTRY` around a body, and hand the baseline back afterward so the built-in expanders survive a fixture that would otherwise `clear()` them permanently.

`giql.expanders` package + DISJOIN expander (`src/giql/expanders/__init__.py`, `src/giql/expanders/disjoin.py`)

Add the `giql.expanders` package whose `__init__` auto-imports every submodule via `pkgutil.iter_modules`, so dropping a `<operator>.py` into the package registers its expander as an import side effect without editing the package file. Add `disjoin.py` with the `@register(GenericTarget, GIQLDisjoin)` expander and its helpers (`_build_disjoin_sql`, `_disjoin_passthrough`, `_disjoin_output_encoding`, `_disjoin_resolution`), carrying over the original resolution-unpacking and historical diagnostics verbatim. The passthrough is the capability-driven form described in the summary; the identity 0-based half-open case stays a plain `t.*` fast path.

#153 alias fix (isolated in `disjoin.py`'s `__giql_dj_cuts` assembly)

Alias `kc`/`ks`/`ke`/`pos` in all three `__giql_dj_cuts` UNION branches. This is an isolated, cherry-pickable change: it only adds aliases to existing projections and does not depend on the migration or the capability branch.

`GIQLDisjoin.GIQL_EXPAND` flip + legacy deletion (`src/giql/expressions.py`, `src/giql/generators/base.py`, `src/giql/transpile.py`)

Flip `GIQLDisjoin.GIQL_EXPAND` from the disabled sentinel to `True`, so the `ExpandOperators` pass replaces the node with the expander's AST. Delete `BaseGIQLGenerator.giqldisjoin_sql` and the DISJOIN-only generator helpers (`_disjoin_resolution`, `_disjoin_passthrough`, `_disjoin_output_encoding`) plus the now-unused `GIQLDisjoin` import. Wire `import giql.expanders` in `transpile.py` so the registry is populated before the first transpile.

Test updates
Update `test_disjoin_transpilation.py`, `test_canonicalizer.py`, and `test_expander.py` for the registry-driven path, and add the capability-passthrough and `snapshot()`/`restore()` coverage. Two execute-on-engine harnesses now transpile with the engine dialect: `test_usage_patterns.py` (`_execute`) and `coordinate_space/conftest.py` (`giql_query`) pass `dialect=engine` / `dialect="duckdb"`, because a non-canonical DISJOIN passthrough emits `* EXCEPT` for the generic target and `* EXCEPT` is not DuckDB-runnable; the SQL must be shaped for the engine it executes on. Promote the cross-target oracle's `test_disjoin_on_datafusion_unsupported_pending_153` expected-failure to `test_disjoin_agrees_across_all_targets`, a real three-target identity test.

Test cases
`TestDisjoinCanonicalization` | `* EXCEPT` projection that drops and re-projects the interval columns | `REPLACE`
`TestDisjoinCanonicalization` | `REPLACE` on the final projection | `REPLACE` passthrough on DuckDB
`TestDisjoinTranspilation` | `disjoin_chrom`/`disjoin_start`/`disjoin_end` columns
`TestDisjoinTranspilation` | `EXISTS` clause
`TestDisjoinTranspilation` | `EXISTS` clause against the reference
`TestDisjoinTranspilation` | `__giql_dj_` prefix, or an unknown reference name
`TestExpanderRegistry` | `snapshot()` | `snapshot()` independence
`TestExpanderRegistry` | `restore()` is called with the snapshot | `restore()` semantics
`TestExpandOperatorsPass` | `GIQL_EXPAND` dispatch
`TestNoOpWhenFlagsOff`
`TestCrossTargetOracleDisjoin`
`TestCrossTargetOracleDisjoin`