Complete the DataFusion target and finalize capability-driven canonicalization (Closes #145) #166
Merged
Complete the DataFusion target (epic #137) by choosing the coordinate canonicalization emit strategy from the active target's capabilities rather than hardcoding a DuckDB-only star REPLACE. The CanonicalizeCoordinates pass now takes the target's Capabilities, and both the wrapper-CTE projection and the NEAREST row passthrough emit a star REPLACE when supports_star_replace holds (DuckDB) and the portable star EXCEPT form otherwise (the generic / DataFusion family). This removes the two remaining hardcoded REPLACE assumptions — the canonicalizer wrapper and the NEAREST passthrough — so a non-canonical coordinate encoding transpiles to engine-runnable SQL on DataFusion. The capability is threaded from transpile through the pass and through the NEAREST expander; a direct caller that passes no capabilities keeps the REPLACE form, preserving historical behavior. DuckDB output is byte-unchanged. The generic and DataFusion targets now emit the portable EXCEPT form, which is row-equivalent but not column-order-equivalent (EXCEPT re-appends the recomputed interval columns). This is verified end-to-end on the real DataFusion engine across all four coordinate encodings, custom interval-column names, and strand. DataFusion serialization is finalized: it uses the generic sqlglot output, validated by the cross-target oracle, and its capability values are promoted from provisional to verified. The SELECT * over a correlated NEAREST column leak remains deferred to a later query-level seam.
a25bee9 to 07d6cad
Summary
Choose the coordinate-canonicalization emit strategy from the active target's capabilities instead of hardcoding a DuckDB-only star `REPLACE`, completing the DataFusion target (epic #137). The `CanonicalizeCoordinates` pass now receives the target's `Capabilities`, and both the wrapper-CTE projection and the NEAREST row passthrough emit `* REPLACE` when `supports_star_replace` holds (DuckDB) and the portable `* EXCEPT (start, end), <start>, <end>` form otherwise (the generic / DataFusion family). This removes the two remaining hardcoded `REPLACE` assumptions, so a non-canonical coordinate encoding transpiles to engine-runnable SQL on DataFusion. DuckDB output is byte-unchanged; the generic/DataFusion `EXCEPT` form is row-equivalent (column order differs, since `EXCEPT` re-appends the recomputed interval columns) and is verified end-to-end on the real DataFusion engine across all four coordinate encodings, custom interval-column names, and strand. DataFusion serialization is finalized on the generic sqlglot output and its capability values promoted from provisional to verified. The `SELECT *`-over-correlated-NEAREST column leak (#160) remains deferred to the later query-level seam (#146).

Closes #145
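To make the capability gate concrete, here is a minimal sketch of how the two projection forms could be chosen. It is not the repository's code: the `Capabilities` dataclass below is a hypothetical stand-in that carries only the `supports_star_replace` flag, `canonical_projection` is an illustrative name, and the `start - 1` expression is just an example of a recomputed interval column. Identifier quoting is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical stand-in for a target's capability record."""
    supports_star_replace: bool


def canonical_projection(
    start_sql: str,
    end_sql: str,
    capabilities: Optional[Capabilities] = None,
    start_col: str = "start",
    end_col: str = "end",
) -> str:
    """Return the star projection that swaps in recomputed interval columns.

    Passing capabilities=None keeps the REPLACE form, mirroring the
    historical default described in the summary.
    """
    if capabilities is None or capabilities.supports_star_replace:
        # REPLACE rewrites the columns in place, so column order is preserved.
        return f"* REPLACE ({start_sql} AS {start_col}, {end_sql} AS {end_col})"
    # Portable form: drop the originals from the star, then re-append them.
    return (
        f"* EXCEPT ({start_col}, {end_col}), "
        f"{start_sql} AS {start_col}, {end_sql} AS {end_col}"
    )


duckdb_like = Capabilities(supports_star_replace=True)
generic_like = Capabilities(supports_star_replace=False)

print(canonical_projection("start - 1", "end", duckdb_like))
print(canonical_projection("start - 1", "end", generic_like))
```

In the `EXCEPT` branch the recomputed expressions can still read the source `start` and `end` columns, because `EXCEPT` only narrows what the star expands to.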
Proposed changes
- `canonicalizer.py`: `canonicalize_coordinates(expression, capabilities=None)` threads capabilities into `_canonical_projection`, which branches `* REPLACE` vs portable `* EXCEPT`; `None` preserves the historical `REPLACE` form for direct callers.
- `transpile.py`: passes `target.capabilities` into the pass.
- `generators/base.py`: `_nearest_passthrough` gains the same capability gate (resolving its `TODO(#142)`).
- `expanders/nearest.py`: `_distance_and_filters` threads `ctx.capabilities` to the passthrough from both the LATERAL and decorrelated-fallback forms.
- `targets.py`: `DataFusionTarget` docstring promoted from "provisional" to verified, recording which capability values the cross-target oracle validates.

Test cases
- `TestCanonicalProjectionCapabilities`: `canonicalize_coordinates` is called directly; `* EXCEPT` form
- `TestCanonicalProjectionCapabilities`: `canonicalize_coordinates` is called directly; `* REPLACE`
- `TestCanonicalProjectionCapabilities`: `canonicalize_coordinates` is called; `* REPLACE`; `capabilities=None` historical-default contract
- `TestCanonicalProjectionCapabilities`
- `TestCanonicalProjectionCapabilities`: `dialect="datafusion"`; `* EXCEPT` with no leaked operator
- `TestNearestTargetCanonicalization` / `TestBaseGIQLGenerator`: `* REPLACE`, generic emits `* EXCEPT`, sharing the distance CASE
- `TestDisjoinEncodingSweepOnDataFusion`
- `TestNearestEncodingSweepOnDataFusion`
- `TestDataFusionSchemaAxes`
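
Because the `EXCEPT` form re-appends the interval columns, any comparison between DuckDB and DataFusion results has to match rows by column name rather than by position. The sketch below shows one way such a check might look. The helper names and the sample rows are hypothetical and are not taken from the repository's cross-target oracle; it assumes column names are unique and cell values are hashable.

```python
from collections import Counter


def rows_by_name(columns, rows):
    """Turn positional rows into a multiset of name-keyed rows."""
    return Counter(frozenset(zip(columns, row)) for row in rows)


def row_equivalent(cols_a, rows_a, cols_b, rows_b):
    """True when both results hold the same rows, ignoring column order."""
    if sorted(cols_a) != sorted(cols_b):
        return False
    return rows_by_name(cols_a, rows_a) == rows_by_name(cols_b, rows_b)


# REPLACE-style result: interval columns stay in their original positions.
replace_cols = ["chrom", "start", "end", "name"]
replace_rows = [("chr1", 99, 200, "a"), ("chr2", 0, 50, "b")]

# EXCEPT-style result: interval columns are dropped and re-appended last.
except_cols = ["chrom", "name", "start", "end"]
except_rows = [("chr1", "a", 99, 200), ("chr2", "b", 0, 50)]

print(replace_cols == except_cols)
print(row_equivalent(replace_cols, replace_rows, except_cols, except_rows))
```

Using a `Counter` keeps duplicate rows significant, so a result that drops or doubles a row is still reported as different.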