Hide NEAREST decorrelated-fallback internal columns from SELECT * on DataFusion (Closes #160) #171

Draft
conradbzura wants to merge 1 commit into main from 160-hide-nearest-fallback-columns
@conradbzura (Collaborator)

Summary

A correlated NEAREST on DataFusion has no correlated-LATERAL plan, so it expands to a decorrelated ROW_NUMBER() fallback. The fallback's rewritten join (aliased b) must expose reserved rank/key helper columns (__giql_x_rk_*, __giql_x_rn) for its ON clause, so a SELECT * or SELECT b.* over it leaked those internal columns into user output, diverging from the schema of the DuckDB LATERAL form.
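To make the leak concrete, here is a minimal sketch of a decorrelated ROW_NUMBER() join, run on SQLite from the Python standard library purely because it is self-contained. It is not the SQL that giql emits: the tables, the single `pos` distance, and the helper column `__giql_x_rk_id` are all assumptions for illustration (only the `__giql_x_rk_` prefix and `__giql_x_rn` come from this PR).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER, pos INTEGER);
    CREATE TABLE b (id INTEGER, pos INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 50);
    INSERT INTO b VALUES (100, 12), (101, 48), (102, 90);
    """
)

# Decorrelated "nearest 1 per row of a": rank every (a, b) pair, then join on
# the key and rank helper columns. The helper names mimic the reserved prefix
# but are illustrative only.
query = """
SELECT b.*
FROM a
JOIN (
    SELECT a.id AS __giql_x_rk_id,
           b.id AS id,
           b.pos AS pos,
           ROW_NUMBER() OVER (
               PARTITION BY a.id ORDER BY ABS(a.pos - b.pos)
           ) AS __giql_x_rn
    FROM a CROSS JOIN b
) AS b
  ON b.__giql_x_rk_id = a.id AND b.__giql_x_rn <= 1
"""

cursor = conn.execute(query)
columns = [description[0] for description in cursor.description]
rows = sorted(cursor.fetchall())

# The helper columns the ON clause needs ride along with b.*.
print(columns)
print(rows)
```

Because the ON clause references the key and rank columns, the joined relation has to carry them, and `b.*` then surfaces them alongside the user's `id` and `pos`.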

A node-local OperatorExpander cannot rewrite the enclosing query, so this adds a query-level statement-finalizer seam to the expander framework and uses it to project the reserved columns away. The seam is a supported public extension point; the NEAREST fix is its first consumer. Serialization is unchanged for every other query and target, and the cross-target result-identity claim now holds for star projections too (the interim xfail is promoted to a real identity test).

Closes #160

Proposed changes

Add a query-level statement-finalizer seam

Introduce StatementFinalizer (a Callable[[Expression], Expression]) and ExpansionContext.add_statement_finalizer. expand_operators threads one shared finalizer list through the run and applies each registered finalizer to the statement root, in registration order, after all node-local replacements — so a finalizer may return a new root. Each expand_operators call owns its own list, so a re-entrant CLUSTER/MERGE call finalizes its own subtree. Export StatementFinalizer from giql for parity with OperatorExpander, and document the seam in the extension guide.
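The control flow described above can be sketched with toy stand-ins. This is an illustration under stated assumptions, not giql's implementation: `Expression` here is a string wrapper rather than a real AST node, the expander signature is simplified, and only the names `StatementFinalizer`, `ExpansionContext.add_statement_finalizer`, and `expand_operators` are taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Expression:
    """Toy stand-in for a statement root; not giql's real AST type."""

    sql: str


StatementFinalizer = Callable[[Expression], Expression]


@dataclass
class ExpansionContext:
    """Hypothetical, pared-down context holding one run's finalizers."""

    finalizers: List[StatementFinalizer] = field(default_factory=list)

    def add_statement_finalizer(self, finalizer: StatementFinalizer) -> None:
        self.finalizers.append(finalizer)


Expander = Callable[[Expression, ExpansionContext], Expression]


def expand_operators(root: Expression, expanders: List[Expander]) -> Expression:
    ctx = ExpansionContext()  # each call owns a fresh finalizer list
    for expander in expanders:  # node-local replacements (simplified)
        root = expander(root, ctx)
    for finalize in ctx.finalizers:  # registration order; may return a new root
        root = finalize(root)
    return root


def inner_expander(root: Expression, ctx: ExpansionContext) -> Expression:
    ctx.add_statement_finalizer(lambda r: Expression(f"inner_fin({r.sql})"))
    return root


def outer_expander(root: Expression, ctx: ExpansionContext) -> Expression:
    ctx.add_statement_finalizer(lambda r: Expression(f"first({r.sql})"))
    ctx.add_statement_finalizer(lambda r: Expression(f"second({r.sql})"))
    # Re-entrant call: its finalizer is applied to the subtree only.
    subtree = expand_operators(Expression("subtree"), [inner_expander])
    return Expression(f"{root.sql} + {subtree.sql}")


result = expand_operators(Expression("root"), [outer_expander])
print(result.sql)
```

In this sketch the inner call's finalizer wraps only `subtree`, while the outer call's two finalizers wrap the whole root in the order they were registered, which mirrors the per-call scoping and ordering described above.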

Hide the NEAREST fallback's reserved columns

Register a finalizer from the DataFusion fallback that wraps the join's enclosing SELECT (resolved lazily via join.parent_select) in SELECT * EXCEPT (...), but only when the projection surfaces the reserved columns (an unqualified * or b.*). Leave explicit projections and a.* untouched, since an EXCEPT list naming columns that the projection does not contain would fail at engine runtime. Target the enclosing select rather than the statement root so that a SELECT * over an explicit-only inner query stays correct.
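The gating rule can be sketched on plain strings. This is a simplified illustration, not the finalizer in this PR: the real one works on the AST and resolves the select through join.parent_select, whereas the helper names `surfaces_reserved` and `hide_reserved`, the string-based projection list, and the exact shape of the wrapping query are assumptions made here.

```python
from typing import List, Sequence


def surfaces_reserved(projections: Sequence[str], fallback_alias: str) -> bool:
    """Return True when a star projection would expose the fallback's helpers."""
    surfacing = {"*", f"{fallback_alias}.*"}
    return any(projection.strip() in surfacing for projection in projections)


def hide_reserved(
    select_sql: str,
    projections: Sequence[str],
    fallback_alias: str,
    reserved: List[str],
) -> str:
    """Wrap the select in SELECT * EXCEPT (...) only when helpers would leak."""
    if not reserved or not surfaces_reserved(projections, fallback_alias):
        # Explicit projections and other aliases' stars never contain the
        # reserved columns, so an EXCEPT over them would name absent columns.
        return select_sql
    except_list = ", ".join(reserved)
    return f"SELECT * EXCEPT ({except_list}) FROM ({select_sql})"


# Illustrative reserved set; the real set depends on the query.
reserved = ["__giql_x_rk_strand", "__giql_x_rn"]

wrapped = hide_reserved("SELECT b.* FROM joined", ["b.*"], "b", reserved)
untouched_star = hide_reserved("SELECT a.* FROM joined", ["a.*"], "b", reserved)
untouched_explicit = hide_reserved(
    "SELECT a.x, b.y FROM joined", ["a.x", "b.y"], "b", reserved
)
print(wrapped)
```

Only the `b.*` case is wrapped; the `a.*` and explicit projections come back unchanged, which is the no-false-positive behaviour the test cases below check.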

Reconcile docs

Close the "documented gap" notes in the DataFusionTarget docstring and distance-operators.rst, and document add_statement_finalizer as the query-level boundary in extending.rst.

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
|---|---|---|---|---|---|
| 1 | TestStatementFinalizer | An expander that registers a finalizer | `expand_operators` runs | The finalizer receives the post-replacement root | Finalizer application timing |
| 2 | TestStatementFinalizer | An expander registering two finalizers | `expand_operators` runs | They apply in registration order | Ordered application |
| 3 | TestStatementFinalizer | A finalizer returning a new root or the same root | `expand_operators` runs | The pass returns the new root or same object respectively | Return-root threading |
| 4 | TestStatementFinalizer | Sequential and nested `expand_operators` calls | Both run | Each applies only its own finalizers | Per-call scoping under re-entry |
| 5 | TestNearestFallbackReservedColumnProjection | A correlated `SELECT b.*` or `SELECT *` NEAREST on DataFusion | Transpiling | Output is `SELECT * EXCEPT (...)` whose EXCEPT set equals the reserved columns exactly | Wrap on surfacing star |
| 6 | TestNearestFallbackReservedColumnProjection | A stranded correlated `SELECT b.*` NEAREST on DataFusion | Transpiling | The EXCEPT set also carries `__giql_x_rk_strand` | Stranded reserved key |
| 7 | TestNearestFallbackReservedColumnProjection | Explicit, `a.*`-only, and nested-star-over-explicit projections | Transpiling | No wrapper is added | No false-positive wrap |
| 8 | TestNearestFallbackReservedColumnProjection | A `SELECT *` over an inner `SELECT b.*`, and the generic LATERAL target | Transpiling | The wrapper lands on the inner select or is absent respectively | Enclosing-select placement and fallback-only gating |
| 9 | TestPublicApiSurface | The giql package | `StatementFinalizer` is imported from the root and `giql.expander` | It resolves, is in both `__all__`s, and is the same object | Public export surface |
| 10 | TestCrossTargetOracleNearest | A correlated `b.*`, `*`, stranded `b.*`, or `k := 3` `b.*` NEAREST | Run on generic, DuckDB, and DataFusion engines | Every target returns identical rows with no reserved columns leaking | Cross-target identity on real engines |

@conradbzura self-assigned this on Jul 2, 2026

A correlated NEAREST on DataFusion uses a decorrelated ROW_NUMBER
fallback whose join must expose reserved rank/key columns for its ON
clause; a SELECT * or SELECT b.* over it leaked those internal columns
into user output, diverging from the DuckDB LATERAL form's schema.

Node-local expanders cannot rewrite the enclosing statement, so this
adds a query-level seam: ExpansionContext.add_statement_finalizer
registers a StatementFinalizer that expand_operators applies to the
statement root after all node-local replacements (it may return a new
root). The NEAREST DataFusion fallback registers one that wraps the
enclosing SELECT in SELECT * EXCEPT (...) when, and only when, a
surfacing star projection would expose the reserved columns, so
explicit projections are left untouched. StatementFinalizer is exported
from giql for parity with OperatorExpander.

Serialization is unchanged for every other query and target; the
cross-target identity claim now holds for star projections too.

Claude-Session: https://claude.ai/code/session_01ALxmQysPad4W68wuWuft6W

@conradbzura force-pushed the 160-hide-nearest-fallback-columns branch from b2d70b5 to a87ef98 on July 3, 2026 11:49