Relocate CLUSTER and MERGE into the operator-expander registry (Closes #144) #162
Merged
Migrate the last two operators off the pre-pass transformer chain and onto the ExpandOperators registry (epic #137), completing the operator migration so every GIQL operator dispatches through one mechanism. CLUSTER and MERGE are whole-query rewrites, so each expander navigates to its enclosing SELECT and mutates it in place, returning the operator node unchanged (a no-op replace). NEAREST's decorrelated fallback uses the same pattern. It is required because the canonical form puts the operator at the root SELECT, which has no parent to replace through. The pass's deepest-first walk subsumes the transformers' manual recursion into CTEs and subqueries. Columns are derived from the FROM table, so the operators carry an empty resolution and deliberately skip canonicalization.

Emitted SQL is semantically identical to the legacy output and now deterministic; the only intended byte difference is the injected lag_calc column order, which the legacy output left hash-seed-dependent. Because the in-place rewrite copies the enclosing WHERE/HAVING into the new inner subquery, the expander re-runs the pass over the restructured SELECT (the active registry is threaded through ExpansionContext). A sibling operator carried into that subquery, such as a spatial predicate or DISTANCE in a filtered cluster/merge, is therefore expanded rather than leaking to the generator.

The standalone ClusterTransformer and MergeTransformer classes are removed and the pre-pass calls are dropped from the transpile pipeline. Some shapes that the legacy pipeline accepted only to emit non-executable or invalid SQL now raise a clear ValueError instead: CLUSTER and MERGE in a single SELECT (in-place mutation cannot express both), multiple CLUSTER expressions in one SELECT (as multiple MERGE already did), and a CLUSTER or MERGE nested inside a larger projection expression. The MERGE GROUP BY chromosome term is now quoted like every other column reference, so a reserved-word custom chrom column emits valid SQL.

No working query is affected. Synthesized identifiers that do not yet use the reserved __giql_ prefix are left as-is for a follow-up; renaming them would change emitted SQL.
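The in-place rewrite described above can be illustrated with a toy model. This is a minimal sketch, not the project's code: the `Select` class, `expand_cluster_in_place`, the `lag_flag` and `cluster_id` placeholder columns, and this simplified `transplant` are all assumptions made for illustration. It shows why a root node with no parent is restructured by building a detached replacement and copying its clauses back onto the original object.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Select:
    """Toy SELECT node (hypothetical; not the project's AST class)."""

    projections: list[str]
    source: Union[str, "Select"]
    where: Optional[str] = None
    # None marks the root SELECT: there is no parent to replace through.
    parent: Optional["Select"] = field(default=None, repr=False, compare=False)


def transplant(target: Select, rebuilt: Select) -> None:
    """Copy rebuilt's clauses onto target so target keeps its identity."""
    target.projections = rebuilt.projections
    target.source = rebuilt.source
    target.where = rebuilt.where
    if isinstance(target.source, Select):
        target.source.parent = target


def expand_cluster_in_place(select: Select) -> None:
    """Restructure a detached copy, then transplant it onto the original."""
    passthrough = [p for p in select.projections if p != "CLUSTER"]
    # The inner subquery inherits the original source and filter.
    inner = Select(
        projections=[*passthrough, "lag_flag"],
        source=select.source,
        where=select.where,
    )
    # The outer query swaps the operator for a placeholder derived column.
    outer = Select(projections=[*passthrough, "cluster_id"], source=inner)
    transplant(select, outer)
```

Any caller holding a reference to the root keeps a valid object after the rewrite, which is the property a parent-based replace cannot provide when the node is the root.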
Summary
Migrate the last two operators, CLUSTER and MERGE, off the pre-pass transformer chain and onto the ExpandOperators registry (epic #137), so every GIQL operator now dispatches through one mechanism. CLUSTER and MERGE are whole-query rewrites, so each expander navigates to its enclosing SELECT, restructures a detached copy, and transplants the result back onto the original SELECT in place, returning the operator node unchanged (a no-op replace). This root-preserving transplant is required because the canonical form puts the operator at the root SELECT, which has no parent to replace through. The pass's deepest-first walk subsumes the transformers' manual recursion into CTEs and subqueries, and the standalone ClusterTransformer/MergeTransformer classes and their pre-pass calls are removed.

Emitted SQL is semantically identical to the legacy output and now deterministic; the only intended byte difference is the injected lag_calc column order, which the legacy output left hash-seed-dependent. Because the in-place rewrite copies the enclosing WHERE/HAVING into the new inner subquery, the expander re-runs the pass over the restructured SELECT (threading the active registry through ExpansionContext). A sibling operator carried into that subquery, such as a spatial predicate or DISTANCE in a filtered cluster/merge, is therefore expanded rather than leaking to the generator.

A few shapes the legacy pipeline accepted only to emit non-executable or invalid SQL now raise a clear ValueError instead: CLUSTER and MERGE in one SELECT, multiple CLUSTER expressions in one SELECT (as multiple MERGE already did), and a CLUSTER/MERGE nested inside a larger projection expression. The MERGE GROUP BY chromosome term is now quoted, so a reserved-word custom chrom column emits valid SQL. No working query is affected.

Two non-blocking follow-ups are deferred: extracting the shared CLUSTER/MERGE toolkit into a neutral module (#163) and resolving genomic columns over a derived-table FROM (#164).
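The reason for re-running the pass can be shown with a toy tree. This is a sketch under stated assumptions: `Node`, the dictionary registry, `make_cluster_expander`, and the `OVERLAP_SQL`, `LAG_CALC`, and `CLUSTER_ID` node kinds are hypothetical, and only the idea of a context carrying the registry comes from the description above. The walk visits a snapshot of nodes, so a WHERE clause copied into a new subquery is never visited unless the expander re-enters the pass.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node (hypothetical stand-in)."""

    kind: str
    children: list[Node] = field(default_factory=list)

    def postorder(self):
        for child in self.children:
            yield from child.postorder()
        yield self


@dataclass
class ExpansionContext:
    registry: dict  # maps node kind -> expander(node, ctx)
    root: Node


def find_parent(root: Node, target: Node):
    for node in root.postorder():
        if any(child is target for child in node.children):
            return node
    return None


def expand_operators(root: Node, registry: dict) -> Node:
    ctx = ExpansionContext(registry=registry, root=root)
    # Deepest-first walk over a snapshot taken before any rewrite.
    for node in list(root.postorder()):
        expander = registry.get(node.kind)
        if expander is not None:
            expander(node, ctx)
    return root


def expand_intersects(node: Node, ctx: ExpansionContext) -> None:
    node.kind = "OVERLAP_SQL"  # placeholder for the expanded predicate


def make_cluster_expander(reenter: bool):
    def expand_cluster(node: Node, ctx: ExpansionContext) -> None:
        select = find_parent(ctx.root, node)
        where = [c for c in select.children if c.kind == "WHERE"]
        # The filter is copied into a new inner subquery.
        inner = Node("SELECT", [Node("LAG_CALC"), *copy.deepcopy(where)])
        select.children = [Node("CLUSTER_ID"), Node("FROM", [inner])]
        if reenter:
            # The copied filter is not in the outer snapshot, so re-run.
            expand_operators(inner, ctx.registry)

    return expand_cluster


def kinds(root: Node) -> list[str]:
    return [n.kind for n in root.postorder()]
```

With `reenter=False` the copied `INTERSECTS` node survives in the inner subquery, which corresponds to the leak the description mentions; with `reenter=True` it is expanded.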
Closes #144
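The three rejected shapes can be expressed as a single check over a SELECT's projections. The sketch below is illustrative only: `Expr`, `validate_projections`, and the error messages are assumptions, and the project's own helpers (`reject_cluster_merge_mix`, `require_top_level_projection`) may be structured quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field

WHOLE_QUERY_OPS = ("CLUSTER", "MERGE")


@dataclass
class Expr:
    """Toy projection expression (hypothetical stand-in for an AST node)."""

    kind: str
    args: list[Expr] = field(default_factory=list)

    def descendants(self):
        for arg in self.args:
            yield arg
            yield from arg.descendants()


def validate_projections(projections: list[Expr]) -> None:
    """Raise ValueError for shapes an in-place rewrite cannot express."""
    top_level = [p.kind for p in projections]
    # A whole-query operator below the top level of a projection is rejected.
    for projection in projections:
        for inner in projection.descendants():
            if inner.kind in WHOLE_QUERY_OPS:
                raise ValueError(f"{inner.kind} must be a top-level projection")
    # Both operators in one SELECT cannot be expressed by one in-place rewrite.
    if "CLUSTER" in top_level and "MERGE" in top_level:
        raise ValueError("CLUSTER and MERGE cannot share one SELECT")
    # Repeated use of the same operator is rejected as well.
    for op in WHOLE_QUERY_OPS:
        if top_level.count(op) > 1:
            raise ValueError(f"multiple {op} expressions in one SELECT")
```

Failing early with a ValueError replaces the legacy behavior of emitting SQL that would only fail later at execution time.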
Proposed changes
- giql/expanders/cluster.py and giql/expanders/merge.py: generic (GenericTarget) expanders registered for GIQLCluster/GIQLMerge, porting the legacy two-level lag_calc and clustered-aggregation rewrites. MERGE composes CLUSTER's expand_cluster_query. Shared helpers (genomic_columns, transplant, find_projected, extract_stranded, reject_cluster_merge_mix, require_top_level_projection, GenomicColumns) form the cross-operator toolkit.
- giql/expander.py: ExpansionContext now carries the active registry, letting a whole-query rewrite re-enter expand_operators over the SELECT it just restructured.
- giql/expressions.py: GIQL_EXPAND = True on GIQLCluster/GIQLMerge; they deliberately skip canonicalization and carry an empty resolution.
- giql/resolver.py: GIQLCluster/GIQLMerge added to the _OPERATORS roster; module scope note reconciled to the post-migration state.
- giql/transformer.py / giql/transpile.py: ClusterTransformer/MergeTransformer and their pre-pass calls removed (−661 lines in transformer.py).
- docs/dialect/aggregation-operators.rst: document the CLUSTER+MERGE-in-one-SELECT ValueError and the separate-queries workaround.

Test cases
- TestClusterExpander: … lag_calc form with no leaked operator
- TestClusterExpander: … WHERE … lag_calc is itself expanded (no leak)
- TestClusterExpander: … ValueError
- TestClusterExpander: … SELECT … ValueError
- TestClusterExpander: … PYTHONHASHSEED values
- TestMergeExpander: … WHERE …
- TestMergeExpander: … GROUP BY chrom term is quoted
- TestMergeExpander: … SELECT, or multiple/nested MERGE … ValueError
- TestExpander* …
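A determinism check across PYTHONHASHSEED values has to run each case in a fresh interpreter, because the hash seed is fixed at process start. The following is a minimal sketch of that approach, not the project's test: `SNIPPET` is a stand-in that deduplicates a column list in insertion order, where a real test would transpile a CLUSTER or MERGE query, and the helper names are hypothetical.

```python
import os
import subprocess
import sys

# Stand-in for "transpile a query and print the SQL" (assumption).
# dict.fromkeys keeps first-seen order, unlike iterating over a set of strings.
SNIPPET = (
    'cols = ["chrom", "start", "end", "strand", "chrom"]; '
    'print(",".join(dict.fromkeys(cols)))'
)


def render_under_seed(seed: int) -> str:
    """Run the snippet in a child interpreter with a fixed hash seed."""
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def outputs_across_seeds(seeds=(0, 1, 2, 3)) -> set:
    """Collect the distinct outputs seen across the given seeds."""
    return {render_under_seed(seed) for seed in seeds}
```

Output is deterministic when the returned set contains exactly one element.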