Skip to content

feat(tesseract): Support SQL API grouped sub-query joins#11138

Merged
ovr merged 6 commits into
masterfrom
feat/tesseract-support-subquery-joins
Jun 23, 2026
Merged

feat(tesseract): Support SQL API grouped sub-query joins#11138
ovr merged 6 commits into
masterfrom
feat/tesseract-support-subquery-joins

Conversation

@ovr

@ovr ovr commented Jun 22, 2026

Copy link
Copy Markdown
Member

The SQL API encodes grouped join push-downs (e.g.
JOIN (SELECT ... GROUP BY ... ORDER BY ... LIMIT n) t ON ...) as a subqueryJoins entry on the query. Only the legacy planner consumed it (via customSubQueryJoins); buildSqlAndParamsRust dropped it before the native call and Tesseract had no way to receive it. As a result the joined sub-query was omitted entirely: queries returned unfiltered rows (the inner LIMIT/ORDER vanished) and projections referencing the sub-query alias failed with "missing FROM-clause entry".

Thread subqueryJoins end to end:

  • BaseQuery.js: pass subqueryJoins into both native queryParams.
  • New SubqueryJoin bridge (sql/joinType/alias + on member expression).
  • QueryPropertiesCompiler compiles each on into a SqlCall; stored on QueryProperties as LogicalSubqueryJoinItems.
  • SimpleQueryPlanner folds them into the query's LogicalJoin.
  • Physical builder emits them as SingleSource::RawSubquerySql joins; the no-join fast path now also checks subquery_joins so alias-only projections still materialize the join.

The opaque sub-query SQL (incl. its ORDER BY/LIMIT) is emitted verbatim, so top-N semantics are preserved. The SQL API sends the join alias pre-quoted and references it verbatim in the ON condition, so the raw sub-query source is emitted with its alias as-is (no re-quoting). Changes are gated on a non-empty subquery_joins, leaving existing plans untouched.

@ovr ovr requested review from a team as code owners June 22, 2026 16:41
@github-actions github-actions Bot added rust Pull requests that update Rust code javascript Pull requests that update Javascript code labels Jun 22, 2026
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 2m 38s —— View job


Review complete — 1 high, 1 medium, 4 low

Expand for full review

End-to-end threading of subqueryJoins from BaseQuery.js through the native query params, into a new SubqueryJoin bridge, compiled into LogicalSubqueryJoinItems, folded into LogicalJoin, and emitted as SingleSource::RawSubquerySql reads cleanly. The no-join fast path correctly checks subquery_joins, and the derive-root-from-ON-hints branch in simple_query_planer.rs is a nice touch for the alias-only projection case. Integration tests exercise the two main shapes.

Findings

# Sev File:Line Issue
1 High physical_plan_builder/processors/logical_join.rs:99-103 Unknown join_type (RIGHT/FULL/CROSS/...) silently downgraded to LEFT — match it explicitly and error on the rest.
2 Medium multi_fact_join_groups.rs:1 Unused JoinHintItem import — will fail -D warnings.
3 Low cube_bridge/subquery_join.rs:17-21 alias consumed verbatim and sql emitted verbatim — document the pre-quoted/trusted-producer contract on the trait.
4 Low query_properties.rs:1173-1182 PartialEq compares only (sql, alias, join_type), not on_sql — two joins with the same triple but different ON conditions compare equal; risk for plan-cache reuse.
5 Low planners/simple_query_planer.rs:83-108 Degenerate case subqueryJoins with empty ON dependencies → make_join_logical_plan_with_join_hints(empty, …); worth a guard or test.
6 Low tests/integration/member_expressions.rs:208-304 Tests cover only INNER + substring asserts; add LEFT path, an insta snapshot, and a "RIGHT must error" case once #1 is fixed.

Other observations

  • BaseQuery.js:965, 1015: forwarded in both buildSqlAndParamsRust and findPreAggregationForQueryRust — good, both rust entrypoints see it.
  • physical_plan/from.rs:104 correctly uses query_aliased_prequoted only for RawSubquerySql; template variable quoted_alias matches the existing template definition.
  • references_builder.rs:272-279: returning None for RawSubquerySql is correct since members are referenced as alias-prefixed literals, not resolved against a schema.
  • No new unit tests for query_properties_compiler::compile_subquery_joins (the Struct expression rejection path, the join_type default fallback) — covered indirectly via integration tests but a focused unit test would be cheap.
  • The LogicalSubqueryJoinItem struct uses public fields, while its sibling LogicalJoinItem exposes accessors — minor style inconsistency, not blocking.

No security issues from the cubesql producer side, but the trust boundary on sql/alias should be documented (finding #3).

· [Branch](https://github.com/cube-js/cube/tree/feat/tesseract-support-subquery-joins)

Comment on lines +17 to +21
pub sql: String,
#[serde(rename = "joinType")]
pub join_type: Option<String>,
pub alias: String,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alias contract is trust-based. alias is consumed verbatim (no quoting) and the inner sql is emitted verbatim into the outer query. The PR description states the SQL API sends a pre-quoted alias and a controlled inner SELECT, which is fine for the cubesql producer, but the trait itself doesn't enforce or document the invariant. If any other caller of buildSqlAndParamsRust ever populates subqueryJoins (the JS BaseQuery now forwards this.options.subqueryJoins from any caller), an unquoted/attacker-controlled alias would land in the FROM clause untouched.

Consider tightening the doc-comment on this struct to state explicitly that:

  1. alias MUST be a fully-quoted identifier produced by the caller, and
  2. sql is emitted verbatim and MUST come from a trusted, planner-side path.

Optionally validate at compile time that alias starts/ends with the quote character (" or the dialect-specific one). Worth at least a // SAFETY: …-style note here so the next reader doesn't reintroduce quoting at a different layer.

Comment thread rust/cube/cubesqlplanner/cubesqlplanner/src/planner/query_properties.rs Outdated
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 42s —— View job


Re-review after chore: fixes (51fe01e)

One-line verdict: 1 high, 4 low still open — the unused-import nit is fixed.

Expand for details

Addressed in 51fe01e:

  • Medium — Unused JoinHintItem import in multi_fact_join_groups.rs removed. Thread resolved.
  • Test helper make_subquery_join now returns Result and propagates MockMemberSql::new failures cleanly — small but nice.

Still open from the prior review:

  • Highphysical_plan_builder/processors/logical_join.rs:99-103 still silently downgrades unknown join_type (RIGHT/FULL/CROSS) to LEFT. Recommend matching INNER/LEFT explicitly and erroring on the rest.
  • Low — cube_bridge/subquery_join.rs:17-21 — trust contract on alias/sql (verbatim, pre-quoted) still undocumented on the trait.
  • Low — query_properties.rs:1173-1182PartialEq still excludes on_sql; two entries with the same (sql, alias, join_type) but different ON conditions compare equal.
  • Low — planners/simple_query_planer.rs:83-108 — degenerate subqueryJoins with empty ON dependencies is still unguarded/untested.
  • Low — tests/integration/member_expressions.rs — coverage now passes via ? but is still INNER-only with assert!(... .contains(...)); LEFT path + an insta snapshot + a "RIGHT must error" case would all strengthen this.

No new findings introduced by this commit.

· [Branch](https://github.com/cube-js/cube/tree/feat/tesseract-support-subquery-joins)

@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.68%. Comparing base (741138a) to head (29ea4a6).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11138      +/-   ##
==========================================
+ Coverage   79.17%   83.68%   +4.50%     
==========================================
  Files         474      257     -217     
  Lines       96880    79384   -17496     
  Branches     3558        0    -3558     
==========================================
- Hits        76708    66434   -10274     
+ Misses      19654    12950    -6704     
+ Partials      518        0     -518     
Flag Coverage Δ
cube-backend ?
cubesql 83.68% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 1m 1s —— View job


Re-review after chore: review (52f9357)

One-line verdict: 0 high, 3 low still open — High and one Low resolved.

Expand for details

Addressed in 52f9357:

  • Highphysical_plan_builder/processors/logical_join.rs:99-110 now matches INNER/LEFT explicitly and returns CubeError::user("Unsupported join type ...") for anything else. Thread resolved.
  • Low — tests/integration/member_expressions.rs now exercises the LEFT path (test_subquery_join_grouped_left), uses insta::assert_snapshot! on all three sub-query-join tests, and adds test_subquery_join_unknown_join_type covering the RIGHT rejection. Thread resolved.

Still open from the prior review:

  • Low — cube_bridge/subquery_join.rs:11-21 — doc-comment describes shape but not the trust contract on alias/sql (alias must be pre-quoted, sql is emitted verbatim and must come from a trusted producer). A // SAFETY:-style note would prevent future re-introduction of quoting at the wrong layer.
  • Low — query_properties.rs:1173-1182PartialEq still excludes on_sql. The new doc-comment says on_sql is "derived deterministically from these" three fields, but that's not quite right — on_sql is compiled from the on member expression, so two entries with the same (sql, alias, join_type) and different on still compare equal. Either include on_sql or reword the comment to call this out as identity-not-semantic equality.
  • Low — planners/simple_query_planer.rs:83-108 — degenerate subqueryJoins with no cube reference in ON (empty hints) is still unguarded/untested.

No new findings introduced by this commit. Snapshot files look clean and match the verbatim-alias / verbatim-inner-SQL invariants.

· [Branch](https://github.com/cube-js/cube/tree/feat/tesseract-support-subquery-joins)

@ovr ovr force-pushed the feat/tesseract-support-subquery-joins branch from 52f9357 to 5f8da2b Compare June 22, 2026 17:45
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 50s —— View job


Re-review (no new commits since 5f8da2b)

One-line verdict: no change — 3 low still open, all tracked in existing unresolved threads.

Details

HEAD is still 5f8da2b chore: review. No new commits have landed since the previous re-review, so findings are identical:

  • Low — cube_bridge/subquery_join.rs:21 — trust contract on alias (pre-quoted) / sql (verbatim) still undocumented on the trait. Tracked in unresolved thread.
  • Low — query_properties.rs:1180PartialEq still excludes on_sql; the doc-comment's "derived deterministically from these three" claim isn't quite right (on_sql is compiled from the on member expression, not from sql/alias/join_type). Tracked in unresolved thread.
  • Low — simple_query_planer.rs:83-108 — degenerate subqueryJoins with no cube reference in ON (empty hintsmake_join_logical_plan_with_join_hints(empty, …)) still unguarded/untested. Tracked in unresolved thread.

Re-affirmed the 3 prior threads still apply; no new inline comments needed. Resolved threads from prior rounds (high join_type validation, unused import, INNER-only tests) remain addressed.

· Branch ·

ovr added 4 commits June 23, 2026 12:22
The SQL API encodes grouped join push-downs (e.g.
`JOIN (SELECT ... GROUP BY ... ORDER BY ... LIMIT n) t ON ...`) as a
`subqueryJoins` entry on the query. Only the legacy planner consumed it
(via `customSubQueryJoins`); `buildSqlAndParamsRust` dropped it before
the native call and Tesseract had no way to receive it. As a result the
joined sub-query was omitted entirely: queries returned unfiltered rows
(the inner LIMIT/ORDER vanished) and projections referencing the
sub-query alias failed with "missing FROM-clause entry".

Thread `subqueryJoins` end to end:
- BaseQuery.js: pass `subqueryJoins` into both native queryParams.
- New `SubqueryJoin` bridge (sql/joinType/alias + `on` member expression).
- QueryPropertiesCompiler compiles each `on` into a `SqlCall`; stored on
  QueryProperties as `LogicalSubqueryJoinItem`s.
- SimpleQueryPlanner folds them into the query's LogicalJoin.
- Physical builder emits them as `SingleSource::RawSubquerySql` joins; the
  no-join fast path now also checks `subquery_joins` so alias-only
  projections still materialize the join.

The opaque sub-query SQL (incl. its ORDER BY/LIMIT) is emitted verbatim,
so top-N semantics are preserved. The SQL API sends the join alias
pre-quoted and references it verbatim in the ON condition, so the raw
sub-query source is emitted with its alias as-is (no re-quoting). Changes
are gated on a non-empty `subquery_joins`, leaving existing plans untouched.
@ovr ovr force-pushed the feat/tesseract-support-subquery-joins branch from 5f8da2b to bd3af59 Compare June 23, 2026 10:26
@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 53s —— View job


Re-review after chore: review (bd3af59)

One-line verdict: no new findings — 3 low still open, all tracked in existing unresolved threads.

Expand for details

New in bd3af59:

  • Whitespace-only changes around the subquery_joins loop in physical_plan_builder/processors/logical_join.rs — no semantic change.
  • multi_fact_join_groups.rs:1 — the JoinHintItem import is now genuinely used (referenced at multi_fact_join_groups.rs:256 via JoinHintItem::Single(cube_name)), so this is no longer a dead-import. Consistent with the prior resolved thread.
  • The three sub-query-join integration tests were lifted to #[tokio::test(flavor = "multi_thread")] and now optionally execute against pg via try_execute_pg_from_options — snapshots flipped from sql to result. Good upgrade: the snapshots now lock the actual top-N semantics (verbatim inner LIMIT 2 correctly returns 2 rows for INNER, 3 for LEFT against the seeded orders). The substring asserts on the SQL are retained so the alias-verbatim invariant is still guarded even when pg isn't available.

Still open from prior reviews:

  • Low — cube_bridge/subquery_join.rs:21 — trust contract on alias (pre-quoted) / sql (emitted verbatim) still undocumented on the trait.
  • Low — query_properties.rs:1186PartialEq still excludes on_sql; doc-comment's "derived deterministically" justification doesn't fit (on_sql is compiled from the on member expression, not from sql/alias/join_type).
  • Low — simple_query_planer.rs:83-108 — degenerate subqueryJoins with no cube reference in ON (empty hintsmake_join_logical_plan_with_join_hints(empty, …)) still unguarded/untested.

Re-affirmed the 3 prior threads still apply; no new inline comments needed.

· Branch ·

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 1m 0s —— View job


Re-review after chore: review (0f33a33)

One-line verdict: no new findings — degenerate-ON thread now resolved; 2 low still open, tracked in existing unresolved threads.

Expand for details
  • Reviewed the new commit
  • Re-checked open threads against current code
  • Resolved the simple_query_planer thread

Addressed in 0f33a33:

  • Low — simple_query_planer.rs:90-108 — degenerate subqueryJoins with no cube reference in ON now returns CubeError::user("Sub-query join requires its ON condition to reference at least one cube member …") instead of feeding empty hints into make_join_logical_plan_with_join_hints. New test test_subquery_join_no_cube_reference_in_on (tests/integration/member_expressions.rs:421-455) asserts the error message. Thread resolved.

Still open from prior reviews:

  • Low — cube_bridge/subquery_join.rs:11-21 — doc-comment now describes shape but still doesn't call out the trust contract: alias must be a fully-quoted identifier and sql is emitted verbatim and must come from a trusted producer. A // SAFETY:-style note would prevent re-introducing quoting at the wrong layer.
  • Low — query_properties.rs:1177-1185PartialEq still excludes on_sql. The "derived deterministically from these" comment doesn't quite fit because on_sql is compiled from the on member expression, not from sql/alias/join_type — two entries with the same triple but different on will compare equal. Either include on_sql or reword the comment to call this out as identity-not-semantic equality.

Re-affirmed the 2 prior threads still apply; no new inline comments needed. Resolved threads from prior rounds (high join_type validation, unused import, INNER-only tests, degenerate ON) remain addressed.

· Branch ·

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 54s —— View job


Re-review after chore: review (29ea4a6)

One-line verdict: no new findings — PartialEq thread now resolved; 1 low still open (trust-contract doc on SubqueryJoin).

Expand for details
  • Reviewed the new commit
  • Re-checked open threads against current code
  • Resolved the query_properties.rs PartialEq thread

Addressed in 29ea4a6:

  • Low — query_properties.rs:1177-1190PartialEq now includes on_sql via the new SqlCall::struct_eq (template + structural dep walk: Symbol by ==, CubeRef by (cube_name, path)). The doc-comment was reworded to "semantically … the request triple … plus the compiled ON condition" — accurate now. New test test_subquery_join_on_participates_in_equality asserts both directions (same ON → equal, different ON → not equal). Thread resolved.

    Minor note (not blocking): SqlCall::struct_eq treats mixed (Symbol, CubeRef) pairs as not-equal, which is the right default; if a future dep variant is added it'll silently mis-compare via the _ => false arm — worth revisiting if SqlDependency grows.

Still open from prior reviews:

  • Low — cube_bridge/subquery_join.rs:11-21 — doc-comment describes shape but still doesn't state the trust contract: alias must be a fully-quoted identifier and sql is emitted verbatim and must come from a trusted producer. A // SAFETY:-style note would prevent re-introducing quoting at the wrong layer.

Re-affirmed the 1 remaining prior thread still applies; no new inline comments needed. Resolved threads from prior rounds (high join_type validation, unused import, INNER-only tests, degenerate ON, PartialEq) remain addressed.

· Branch ·

@ovr ovr merged commit 4d4b8cd into master Jun 23, 2026
92 of 94 checks passed
@ovr ovr deleted the feat/tesseract-support-subquery-joins branch June 23, 2026 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

javascript Pull requests that update Javascript code rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants