Skip to content

Add hooks needed for cudf-exchange version of velox.#2

Open
dan13bauer wants to merge 4 commits into
cudf_parquet_readerfrom
exchange_hooks
Open

Add hooks needed for cudf-exchange version of velox.#2
dan13bauer wants to merge 4 commits into
cudf_parquet_readerfrom
exchange_hooks

Conversation

@dan13bauer
Copy link
Copy Markdown
Owner

No description provided.

@GregoryKimball GregoryKimball removed this from libcudf Oct 17, 2025
dan13bauer pushed a commit that referenced this pull request Mar 23, 2026
)

## Summary:
This reverts the following PR
GitHub Pull Request: prestodb#27199
GitHub Author: Deepak Mehra mehradeeeepak@gmail.com
Meta internal revision:D96106074

Fully reverts the HAVING alias expansion feature, which threw
7 failed queries (2 different errors) on meta internal test suites. I
have verified that these errors are all caused by this change by
attempting to fix them and testing that the revert clears the errors.

## Errors:
1. java.sql.SQLException: Query failed (#20260316_083518_04833_pwriw):
type of variable 'expr_189' is expected 2. to be varchar(25), but the
actual type is varchar
java.sql.SQLException: Query failed (#20260316_083713_05663_pwriw):
Cannot nest aggregations inside aggregation 'sum':
["sum"(CTX_pe_revenue)]

## Root cause:
The PR added SELECT alias resolution in HAVING by rewriting
the HAVING predicate with OrderByExpressionRewriter. This caused two
bugs:
1. Type mismatches (5 failures): (AI reasoning) When a column name
matches a SELECT alias
(e.g., SELECT CASE...END AS category), the rewriter expands the column
reference in HAVING to the full CASE expression, changing its type from
varchar(25) to unbounded varchar. QueryPlanner.filter() uses the
rewritten expression via analysis.getHaving(), creating plan nodes with
mismatched types that TypeValidator rejects.
2. Nested aggregation (1 failure): (confirmed) When a SELECT alias
references an
aggregation and HAVING uses that alias inside another aggregation
(e.g., HAVING sum(total) where total = sum(x)), expansion creates
invalid nested aggregations sum(sum(x))
3. bigint vs double (1 failure): Same mechanism as issue 1 but with
numeric
type coercion differences.

## Fix:
I tried to fix these issues, but only managed to clear the error for #2.
Reverting the change due to these errors.
According to AI:
The HAVING alias resolution feature needs a different implementation
that
doesn't change expression types in the plan — likely by resolving
aliases
during planning rather than during analysis.

## Reproduction:
I can't share the reproducing queries since they reference meta internal
data, but these should give you an idea of how to reproduce the issues

Bug 1: Type mismatch (varchar(25) vs varchar)

```
  -- The SELECT aliases `category` to a CASE expression returning varchar strings.
  -- HAVING references `category` which the rewriter expands to the CASE expression,
  -- changing its type from the subquery's varchar(25) to unbounded varchar.
  -- Triggers: "type of variable 'expr_N' is expected to be varchar(25), but the actual type is varchar"

  WITH data AS (
      SELECT
          CAST('cpu' AS varchar(25)) AS category,
          true AS is_cpu,
          100.0 AS value
      UNION ALL
      SELECT CAST('gpu' AS varchar(25)), false, 200.0
  )
  SELECT
      (CASE
          WHEN COALESCE(category, CAST(is_cpu AS varchar)) IS NULL THEN 'total'
          WHEN is_cpu THEN 'cpu_total'
          ELSE category
      END) category,
      sum(value) AS total_value
  FROM data
  GROUP BY GROUPING SETS ((), (category), (is_cpu))
  HAVING (CASE
      WHEN COALESCE(category, CAST(is_cpu AS varchar)) IS NULL THEN 'total'
      WHEN is_cpu THEN 'cpu_total'
      ELSE category
  END) IS NOT NULL
```

  Bug 2: Nested aggregation (SYNTAX_ERROR)

```
  -- SELECT defines `total` as sum(revenue). HAVING uses sum(total).
  -- The rewriter expands `total` to `sum(revenue)`, creating sum(sum(revenue)).
  -- Triggers: "Cannot nest aggregations inside aggregation 'sum': ["sum"(revenue)]"

  SELECT
      region,
      sum(revenue) AS total
  FROM (
      VALUES ('us', 100), ('eu', 200), ('us', 150), ('eu', 50)
  ) AS t(region, revenue)
  GROUP BY region
  HAVING sum(total) > 100
```

Differential Revision: D97181706

## Summary by Sourcery

Revert support for resolving SELECT output aliases in HAVING clauses and
restore previous validation behavior.

Enhancements:
- Simplify OrderByExpressionRewriter by removing clause-specific error
context now that it is only used for ORDER BY.

Tests:
- Update analyzer tests to again treat alias references in HAVING as
invalid and remove related positive and error-coverage cases.


## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* Fix 2 bugs caused by Select Alias references in Having clause

Co-authored-by: mehradeeeepak@gmail.com <mehradeeeepak@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants