26.1 Antalya port - Alternative syntax for cluster functions#1390

Open
ianton-ru wants to merge 17 commits into antalya-26.1 from frontport/antalya-26.1/alternative_syntax

@ianton-ru ianton-ru commented Feb 11, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Frontports for Antalya 26.1

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

github-actions bot commented Feb 11, 2026

Workflow [PR], commit [f3c7d36]

@ianton-ru
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5b1756a56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

auto arguments = make_intrusive<ASTExpressionList>();

arguments->children.push_back(make_intrusive<ASTLiteral>(connection_params.endpoint.storage_account_url));
arguments->children.push_back(make_intrusive<ASTIdentifier>(connection_params.endpoint.container_name));

P1: Emit Azure container argument as a string literal

When StorageObjectStorageCluster rewrites Azure table-engine reads into table-function calls (for object_storage_cluster/cluster-mode execution), this argument builder emits the container as ASTIdentifier instead of ASTLiteral. Azure table-function parsing expects a literal container name, so distributed reads can fail with argument/identifier errors on remote nodes. This should serialize container_name as a literal string to keep rewritten queries valid.
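A minimal sketch of why the node type matters when the query is re-serialized for remote nodes. The `LiteralNode` and `IdentifierNode` types and the `serialize` and `buildAzureCall` helpers are hypothetical stand-ins invented for illustration, not ClickHouse's `ASTLiteral` or `ASTIdentifier`; the sketch only assumes that a literal is written quoted and an identifier is written bare.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for AST nodes (NOT the real ClickHouse classes).
struct LiteralNode { std::string value; };
struct IdentifierNode { std::string name; };

// A string literal is written in single quotes, with quotes and backslashes escaped.
std::string serialize(const LiteralNode & node)
{
    std::string out = "'";
    for (char c : node.value)
    {
        if (c == '\'' || c == '\\')
            out += '\\';
        out += c;
    }
    out += "'";
    return out;
}

// An identifier is written bare, so a parser on the receiving side reads it as a name.
std::string serialize(const IdentifierNode & node)
{
    return node.name;
}

// Builds the text of a rewritten table-function call (argument list shortened for illustration).
std::string buildAzureCall(const std::string & url, const std::string & container, bool container_as_literal)
{
    assert(!container.empty());
    std::string container_text = container_as_literal
        ? serialize(LiteralNode{container})
        : serialize(IdentifierNode{container});
    return "azureBlobStorage(" + serialize(LiteralNode{url}) + ", " + container_text + ")";
}
```

With `container_as_literal` set to `false`, the container appears unquoted in the forwarded text, which is the shape the review comment warns about.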


  updateBeforeRead(context);
- auto cluster = getCluster(context);
+ auto cluster = getClusterImpl(context, cluster_name_from_settings, context->getSettingsRef()[Setting::object_storage_max_nodes]);

P2: Limit host capping to object-storage cluster engines only

This unconditionally applies object_storage_max_nodes in base IStorageCluster::read, so non-object-storage engines that also inherit IStorageCluster (notably StorageURLCluster and StorageFileCluster) will also have shards silently dropped when the setting is non-zero. That can return partial query results for urlCluster/fileCluster even though the setting is documented for object-storage cluster functions.
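One way to picture the suggested guard, as a sketch only: `selectHosts` and its `is_object_storage_engine` flag are hypothetical names, and keeping the first `max_nodes` hosts is an arbitrary choice made here for illustration. How the PR actually picks hosts is not shown in this conversation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the hosts a read should be sent to.
// max_nodes == 0 is treated as "no limit", mirroring the "non-zero" wording of the review.
std::vector<std::string> selectHosts(
    const std::vector<std::string> & hosts,
    bool is_object_storage_engine,
    std::size_t max_nodes)
{
    // Engines such as urlCluster/fileCluster must never lose hosts to this setting.
    if (!is_object_storage_engine || max_nodes == 0 || max_nodes >= hosts.size())
        return hosts;

    // Illustrative policy: keep the first max_nodes hosts.
    return std::vector<std::string>(hosts.begin(), hosts.begin() + static_cast<std::ptrdiff_t>(max_nodes));
}
```

The point of the flag is that the cap becomes opt-in per engine rather than a property of the shared base-class read path.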

Comment on lines +445 to +449
auto p = function_to_cluster_function.find(table_function->name);
if (p == function_to_cluster_function.end())
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,

P2: Handle deltaLakeLocal in cluster-function rewrite

In cluster mode for table engines, DeltaLakeLocal is rewritten to table function deltaLakeLocal, but this conversion path only maps functions that have *Cluster equivalents and then throws LOGICAL_ERROR when none is found. As a result, reads of ENGINE=DeltaLakeLocal with non-empty object_storage_cluster fail with an internal error instead of a supported path or a proper user-facing validation error.
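A sketch of one possible "supported path": look the function up and fall back to the original single-node function when no cluster variant exists, instead of throwing. The mapping entries, `findClusterFunction`, and `chooseFunction` are illustrative assumptions; the real `function_to_cluster_function` table lives in the PR and is not reproduced here. Raising a user-facing validation error in the not-found branch would be the other option the comment mentions.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative subset of a function -> cluster-function mapping (assumed entries).
const std::unordered_map<std::string, std::string> & functionToClusterFunction()
{
    static const std::unordered_map<std::string, std::string> mapping = {
        {"s3", "s3Cluster"},
        {"iceberg", "icebergCluster"},
        {"deltaLake", "deltaLakeCluster"},
    };
    return mapping;
}

// Returns the cluster variant, or std::nullopt when the function has none.
std::optional<std::string> findClusterFunction(const std::string & name)
{
    const auto & mapping = functionToClusterFunction();
    auto it = mapping.find(name);
    if (it == mapping.end())
        return std::nullopt;
    return it->second;
}

// Fallback policy: keep the original function when no cluster variant is known.
std::string chooseFunction(const std::string & name)
{
    return findClusterFunction(name).value_or(name);
}
```

Returning `std::optional` makes the "no cluster equivalent" case an ordinary result that the caller has to handle, rather than an internal invariant violation.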

@ianton-ru added the antalya, antalya-26.1, and port-antalya (PRs to be ported to all new Antalya releases) labels on Feb 13, 2026
@ianton-ru
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6048e5d947


#if USE_AVRO
using TableFunctionIcebergCluster = TableFunctionObjectStorageCluster<IcebergClusterDefinition, StorageIcebergConfiguration, true>;
using TableFunctionIcebergLocalCluster = TableFunctionObjectStorageCluster<IcebergLocalClusterDefinition, StorageLocalIcebergConfiguration, true>;

P1: Remove duplicate IcebergLocal cluster type alias

This header now declares TableFunctionIcebergLocalCluster twice under the same #if USE_AVRO guard, which creates a redefinition in AVRO-enabled builds and breaks compilation of the table-function targets. Keeping a single alias declaration avoids a hard build failure for Iceberg-capable configurations.

Comment on lines +331 to +333
auto p = engine_to_function.find(storage_engine_name);
if (p == engine_to_function.end())
{

P2: Add S3-alias mappings before throwing on engine rewrite

This lookup throws when storage_engine_name is not in engine_to_function, but the rewrite path is also used for GCS, COSN, and OSS engines (their storage engine names are defined separately in StorageObjectStorageDefinitions.h). With object_storage_cluster enabled, queries against those engines now fail with LOGICAL_ERROR instead of executing, so these aliases need explicit mapping or a fallback in this conversion step.

  {
  chassert(result.count == 0); /// We shouldn't use replacement with masking other arguments
- result.start = url_arg_idx;
+ result.start = function->arguments->getRealIndex(url_arg_idx);

P2: Align SAS masking index with skipped-argument offsets

After introducing skipped arguments (used for iceberg(..., storage_type=...)), Azure secret masking must always map back to original argument positions; this branch was updated for AccountKey, but the SharedAccessSignature branch below still writes a relative index. In queries like iceberg(storage_type='azure', '...SharedAccessSignature=...'), that mismatch masks the wrong argument and can leave the SAS token unredacted.
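The index mismatch can be sketched with a small model. `MaskableArguments` and this body of `getRealIndex` are hypothetical; only the method name is borrowed from the diff above, and the PR's actual implementation is not shown in this conversation. The assumption is that "relative" indices count only the arguments left after skipping, while masking must target positions in the original call.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical model of an argument list in which some original positions are skipped.
struct MaskableArguments
{
    std::size_t total = 0;          // number of arguments in the original call
    std::set<std::size_t> skipped;  // original positions hidden from the relative view

    // Translate an index in the skipped-free view back to a position in the original call.
    std::size_t getRealIndex(std::size_t relative_idx) const
    {
        std::size_t visible_seen = 0;
        for (std::size_t real = 0; real < total; ++real)
        {
            if (skipped.count(real) != 0)
                continue;
            if (visible_seen == relative_idx)
                return real;
            ++visible_seen;
        }
        assert(false && "relative index out of range");
        return total;
    }
};
```

For a call shaped like `iceberg(storage_type='azure', '<connection string>', ...)` with position 0 skipped, the connection string has relative index 0 but real index 1. Writing the relative index into the masking result would therefore point at `storage_type` and leave the secret untouched.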

@ianton-ru
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f3c7d36056

Comment on lines +420 to +423
auto args_copy = args;
// Remove cluster name from args to avoid confusing cluster name and named collection name
args_copy.erase(args_copy.begin());
configuration->extractDynamicStorageType(args_copy, context, &object_storage_type_arg);

P1: Extract dynamic storage_type from live args

For old *Cluster syntax (for example icebergCluster(..., storage_type='azure')), this code extracts storage_type from args_copy instead of args, so the original argument remains in args and then gets appended again via object_storage_type_arg later in updateQueryToSendIfNeeded. The forwarded query can end up with two storage_type key-value arguments, and worker-side parsing rejects it with the duplicate-parameter error, which breaks distributed reads that use explicit storage_type.
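The duplication described above can be reproduced in a simplified model. Arguments are plain strings here, and `extractStorageType`, `forwardFromCopy`, and `forwardFromLive` are hypothetical helpers, not the PR's `extractDynamicStorageType` or `updateQueryToSendIfNeeded`. The sketch also ignores the cluster-name removal that the original code performs on the copy, which a real fix would still need to handle.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

// Hypothetical: removes the first "storage_type=..." argument from args and returns it.
std::optional<std::string> extractStorageType(Args & args)
{
    const std::string prefix = "storage_type=";
    auto it = std::find_if(args.begin(), args.end(), [&](const std::string & arg)
    {
        return arg.compare(0, prefix.size(), prefix) == 0;
    });
    if (it == args.end())
        return std::nullopt;
    std::string found = *it;
    args.erase(it);
    return found;
}

// Counts how many "storage_type=..." arguments the forwarded list contains.
std::size_t countStorageType(const Args & args)
{
    const std::string prefix = "storage_type=";
    return static_cast<std::size_t>(std::count_if(args.begin(), args.end(), [&](const std::string & arg)
    {
        return arg.compare(0, prefix.size(), prefix) == 0;
    }));
}

// Extraction runs on a copy: the original keeps its storage_type and gets a second one appended.
Args forwardFromCopy(Args args)
{
    Args args_copy = args;
    auto storage_type = extractStorageType(args_copy);
    if (storage_type)
        args.push_back(*storage_type);
    return args;
}

// Extraction runs on the live list: the argument is removed once and appended once.
Args forwardFromLive(Args args)
{
    auto storage_type = extractStorageType(args);
    if (storage_type)
        args.push_back(*storage_type);
    return args;
}
```

In this model, `forwardFromCopy` yields two `storage_type` entries for an input that had one, which corresponds to the duplicate-parameter rejection the comment describes on the worker side.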

assert int(hosts_engine_distributed) == 3


def test_distributed_s3_table_engine(started_cluster):

P3: Remove duplicate test_distributed_s3_table_engine

This module defines test_distributed_s3_table_engine twice; in Python the later definition overwrites the earlier one at import time, so one full test body is never executed. That silently drops intended coverage and can hide regressions in this path while CI still reports green.

Labels: antalya, antalya-26.1, port-antalya (PRs to be ported to all new Antalya releases)

2 participants