26.1 Antalya port - Alternative syntax for cluster functions#1390
26.1 Antalya port - Alternative syntax for cluster functions#1390ianton-ru wants to merge 17 commits intoantalya-26.1from
Conversation
…ocal_cluster icebergLocalCluster table function
…ge_blocks Fix prefersLargeBlocks for cluster storage
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5b1756a56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| auto arguments = make_intrusive<ASTExpressionList>(); | ||
|
|
||
| arguments->children.push_back(make_intrusive<ASTLiteral>(connection_params.endpoint.storage_account_url)); | ||
| arguments->children.push_back(make_intrusive<ASTIdentifier>(connection_params.endpoint.container_name)); |
There was a problem hiding this comment.
Emit Azure container argument as a string literal
When StorageObjectStorageCluster rewrites Azure table-engine reads into table-function calls (for object_storage_cluster/cluster-mode execution), this argument builder emits the container as ASTIdentifier instead of ASTLiteral. Azure table-function parsing expects a literal container name, so distributed reads can fail with argument/identifier errors on remote nodes. This should serialize container_name as a literal string to keep rewritten queries valid.
Useful? React with 👍 / 👎.
src/Storages/IStorageCluster.cpp
Outdated
|
|
||
| updateBeforeRead(context); | ||
| auto cluster = getCluster(context); | ||
| auto cluster = getClusterImpl(context, cluster_name_from_settings, context->getSettingsRef()[Setting::object_storage_max_nodes]); |
There was a problem hiding this comment.
Limit host capping to object-storage cluster engines only
This unconditionally applies object_storage_max_nodes in base IStorageCluster::read, so non-object-storage engines that also inherit IStorageCluster (notably StorageURLCluster and StorageFileCluster) will also have shards silently dropped when the setting is non-zero. That can return partial query results for urlCluster/fileCluster even though the setting is documented for object-storage cluster functions.
Useful? React with 👍 / 👎.
| auto p = function_to_cluster_function.find(table_function->name); | ||
| if (p == function_to_cluster_function.end()) | ||
| { | ||
| throw Exception( | ||
| ErrorCodes::LOGICAL_ERROR, |
There was a problem hiding this comment.
Handle deltaLakeLocal in cluster-function rewrite
In cluster mode for table engines, DeltaLakeLocal is rewritten to table function deltaLakeLocal, but this conversion path only maps functions that have *Cluster equivalents and then throws LOGICAL_ERROR when none is found. As a result, reads of ENGINE=DeltaLakeLocal with non-empty object_storage_cluster fail with an internal error instead of a supported path or a proper user-facing validation error.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6048e5d947
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| #if USE_AVRO | ||
| using TableFunctionIcebergCluster = TableFunctionObjectStorageCluster<IcebergClusterDefinition, StorageIcebergConfiguration, true>; | ||
| using TableFunctionIcebergLocalCluster = TableFunctionObjectStorageCluster<IcebergLocalClusterDefinition, StorageLocalIcebergConfiguration, true>; |
There was a problem hiding this comment.
Remove duplicate IcebergLocal cluster type alias
This header now declares TableFunctionIcebergLocalCluster twice under the same #if USE_AVRO guard, which creates a redefinition in AVRO-enabled builds and breaks compilation of the table-function targets. Keeping a single alias declaration avoids a hard build failure for Iceberg-capable configurations.
Useful? React with 👍 / 👎.
| auto p = engine_to_function.find(storage_engine_name); | ||
| if (p == engine_to_function.end()) | ||
| { |
There was a problem hiding this comment.
Add S3-alias mappings before throwing on engine rewrite
This lookup throws when storage_engine_name is not in engine_to_function, but the rewrite path is also used for GCS, COSN, and OSS engines (their storage engine names are defined separately in StorageObjectStorageDefinitions.h). With object_storage_cluster enabled, queries against those engines now fail with LOGICAL_ERROR instead of executing, so these aliases need explicit mapping or a fallback in this conversion step.
Useful? React with 👍 / 👎.
| { | ||
| chassert(result.count == 0); /// We shouldn't use replacement with masking other arguments | ||
| result.start = url_arg_idx; | ||
| result.start = function->arguments->getRealIndex(url_arg_idx); |
There was a problem hiding this comment.
Align SAS masking index with skipped-argument offsets
After introducing skipped arguments (used for iceberg(..., storage_type=...)), Azure secret masking must always map back to original argument positions; this branch was updated for AccountKey, but the SharedAccessSignature branch below still writes a relative index. In queries like iceberg(storage_type='azure', '...SharedAccessSignature=...'), that mismatch masks the wrong argument and can leave the SAS token unredacted.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f3c7d36056
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| auto args_copy = args; | ||
| // Remove cluster name from args to avoid confusing cluster name and named collection name | ||
| args_copy.erase(args_copy.begin()); | ||
| configuration->extractDynamicStorageType(args_copy, context, &object_storage_type_arg); |
There was a problem hiding this comment.
Extract dynamic storage_type from live args
For old *Cluster syntax (for example icebergCluster(..., storage_type='azure')), this code extracts storage_type from args_copy instead of args, so the original argument remains in args and then gets appended again via object_storage_type_arg later in updateQueryToSendIfNeeded. The forwarded query can end up with two storage_type key-value arguments, and worker-side parsing rejects it with the duplicate-parameter error, which breaks distributed reads that use explicit storage_type.
Useful? React with 👍 / 👎.
| assert int(hosts_engine_distributed) == 3 | ||
|
|
||
|
|
||
| def test_distributed_s3_table_engine(started_cluster): |
There was a problem hiding this comment.
Remove duplicate test_distributed_s3_table_engine
This module defines test_distributed_s3_table_engine twice; in Python the later definition overwrites the earlier one at import time, so one full test body is never executed. That silently drops intended coverage and can hide regressions in this path while CI still reports green.
Useful? React with 👍 / 👎.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Frontports for Antalya 26.1
CI/CD Options
Exclude tests:
Regression jobs to run: