Antalya 26.1 - Forward port of list objects cache #1040 #1405

Open
arthurpassos wants to merge 6 commits into antalya-26.1 from fp_antalya_26_1_list_objects_cache

@arthurpassos
Collaborator

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Forward port of list objects cache #1040

Documentation entry for user-facing changes

Cache for ListObjects calls.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

…ache

Antalya 25.8 - Forward port of #805 List objects cache
@arthurpassos added labels on Feb 14, 2026: port-antalya (PRs to be ported to all new Antalya releases), port-forward (Needs to be ported to every future minor release of this major version), antalya-26.1
@github-actions
github-actions bot commented Feb 14, 2026

Workflow [PR], commit [988de0d]

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5bb0d4833c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (local_context->getSettingsRef()[Setting::use_object_storage_list_objects_cache] && object_storage->supportsListObjectsCache())
{
    auto & cache = ObjectStorageListObjectsCache::instance();
    ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())};

P1: Scope list-objects cache entries to auth identity

StorageObjectStorageSource::createFileIterator builds the cache key from only description/namespace/prefix, so entries are shared across sessions that use different object-store credentials for the same endpoint+bucket+prefix. In S3, getDescription() is endpoint-based, so a query can reuse another credential's cached object list, which can surface object names and trigger read failures on objects the current credentials cannot access. Include credential/user identity in the key (or explicitly disable cache for per-query credentials) to avoid cross-credential cache pollution.
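
To make the suggestion concrete, here is a minimal sketch of a cache key that carries a credential identifier alongside description/namespace/prefix. It is not the PR's code: `ListCacheKey`, `ListCacheKeyHash`, `ListCache`, and the `credential_id` field are hypothetical names, and the real `ObjectStorageListObjectsCache::Key` may be laid out differently. The sketch assumes a stable, non-secret identifier for the credentials is available (for example a hash of the access key id).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified key; not the actual ObjectStorageListObjectsCache::Key.
struct ListCacheKey
{
    std::string storage_description; // e.g. the endpoint-based description
    std::string bucket_namespace;    // e.g. the bucket
    std::string prefix;              // path with globs cut off
    std::string credential_id;       // ASSUMPTION: stable, non-secret id of the credentials in use

    bool operator==(const ListCacheKey & other) const
    {
        return storage_description == other.storage_description
            && bucket_namespace == other.bucket_namespace
            && prefix == other.prefix
            && credential_id == other.credential_id;
    }
};

// Combines the hashes of all four fields, so two keys that differ only in
// credential_id will (almost always) land in different buckets.
struct ListCacheKeyHash
{
    std::size_t operator()(const ListCacheKey & key) const
    {
        std::size_t seed = 0;
        for (const std::string * part : {&key.storage_description, &key.bucket_namespace, &key.prefix, &key.credential_id})
            seed ^= std::hash<std::string>{}(*part) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        return seed;
    }
};

// Value type is a plain list of object paths for illustration only.
using ListCache = std::unordered_map<ListCacheKey, std::vector<std::string>, ListCacheKeyHash>;
```

With a key like this, a listing cached under one credential is a miss for any other credential on the same endpoint, bucket, and prefix. The trade-off is a lower hit rate when many users share the same data.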

Collaborator Author

IIRC, we did discuss this and came to the conclusion it was ok.

auto & cache = ObjectStorageListObjectsCache::instance();
ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())};

if (auto objects_info = cache.get(cache_key, /*filter_by_prefix=*/ false))

P1: Differentiate list cache entries by tag-fetch mode

The cache key does not include with_tags, so a cached listing produced with with_tags=false can be reused for a later _tags query that needs tag metadata. On a cache hit, createReader skips metadata refetch because each ObjectInfo already has metadata, so _tags can be silently empty/wrong even though the query requested tags. Add with_tags (or an equivalent metadata-completeness flag) to the cache key.
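
As a rough illustration of the first option, the sketch below makes the tag-fetch mode part of the lookup key, so a listing stored with `with_tags=false` is never returned to a caller asking for tags. `ListedObject`, `TaggedListKey`, and `TagAwareListCache` are hypothetical stand-ins, not types from the PR, and the real cache has TTL and eviction logic that is omitted here.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a listed object; tags are absent when not fetched.
struct ListedObject
{
    std::string path;
    std::optional<std::string> tags;
};

// (description, prefix, with_tags): the bool keeps the two listing modes apart.
using TaggedListKey = std::tuple<std::string, std::string, bool>;

class TagAwareListCache
{
public:
    void set(const std::string & description, const std::string & prefix, bool with_tags, std::vector<ListedObject> objects)
    {
        entries[TaggedListKey{description, prefix, with_tags}] = std::move(objects);
    }

    // Returns a copy of the cached listing, or std::nullopt on a miss.
    std::optional<std::vector<ListedObject>> get(const std::string & description, const std::string & prefix, bool with_tags) const
    {
        auto it = entries.find(TaggedListKey{description, prefix, with_tags});
        if (it == entries.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::map<TaggedListKey, std::vector<ListedObject>> entries;
};
```

A variant that stores a completeness flag in the value instead could let a tagged listing also serve untagged requests; the key-based form above is simply the smaller change.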

const std::shared_ptr<Value> & value)
{
    auto key_with_ttl = key;
    key_with_ttl.expires_at = std::chrono::steady_clock::now() + std::chrono::seconds(ttl_in_seconds);

P2: Honor zero TTL as non-expiring list cache entries

The server setting says TTL 0 means unlimited, but ObjectStorageListObjectsCache::set always writes expires_at = now + seconds(ttl_in_seconds). With ttl_in_seconds == 0, entries expire immediately and are treated as stale on subsequent lookups, effectively disabling caching instead of making entries non-expiring. Handle 0 as a special case (e.g., max time point) to match the documented setting behavior.
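
One way the special case could look, as a standalone sketch rather than the change that actually landed: `computeExpiry` and `isExpired` are hypothetical helpers, and treating a zero TTL as `time_point::max()` follows the example given in the comment above.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: a TTL of zero means "never expires", so map it to the
// largest representable time point instead of now + 0s.
inline Clock::time_point computeExpiry(Clock::time_point now, std::chrono::seconds ttl)
{
    if (ttl == std::chrono::seconds::zero())
        return Clock::time_point::max();
    return now + ttl;
}

// Hypothetical helper: an entry is stale once its expiry is at or before "now".
inline bool isExpired(Clock::time_point expires_at, Clock::time_point now)
{
    return expires_at <= now;
}
```

Passing `now` in as a parameter, instead of calling `Clock::now()` inside the helpers, keeps the expiry logic easy to check without sleeping.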

Collaborator Author

Fixed - I am ashamed, thanks machine
