Antalya 26.1 - Forward port of list objects cache #1040#1405
Antalya 26.1 - Forward port of list objects cache #1040#1405arthurpassos wants to merge 6 commits intoantalya-26.1from
Conversation
…ache Antalya 25.8 - Forward port of #805 List objects cache
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5bb0d4833c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (local_context->getSettingsRef()[Setting::use_object_storage_list_objects_cache] && object_storage->supportsListObjectsCache()) | ||
| { | ||
| auto & cache = ObjectStorageListObjectsCache::instance(); | ||
| ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())}; |
There was a problem hiding this comment.
Scope list-objects cache entries to auth identity
StorageObjectStorageSource::createFileIterator builds the cache key from only description/namespace/prefix, so entries are shared across sessions that use different object-store credentials for the same endpoint+bucket+prefix. In S3, getDescription() is endpoint-based, so a query can reuse another credential's cached object list, which can surface object names and trigger read failures on objects the current credentials cannot access. Include credential/user identity in the key (or explicitly disable cache for per-query credentials) to avoid cross-credential cache pollution.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
IIRC, we did discuss this and came to the conclusion it was ok.
| auto & cache = ObjectStorageListObjectsCache::instance(); | ||
| ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())}; | ||
|
|
||
| if (auto objects_info = cache.get(cache_key, /*filter_by_prefix=*/ false)) |
There was a problem hiding this comment.
Differentiate list cache entries by tag-fetch mode
The cache key does not include with_tags, so a cached listing produced with with_tags=false can be reused for a later _tags query that needs tag metadata. On a cache hit, createReader skips metadata refetch because each ObjectInfo already has metadata, so _tags can be silently empty/wrong even though the query requested tags. Add with_tags (or an equivalent metadata-completeness flag) to the cache key.
Useful? React with 👍 / 👎.
| const std::shared_ptr<Value> & value) | ||
| { | ||
| auto key_with_ttl = key; | ||
| key_with_ttl.expires_at = std::chrono::steady_clock::now() + std::chrono::seconds(ttl_in_seconds); |
There was a problem hiding this comment.
Honor zero TTL as non-expiring list cache entries
The server setting says TTL 0 means unlimited, but ObjectStorageListObjectsCache::set always writes expires_at = now + seconds(ttl_in_seconds). With ttl_in_seconds == 0, entries expire immediately and are treated as stale on subsequent lookups, effectively disabling caching instead of making entries non-expiring. Handle 0 as a special case (e.g., max time point) to match the documented setting behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed - I am ashamed, thanks machine
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Forward port of list objects cache #1040
Documentation entry for user-facing changes
Cache for listobjects calls
CI/CD Options
Exclude tests:
Regression jobs to run: