Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
Pull request overview
Adds multi-bucket split sharding to Quickwit indexes by introducing extra_index_uris and persisting the chosen bucket per split (SplitMetadata.storage_uri) so search, merge, GC, and tooling can always resolve the correct storage location.
Changes:
- Add `extra_index_uris` to index config/template + metastore update flow, and persist per-split `storage_uri` with a fallback helper (`effective_storage_uri`).
- Update indexing, merge, search/list APIs, CLI, janitor, and garbage collection to read/write/delete splits using the per-split effective storage URI.
- Add round-robin bucket selection and an end-to-end integration test + docs updates.
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| quickwit/quickwit-serve/src/lib.rs | Treat indexes as file-backed if any configured index URI (primary or extra) uses file/ram storage. |
| quickwit/quickwit-search/src/search_job_placer.rs | Group jobs by (index_uid, storage_uri); refactor grouping helper to comparator-based API. |
| quickwit/quickwit-search/src/root.rs | Carry per-split storage_uri through search + fetch-docs job paths and leaf request building. |
| quickwit/quickwit-search/src/list_terms.rs | Route list-terms leaf requests per (index_uid, storage_uri) group. |
| quickwit/quickwit-search/src/list_fields.rs | Route list-fields leaf requests per (index_uid, storage_uri) group. |
| quickwit/quickwit-proto/src/codegen/quickwit/quickwit.metastore.rs | Add extra_index_uris field to UpdateIndexRequest codegen. |
| quickwit/quickwit-proto/protos/quickwit/metastore.proto | Add extra_index_uris to metastore UpdateIndexRequest proto. |
| quickwit/quickwit-metastore/src/tests/index.rs | Update metastore update-index tests to pass extra_index_uris. |
| quickwit/quickwit-metastore/src/split_metadata_version.rs | Extend split metadata v0.8 serialization with optional storage_uri. |
| quickwit/quickwit-metastore/src/split_metadata.rs | Add storage_uri to SplitMetadata + effective_storage_uri helper. |
| quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs | Deserialize and apply extra_index_uris during update-index. |
| quickwit/quickwit-metastore/src/metastore/mod.rs | Add (de)serialization support for extra_index_uris in UpdateIndexRequestExt. |
| quickwit/quickwit-metastore/src/metastore/index_metadata/mod.rs | Persist extra_index_uris updates in index metadata; add unit test. |
| quickwit/quickwit-metastore/src/metastore/file_backed/mod.rs | Deserialize and apply extra_index_uris during update-index. |
| quickwit/quickwit-metastore/src/metastore/file_backed/file_backed_index/mod.rs | Thread extra_index_uris through file-backed index config updates. |
| quickwit/quickwit-janitor/src/actors/garbage_collector.rs | Pass storage resolver through GC plumbing; adjust mocks. |
| quickwit/quickwit-janitor/src/actors/delete_task_service.rs | Build an IndexingSplitStore with multiple storages + selector for delete pipeline. |
| quickwit/quickwit-janitor/src/actors/delete_task_planner.rs | Build SearchJob using split effective storage URI. |
| quickwit/quickwit-janitor/src/actors/delete_task_pipeline.rs | Use IndexingSplitStore instead of a single Storage in delete pipeline. |
| quickwit/quickwit-integration-tests/src/tests/multi_bucket_tests.rs | New end-to-end integration test covering multi-bucket ingest + search. |
| quickwit/quickwit-integration-tests/src/tests/mod.rs | Register the new multi-bucket test module. |
| quickwit/quickwit-indexing/src/split_store/mod.rs | Export bucket selector API. |
| quickwit/quickwit-indexing/src/split_store/indexing_split_store.rs | Support multiple storages + per-split read/write routing using effective URI. |
| quickwit/quickwit-indexing/src/split_store/bucket_selector.rs | New round-robin bucket selector + tests. |
| quickwit/quickwit-indexing/src/models/split_attrs.rs | Initialize new SplitMetadata.storage_uri field. |
| quickwit/quickwit-indexing/src/mature_merge.rs | Resolve all configured storages and write merged outputs via selector. |
| quickwit/quickwit-indexing/src/lib.rs | Re-export split-store cache and selector helpers. |
| quickwit/quickwit-indexing/src/actors/uploader.rs | Select bucket per new split and persist SplitMetadata.storage_uri. |
| quickwit/quickwit-indexing/src/actors/merge_split_downloader.rs | Fetch splits using split metadata (effective storage URI). |
| quickwit/quickwit-indexing/src/actors/indexing_service.rs | Build multi-storage IndexingSplitStore for indexing pipelines. |
| quickwit/quickwit-indexing/src/actors/indexing_pipeline.rs | Remove direct Storage from params; rely on IndexingSplitStore. |
| quickwit/quickwit-index-management/src/index.rs | Validate connectivity for extra storages; pass resolver into GC flows. |
| quickwit/quickwit-index-management/src/garbage_collection.rs | Group deletions per effective storage URI; resolve per-bucket storage for bulk delete. |
| quickwit/quickwit-config/src/index_template/serialize.rs | Add extra_index_uris to index template (de)serialization. |
| quickwit/quickwit-config/src/index_template/mod.rs | Add extra_index_uris to templates + validation; propagate into index configs. |
| quickwit/quickwit-config/src/index_config/serialize.rs | Add extra_index_uris to index config schema; enforce “no removals” on update. |
| quickwit/quickwit-config/src/index_config/mod.rs | Add extra_index_uris field + helper all_index_uris; include in fingerprinting. |
| quickwit/quickwit-common/src/uri.rs | Add ordering to Protocol and Uri to enable grouping/sorting by URI. |
| quickwit/quickwit-cli/src/tool.rs | Resolve the correct storage URI for a specific split when extracting. |
| quickwit/quickwit-cli/src/lib.rs | Checklist now validates connectivity for extra index storages too. |
| docs/reference/rest-api.md | Document extra_index_uris in create/update index REST payloads. |
| docs/configuration/storage-config.md | Mention extra_index_uris as another place storage URIs can be used. |
| docs/configuration/index-config.md | Document extra_index_uris and the multi-bucket split sharding behavior/caveat. |
Pull request overview
This PR introduces multi-bucket split sharding for a single index by adding extra_index_uris, persisting the chosen bucket per split (SplitMetadata.storage_uri), and plumbing that URI through indexing, search, merge, and garbage collection so reads/deletes work even as the configured URI list evolves.
Changes:
- Add `extra_index_uris` to index config/template + metastore update APIs, with validation and backward-compatible serialization.
- Persist per-split `storage_uri` and route split upload/download/merge/GC to the correct bucket using `effective_storage_uri()`.
- Update distributed search/list-* fanout to group leaf requests by `(index_uid, storage_uri)`; add end-to-end integration test and docs.
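For readers skimming the review, the grouping in the last bullet can be pictured with a minimal sketch. It is not the code from `search_job_placer.rs`: the `Job` struct and `group_jobs` function are hypothetical names, and plain strings stand in for Quickwit's index UID and `Uri` types.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a search job (not Quickwit's `SearchJob`).
struct Job {
    index_uid: String,
    storage_uri: String,
    split_id: String,
}

/// Groups split IDs by `(index_uid, storage_uri)` so that each group can be
/// turned into one leaf request targeting a single bucket.
fn group_jobs(jobs: Vec<Job>) -> BTreeMap<(String, String), Vec<String>> {
    let mut groups = BTreeMap::new();
    for job in jobs {
        groups
            .entry((job.index_uid, job.storage_uri))
            .or_insert_with(Vec::new)
            .push(job.split_id);
    }
    groups
}
```

Using an ordered map here mirrors why the PR adds ordering to `Uri`: a key made of `(index_uid, storage_uri)` has to be comparable before jobs can be sorted or grouped by it.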
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| quickwit/quickwit-serve/src/lib.rs | Treat indexes as file-backed if any configured URI is file/ram. |
| quickwit/quickwit-search/src/search_job_placer.rs | Grouping helpers updated to group jobs by (index_uid, storage_uri). |
| quickwit/quickwit-search/src/root.rs | Carry per-split storage_uri in jobs; group leaf requests by (index_uid, storage_uri). |
| quickwit/quickwit-search/src/list_terms.rs | List-terms leaf requests now routed per (index_uid, storage_uri) group. |
| quickwit/quickwit-search/src/list_fields.rs | List-fields leaf requests now routed per (index_uid, storage_uri) group. |
| quickwit/quickwit-proto/protos/quickwit/metastore.proto | Add extra_index_uris to UpdateIndexRequest. |
| quickwit/quickwit-proto/src/codegen/quickwit/quickwit.metastore.rs | Regenerated prost code for extra_index_uris. |
| quickwit/quickwit-metastore/src/tests/index.rs | Update metastore update tests to pass extra_index_uris. |
| quickwit/quickwit-metastore/src/split_metadata_version.rs | Add storage_uri to v0.8 split metadata serialization. |
| quickwit/quickwit-metastore/src/split_metadata.rs | Add storage_uri + effective_storage_uri() helper. |
| quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs | Deserialize and apply extra_index_uris on update. |
| quickwit/quickwit-metastore/src/metastore/mod.rs | Add (de)serialization helpers for extra_index_uris in update requests. |
| quickwit/quickwit-metastore/src/metastore/index_metadata/mod.rs | Persist extra_index_uris changes in index metadata updates + unit test. |
| quickwit/quickwit-metastore/src/metastore/file_backed/mod.rs | Deserialize and apply extra_index_uris on update (file-backed). |
| quickwit/quickwit-metastore/src/metastore/file_backed/file_backed_index/mod.rs | Thread extra_index_uris through file-backed index update path. |
| quickwit/quickwit-janitor/src/actors/garbage_collector.rs | Pass StorageResolver so GC can delete splits per effective URI. |
| quickwit/quickwit-janitor/src/actors/delete_task_service.rs | Build IndexingSplitStore with multiple storages + bucket selector. |
| quickwit/quickwit-janitor/src/actors/delete_task_planner.rs | Create SearchJob using effective_storage_uri. |
| quickwit/quickwit-janitor/src/actors/delete_task_pipeline.rs | Pipeline now owns/clones IndexingSplitStore instead of raw storage. |
| quickwit/quickwit-integration-tests/src/tests/multi_bucket_tests.rs | New e2e test verifying multi-bucket distribution + search correctness. |
| quickwit/quickwit-integration-tests/src/tests/mod.rs | Register new integration test module. |
| quickwit/quickwit-indexing/src/split_store/mod.rs | Export bucket selector APIs. |
| quickwit/quickwit-indexing/src/split_store/bucket_selector.rs | New round-robin BucketSelector implementation. |
| quickwit/quickwit-indexing/src/split_store/indexing_split_store.rs | Split store now manages multiple storages and selects bucket per write; reads use effective_storage_uri. |
| quickwit/quickwit-indexing/src/models/split_attrs.rs | Initialize new splits with storage_uri: None (set later by uploader). |
| quickwit/quickwit-indexing/src/mature_merge.rs | Resolve all storages and use IndexingSplitStore for multi-bucket merges. |
| quickwit/quickwit-indexing/src/lib.rs | Re-export IndexingSplitCache and bucket selector helpers. |
| quickwit/quickwit-indexing/src/actors/uploader.rs | Select target bucket for each new split and persist storage_uri. |
| quickwit/quickwit-indexing/src/actors/merge_split_downloader.rs | Download split using full SplitMetadata (bucket-aware). |
| quickwit/quickwit-indexing/src/actors/indexing_service.rs | Construct multi-storage IndexingSplitStore for pipelines; update fingerprint tests. |
| quickwit/quickwit-indexing/src/actors/indexing_pipeline.rs | Remove redundant storage param; rely on IndexingSplitStore. |
| quickwit/quickwit-index-management/src/index.rs | Validate extra storages on update; pass resolver to GC/deletion paths. |
| quickwit/quickwit-index-management/src/garbage_collection.rs | GC deletes grouped by effective URI; resolve per-bucket storages. |
| quickwit/quickwit-config/src/index_template/serialize.rs | Add extra_index_uris to template serialization. |
| quickwit/quickwit-config/src/index_template/mod.rs | Add extra_index_uris to templates + validation and tests. |
| quickwit/quickwit-config/src/index_config/serialize.rs | Add config validation/update constraints for extra_index_uris. |
| quickwit/quickwit-config/src/index_config/mod.rs | Add extra_index_uris, include in fingerprint, and add all_index_uris(). |
| quickwit/quickwit-common/src/uri.rs | Add ordering support needed for grouping by Uri. |
| quickwit/quickwit-cli/src/tool.rs | extract-split now resolves the split’s effective storage URI. |
| quickwit/quickwit-cli/src/lib.rs | Index checklist checks connectivity for extra storages. |
| docs/reference/rest-api.md | Document extra_index_uris for create/update index APIs. |
| docs/configuration/storage-config.md | Mention extra_index_uris as a place storage URIs are used. |
| docs/configuration/index-config.md | Document extra_index_uris and multi-bucket split sharding behavior. |
Summary
Splits can now be distributed across multiple storage buckets for a single index. A new `extra_index_uris` configuration option allows specifying additional storage URIs alongside the existing `index_uri`. New splits are written to buckets using a round-robin strategy, and each split records which bucket it was stored in so that reads, merges, and garbage collection work correctly regardless of how the list evolves over time.
Motivation
Previously, all splits for an index were stored under a single `index_uri`. This change enables spreading data across multiple buckets for improved write throughput, storage isolation, or operational flexibility.
Configuration
- `index_uri` remains required and acts as the primary storage location.
- `extra_index_uris` is optional (defaults to empty, so it is fully backward compatible).
How it works
- `IndexingSplitStore` holds all resolved storages and a `BucketSelector` (round-robin by default). Each new split is assigned a target bucket before staging. The chosen URI is persisted in `SplitMetadata.storage_uri`.
- `SearchJob` and `FetchDocsJob` carry the per-split storage URI. Leaf requests are grouped by `(index_uid, storage_uri)` so splits in different buckets get separate requests. No proto changes were needed.
- `fetch_and_open_split` takes `&SplitMetadata` and resolves the correct bucket via `effective_storage_uri()`. Merged output splits are assigned a bucket by the selector.
- Existing splits have `storage_uri: None` and continue to be read from `index_uri`. No database migration is required because the field lives inside the existing `split_metadata_json` column.
Breaking changes
- Indexes that use `extra_index_uris` cannot be read by older Quickwit versions (the field is omitted from serialized JSON when empty, so indexes not using the feature are unaffected).
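
To make the round-robin selection and the fallback to `index_uri` concrete, here is a minimal sketch. It is an illustration under assumptions, not the PR's implementation: `RoundRobinSelector` and `SplitMeta` are hypothetical stand-ins for `BucketSelector` and `SplitMetadata`, URIs are plain strings, the bucket names are made up, and the real `effective_storage_uri()` may take different arguments.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the PR's round-robin bucket selector.
struct RoundRobinSelector {
    /// Primary `index_uri` first, followed by `extra_index_uris`.
    uris: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobinSelector {
    fn new(index_uri: &str, extra_index_uris: &[&str]) -> Self {
        let mut uris = vec![index_uri.to_string()];
        uris.extend(extra_index_uris.iter().map(|uri| uri.to_string()));
        Self { uris, next: AtomicUsize::new(0) }
    }

    /// Returns the next URI in rotation. `uris` is never empty because the
    /// primary `index_uri` is always present.
    fn select(&self) -> &str {
        let position = self.next.fetch_add(1, Ordering::Relaxed) % self.uris.len();
        &self.uris[position]
    }
}

/// Hypothetical, simplified view of split metadata.
struct SplitMeta {
    split_id: String,
    /// `None` for splits written before the feature existed.
    storage_uri: Option<String>,
}

impl SplitMeta {
    /// Falls back to the index's primary URI when no per-split URI was recorded.
    /// Passing `index_uri` as an argument is an assumption of this sketch.
    fn effective_storage_uri<'a>(&'a self, index_uri: &'a str) -> &'a str {
        self.storage_uri.as_deref().unwrap_or(index_uri)
    }
}
```

With `index_uri` set to `s3://bucket-a/idx` and one extra URI `s3://bucket-b/idx`, successive calls to `select()` alternate between the two, while a split whose `storage_uri` is `None` still resolves to `s3://bucket-a/idx`.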