Fix consolidator crash on vacuumed delete commits. (#5771)#5772
Open
cl-earthscope wants to merge 6 commits intoTileDB-Inc:mainfrom
Open
Fix consolidator crash on vacuumed delete commits. (#5771)#5772cl-earthscope wants to merge 6 commits intoTileDB-Inc:mainfrom
cl-earthscope wants to merge 6 commits intoTileDB-Inc:mainfrom
Conversation
The previous implementation of the commits consolidator assumed that `.del` files always exist on disk as standalone files. If a `.del` file was consolidated and subsequently vacuumed, the consolidator threw a fatal non-retrievable error when trying to read its file size and payload during subsequent consolidation runs. Additionally, the `ArrayDirectory` string-based verification for vacuuming superseded `.con` files failed when it encountered embedded binary `.del` payloads, preventing those older `.con` files from ever being vacuumed. This PR resolves these issues by making the engine fully aware of physical payload locations: * `ArrayDirectory` now tracks the exact physical URI and byte offset for `.del` payloads (whether raw on disk or embedded in older `.con` files) inside `delete_and_update_tiles_location_`. * Added a `skip_delete_payload` helper in `ArrayDirectory` to properly advance the stream past binary data during `.con` file string verification. * `Consolidator::write_consolidated_commits_file` now uses the `location_map` to extract `.del` payloads from their actual physical locations instead of only querying the VFS using the logical URI. * Fixes the engine crash when consolidating an array with previously vacuumed delete commits. * Allows superseded `.con` files containing delete payloads to be successfully verified and vacuumed. The most significant changes are in `tiledb/sm/array/array_directory.cc` (parsing and offset tracking) and `tiledb/sm/consolidator/consolidator.cc` (location-aware reading). Explicitly added a check to ensure `.wrt` files are not added to the payload location map, as they are zero-byte markers. Added two unit tests (`[array-directory][commits-mode-del]` and `[array-directory][vacuum-binary-skip]`) to verify payload offset mapping and `.con` string verification. Added a C API integration test (`[capi][consolidation][commits][deletes]`) that executes the exact Consolidate -> Vacuum -> Consolidate sequence to guarantee the engine no longer crashes on missing `.del` files. --- TYPE: BUG DESC: Fix consolidator crash when processing vacuumed delete commits and allow vacuuming of .con files containing delete payloads.
abe9740 to
e6f09d6
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The previous implementation of the commits consolidator assumed that
.delfiles always exist on disk as standalone files. If a.delfile was consolidated and subsequently vacuumed, the consolidator threw a fatal non-retrievable error when trying to read its file size and payload during subsequent consolidation runs. Additionally, theArrayDirectorystring-based verification for vacuuming superseded.confiles failed when it encountered embedded binary.delpayloads, preventing those older.confiles from ever being vacuumed.This PR resolves these issues by making the engine fully aware of physical payload locations:
ArrayDirectorynow tracks the exact physical URI and byte offset for.delpayloads (whether raw on disk or embedded in older.confiles) insidedelete_and_update_tiles_location_.Added a
skip_delete_payloadhelper inArrayDirectoryto properly advance the stream past binary data during.confile string verification.Consolidator::write_consolidated_commits_filenow uses thelocation_mapto extract.delpayloads from their actual physical locations instead of only querying the VFS using the logical URI.Fixes the engine crash when consolidating an array with previously vacuumed delete commits.
Allows superseded
.confiles containing delete payloads to be successfully verified and vacuumed.The most significant changes are in
tiledb/sm/array/array_directory.cc(parsing and offset tracking) andtiledb/sm/consolidator/consolidator.cc(location-aware reading). Explicitly added a check to ensure.wrtfiles are not added to the payload location map, as they are zero-byte markers.Added two unit tests (
[array-directory][commits-mode-del]and[array-directory][vacuum-binary-skip]) to verify payload offset mapping and.constring verification. Added a C API integration test ([capi][consolidation][commits][deletes]) that executes the exact Consolidate -> Vacuum -> Consolidate sequence to guarantee the engine no longer crashes on missing.delfiles.TYPE: BUG
DESC: Fix consolidator crash when processing vacuumed delete commits and allow vacuuming of .con files containing delete payloads.