
Fix consolidator crash on vacuumed delete commits. (#5771) #5772

Open: cl-earthscope wants to merge 6 commits into TileDB-Inc:main from cl-earthscope:cl/fix-5771-commit-consolidator-crash

The previous implementation of the commits consolidator assumed that `.del` files always exist on disk as standalone files. If a `.del` file was consolidated and subsequently vacuumed, the consolidator threw a fatal, unrecoverable error during later consolidation runs when it tried to read that file's size and payload. Additionally, `ArrayDirectory`'s string-based verification for vacuuming superseded `.con` files failed when it encountered embedded binary `.del` payloads, so those older `.con` files could never be vacuumed.

This PR resolves these issues by making the engine fully aware of physical payload locations:

* `ArrayDirectory` now tracks the exact physical URI and byte offset of each `.del` payload (whether it is a raw file on disk or embedded in an older `.con` file) in `delete_and_update_tiles_location_`.
* A new `skip_delete_payload` helper in `ArrayDirectory` advances the stream past binary payload data during `.con` file string verification.
* `Consolidator::write_consolidated_commits_file` now uses the `location_map` to read `.del` payloads from their actual physical locations instead of only querying the VFS by logical URI.
* Fixes the engine crash when consolidating an array with previously vacuumed delete commits.
* Allows superseded `.con` files containing delete payloads to be verified and vacuumed successfully.

The most significant changes are in `tiledb/sm/array/array_directory.cc` (parsing and offset tracking) and `tiledb/sm/consolidator/consolidator.cc` (location-aware reading). The PR also adds an explicit check that keeps `.wrt` files out of the payload location map, since they are zero-byte markers.

Two unit tests (`[array-directory][commits-mode-del]` and `[array-directory][vacuum-binary-skip]`) verify payload offset mapping and `.con` string verification. A C API integration test (`[capi][consolidation][commits][deletes]`) runs the exact Consolidate -> Vacuum -> Consolidate sequence to confirm that the engine no longer crashes on missing `.del` files.


TYPE: BUG
DESC: Fix consolidator crash when processing vacuumed delete commits and allow vacuuming of .con files containing delete payloads.

cl-earthscope force-pushed the cl/fix-5771-commit-consolidator-crash branch from abe9740 to e6f09d6 on March 12, 2026 14:49.