Segfault when querying dense array with variable-length attributes when indices access tiles out of order #2305
Open
Description
Querying a dense TileDB array with variable-length string attributes crashes with a segmentation fault (SIGSEGV, exit code 139) when the index list makes the query visit tiles out of order.
Confirmed affected: TileDB-Py 0.35.1, 0.36.0, 0.36.1
Workaround: sort the indices before querying, e.g. np.sort(indices). The results then come back in sorted order rather than in the originally requested order.
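If callers need results in their original order, the sort can be undone afterwards with an inverse permutation. The sketch below uses NumPy only. The helper name sorted_query_plan is hypothetical. It also assumes that the query returns exactly one value per requested index, in the order the indices were passed. That assumption has not been verified against TileDB here.

```python
import numpy as np


def sorted_query_plan(indices):
    """Hypothetical helper for the np.sort workaround.

    Returns (sorted_indices, inverse): query with sorted_indices, then index
    the returned values with `inverse` to restore the caller's order.
    Assumes one result per requested index, in request order (unverified).
    """
    indices = np.asarray(indices)
    order = np.argsort(indices, kind="stable")  # positions that sort the input
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)      # inverse permutation of `order`
    return indices[order], inverse


# Simulated usage: pretend the array stores f"val_{i}" at index i.
q, inv = sorted_query_plan([1001, 999, 1500])
fake_result = np.array([f"val_{i}" for i in q])  # values in sorted-query order
print(fake_result[inv].tolist())  # ['val_1001', 'val_999', 'val_1500']
```

With a real array, q would replace the original index list in the multi_index or .df[] call, and the returned value array would be indexed with inv.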
Conditions required to trigger
All three must be met:
- Dense array
- Variable-length attributes
- At least one index falls in a lower tile than a preceding index (tile access order not monotonically non-decreasing)
Note: descending indices within a single tile are safe. The trigger is crossing back from a higher tile into a lower one, not the global sort order. For example, [999, 1001, 1000] is unsorted but safe.
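The third condition can be expressed as a small predicate for checking index lists before a query. The sketch below is a heuristic built only from the conditions stated in this report, not from TileDB internals. The function names are hypothetical. It assumes tiles are aligned to the dimension's lower bound (domain_start, 0 in the MREs) with a fixed tile extent.

```python
import numpy as np


def tile_ids(indices, tile_extent=1000, domain_start=0):
    """Map each index to the zero-based tile it falls in (hypothetical helper)."""
    idx = np.asarray(indices, dtype=np.int64)
    return (idx - domain_start) // tile_extent


def crosses_back_to_lower_tile(indices, tile_extent=1000, domain_start=0):
    """True if some index lands in a lower tile than the index before it.

    Equivalent to "the tile sequence is not monotonically non-decreasing".
    Encodes the trigger condition described in this report (heuristic only).
    """
    tiles = tile_ids(indices, tile_extent, domain_start)
    return bool(np.any(np.diff(tiles) < 0))


for case in ([1001, 999], [500, 1500, 300], [999, 1001, 1000], [1500, 1200]):
    print(case, crosses_back_to_lower_tile(case))  # True, True, False, False
```

A drop between adjacent tile IDs is enough to detect the condition. If any index is in a lower tile than some earlier index, the tile sequence must decrease somewhere between them.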
Verified cases (tile extent = 1000)
Unsafe:
- [1001, 999]: tile 1 → tile 0
- [1500, 500]: tile 1 → tile 0
- [2500, 500]: tile 2 → tile 0
- [500, 1500, 300]: tile 0 → tile 1 → tile 0
Safe:
- [999, 1001]: tile 0 → tile 1 (ascending, cross-tile)
- [500, 1500]: tile 0 → tile 1 (ascending, cross-tile)
- [1500, 1200]: tile 1 → tile 1 (descending, same tile)
- [500, 200]: tile 0 → tile 0 (descending, same tile)
- [999, 1001, 1000]: tile 0 → tile 1 → tile 1 (lower tile always first)
MRE — multi_index
import tiledb
import numpy as np
import tempfile
temp_dir = tempfile.mkdtemp()
uri = f'{temp_dir}/dense_varlen'
# tile=1000: tile 0 covers indices 0-999, tile 1 covers 1000-1999
dim = tiledb.Dim(name='idx', domain=(0, 99999), tile=1000, dtype=np.uint32)
attr = tiledb.Attr(name='value', dtype=str)
schema = tiledb.ArraySchema(domain=tiledb.Domain(dim), sparse=False, attrs=[attr])
tiledb.Array.create(uri, schema)
with tiledb.open(uri, 'w') as arr:
    arr[0:10000] = {'value': [f'val_{i}' for i in range(10000)]}
with tiledb.open(uri, 'r') as arr:
    result = arr.multi_index[np.array([1001, 999], dtype=np.uint32)]  # SEGFAULT: tile 1 before tile 0
MRE — .df[]
import tiledb
import numpy as np
import tempfile
temp_dir = tempfile.mkdtemp()
uri = f'{temp_dir}/dense_varlen'
# tile=1000: tile 0 covers indices 0-999, tile 1 covers 1000-1999
dim = tiledb.Dim(name='idx', domain=(0, 99999), tile=1000, dtype=np.uint32)
attr = tiledb.Attr(name='value', dtype=str)
schema = tiledb.ArraySchema(domain=tiledb.Domain(dim), sparse=False, attrs=[attr])
tiledb.Array.create(uri, schema)
with tiledb.open(uri, 'w') as arr:
    arr[0:10000] = {'value': [f'val_{i}' for i in range(10000)]}
with tiledb.open(uri, 'r') as arr:
    result = arr.df[[1001, 999]]  # SEGFAULT: tile 1 before tile 0