Segfault when querying dense array with variable-length attributes when indices access tiles out of order #2305

@chanjd

Querying a dense TileDB array with variable-length string attributes causes a segfault (SIGSEGV, exit 139) when the index list causes tiles to be accessed out of order.

Confirmed affected: TileDB-Py 0.35.1, 0.36.0, 0.36.1

Workaround: sort indices before querying — np.sort(indices).
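
If the caller needs the values back in the originally requested order, the sort can be undone afterwards with an inverse permutation. The following is a minimal sketch using only NumPy. `sorted_with_inverse` is a hypothetical helper name and is not part of TileDB-Py.

```python
import numpy as np

def sorted_with_inverse(indices):
    """Return (sorted_indices, inverse) with sorted_indices[inverse] == indices."""
    indices = np.asarray(indices)
    order = np.argsort(indices, kind="stable")  # positions that sort the input
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)      # where each original element ended up
    return indices[order], inverse

requested = np.array([1001, 999, 2500, 500], dtype=np.uint32)
sorted_idx, inverse = sorted_with_inverse(requested)
# Query with sorted_idx (ascending, so tiles are never revisited), then
# reorder the returned values with values[inverse] to match `requested`.
```

This assumes the query returns one value per index, in the same order as the sorted index list. That is an assumption about the query result layout and should be checked against the actual output before relying on it.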


Conditions required to trigger

All three must be met:

  1. Dense array
  2. Variable-length attributes
  3. At least one index falls in a lower tile than a preceding index (tile access order not monotonically non-decreasing)

Note: descending indices within a single tile are safe, and so is crossing a tile boundary in ascending order. The trigger is stepping back into a lower-numbered tile, not whether the index list as a whole is sorted.


Verified cases (tile extent = 1000)

Unsafe:

  • [1001, 999] — tile 1 → tile 0
  • [1500, 500] — tile 1 → tile 0
  • [2500, 500] — tile 2 → tile 0
  • [500, 1500, 300] — tile 0 → tile 1 → tile 0

Safe:

  • [999, 1001] — tile 0 → tile 1 (ascending cross-tile)
  • [500, 1500] — tile 0 → tile 1 (ascending cross-tile)
  • [1500, 1200] — tile 1 → tile 1 (descending, same tile)
  • [500, 200] — tile 0 → tile 0 (descending, same tile)
  • [999, 1001, 1000] — tile 0 → tile 1 → tile 1 (lower tile always first)
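
The rule behind these cases can be written as a small pre-flight check: map each index to its tile number by integer division, then flag the list if the tile number ever decreases. This is a sketch assuming a 1-D array whose domain starts at 0 and a tile extent of 1000, as in the cases above. `tile_sequence` and `crosses_back` are hypothetical names, not TileDB-Py functions.

```python
def tile_sequence(indices, tile_extent=1000, domain_start=0):
    """Tile number visited by each index, in request order."""
    return [(i - domain_start) // tile_extent for i in indices]

def crosses_back(indices, tile_extent=1000, domain_start=0):
    """True if any index falls in a lower tile than the index before it."""
    tiles = tile_sequence(indices, tile_extent, domain_start)
    return any(b < a for a, b in zip(tiles, tiles[1:]))

# The verified cases from this report
unsafe = [[1001, 999], [1500, 500], [2500, 500], [500, 1500, 300]]
safe = [[999, 1001], [500, 1500], [1500, 1200], [500, 200], [999, 1001, 1000]]

for case in unsafe + safe:
    print(case, tile_sequence(case), "unsafe" if crosses_back(case) else "safe")
```

A check like this could be used to decide whether sorting is needed before a query. It only encodes the behaviour observed in this report and does not prove that other access patterns are safe.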

MRE — multi_index

import tiledb
import numpy as np
import tempfile

temp_dir = tempfile.mkdtemp()
uri = f'{temp_dir}/dense_varlen'

# tile=1000: tile 0 covers indices 0-999, tile 1 covers 1000-1999
dim = tiledb.Dim(name='idx', domain=(0, 99999), tile=1000, dtype=np.uint32)
attr = tiledb.Attr(name='value', dtype=str)
schema = tiledb.ArraySchema(domain=tiledb.Domain(dim), sparse=False, attrs=[attr])
tiledb.Array.create(uri, schema)

with tiledb.open(uri, 'w') as arr:
    arr[0:10000] = {'value': [f'val_{i}' for i in range(10000)]}

with tiledb.open(uri, 'r') as arr:
    result = arr.multi_index[np.array([1001, 999], dtype=np.uint32)]  # SEGFAULT: tile 1 before tile 0

MRE — .df[]

import tiledb
import numpy as np
import tempfile

temp_dir = tempfile.mkdtemp()
uri = f'{temp_dir}/dense_varlen'

# tile=1000: tile 0 covers indices 0-999, tile 1 covers 1000-1999
dim = tiledb.Dim(name='idx', domain=(0, 99999), tile=1000, dtype=np.uint32)
attr = tiledb.Attr(name='value', dtype=str)
schema = tiledb.ArraySchema(domain=tiledb.Domain(dim), sparse=False, attrs=[attr])
tiledb.Array.create(uri, schema)

with tiledb.open(uri, 'w') as arr:
    arr[0:10000] = {'value': [f'val_{i}' for i in range(10000)]}

with tiledb.open(uri, 'r') as arr:
    result = arr.df[[1001, 999]]  # SEGFAULT: tile 1 before tile 0
