Permit true-ASCII attributes in non-from-pandas dataframes by johnkerl · Pull Request #1337 · TileDB-Inc/TileDB-Py

johnkerl · 2022-10-03T03:27:16Z

Context:

Forward-porting "[python] Update ASCII storage for dataframes" (single-cell-data/TileDB-SOMA#273) from the main-old branch of tiledbsoma-py to the main branch.
On main-old all the dataframes are written using from_pandas: e.g. https://github.com/single-cell-data/TileDB-SOMA/blob/0.1.12/apis/python/src/tiledbsoma/annotation_dataframe.py#L411-L425
On main dataframes can be created that way, or from Arrow as at https://github.com/single-cell-data/TileDB-SOMA/blob/53af1a7a26dd241013102e9ac5b7db9e6a128393/apis/python/src/tiledbsoma/soma_dataframe.py#L92-L108
When I use dtype="ascii" as in "[python] Update ASCII storage for dataframes" (single-cell-data/TileDB-SOMA#273), but in this non-from-pandas context (see "[python] Use true ASCII attributes in dataframes", single-cell-data/TileDB-SOMA#359), I run this script:

#!/usr/bin/env python

import tiledbsoma as t
import pyarrow as pa
import os, shutil

uri = 'foo'
if os.path.exists(uri):
    shutil.rmtree(uri)

sdf = t.SOMADataFrame(uri)
sdf.create(pa.schema([("A", pa.int32()), ("B", pa.string())]))

giving me

Traceback (most recent call last):
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/./foo.py", line 12, in <module>
    sdf.create(pa.schema([("A", pa.int32()), ("B", pa.string())]))
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/soma_dataframe.py", line 53, in create
    self._create_empty(schema)
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/soma_dataframe.py", line 86, in _create_empty
    attrs = [
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/soma_dataframe.py", line 87, in <listcomp>
    tiledb.Attr(
  File "tiledb/libtiledb.pyx", line 1574, in tiledb.libtiledb.Attr.__init__
TypeError: dtype is not compatible with var-length attribute

However, with this mod in place, the array-create succeeds as intended.

nguyenv

Can you apply this diff? It updates test_ascii_attribute to check more thoroughly how var interacts with dtype="ascii".

diff --git a/tiledb/tests/test_libtiledb.py b/tiledb/tests/test_libtiledb.py
index d322d77..98fa5a0 100644
--- a/tiledb/tests/test_libtiledb.py
+++ b/tiledb/tests/test_libtiledb.py
@@ -526,7 +526,15 @@ class AttributeTest(DiskTestCase):
         dom = tiledb.Domain(
             tiledb.Dim(name="d", domain=(1, 4), tile=1, dtype=np.uint32)
         )
-        attrs = [tiledb.Attr(name="A", dtype="ascii", var=True)]
+
+        with pytest.raises(TypeError) as exc_info:
+            tiledb.Attr(name="A", dtype="ascii", var=False)
+        assert (
+            str(exc_info.value) == "dtype is not compatible with var-length attribute"
+        )
+
+        attrs = [tiledb.Attr(name="A", dtype="ascii")]
+
         schema = tiledb.ArraySchema(domain=dom, attrs=attrs, sparse=sparse)
         tiledb.Array.create(path, schema)

@@ -547,6 +555,7 @@ class AttributeTest(DiskTestCase):
             assert A.schema.nattr == 1
             A.schema.dump()
             assert_captured(capfd, "Type: STRING_ASCII")
+            assert A.schema.attr("A").isvar
             assert A.schema.attr("A").dtype == np.bytes_
             assert A.schema.attr("A").isascii
             assert_array_equal(A[:]["A"], np.asarray(ascii_data, dtype=np.bytes_))
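
The rule this test change pins down can be modeled in plain Python. This is only a minimal sketch: resolve_var and VAR_ONLY_DTYPES are hypothetical names introduced for illustration and are not part of TileDB-Py. The real check lives in Attr.__init__ in tiledb/libtiledb.pyx and may be structured differently.

```python
from typing import Optional

# Hypothetical model of the behavior the updated test asserts; this is NOT
# the TileDB-Py implementation, only an illustration of the rule.
VAR_ONLY_DTYPES = {"ascii"}  # assumption: dtypes that are always var-length


def resolve_var(dtype: str, var: Optional[bool] = None) -> bool:
    """Return the effective var-length flag for an attribute.

    - var omitted: inferred as True for var-only dtypes, False otherwise
    - var=False with a var-only dtype: rejected with TypeError
    """
    if dtype in VAR_ONLY_DTYPES:
        if var is False:
            # Same message the test in the diff expects.
            raise TypeError("dtype is not compatible with var-length attribute")
        return True
    return bool(var)


# Omitting var for "ascii" is accepted and yields a var-length attribute.
assert resolve_var("ascii") is True
assert resolve_var("ascii", var=True) is True
# Fixed-length dtypes default to var=False in this model.
assert resolve_var("int32") is False
```

Under this model, a caller that builds attributes without from_pandas can pass dtype="ascii" alone, and only an explicit var=False is treated as an error.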

tiledb/libtiledb.pyx

nguyenv · 2022-10-04T17:39:27Z

Thank you so much for catching this. Just have some minor things to add.

nguyenv · 2022-10-04T17:39:54Z

Oh, please also update this under Bug Fixes in HISTORY.md.

johnkerl · 2022-10-04T19:51:51Z

ok all ready for re-review @nguyenv -- thank you!! :)

nguyenv

Awesome - thank you!

johnkerl · 2022-10-05T02:25:58Z

@nguyenv can you please merge this PR? I lack permissions to do so in this repo

nguyenv · 2022-10-05T02:26:52Z

> @nguyenv can you please merge this PR? I lack permissions to do so in this repo

Yes will do so tomorrow morning.

Permit true-ASCII attributes in non-from-pandas dataframes

aec302a

johnkerl requested a review from nguyenv October 3, 2022 03:27

johnkerl mentioned this pull request Oct 3, 2022

[python] Use true ASCII attributes in dataframes single-cell-data/TileDB-SOMA#359

Merged

nguyenv reviewed Oct 4, 2022

tiledb/libtiledb.pyx (outdated, resolved)

johnkerl added 3 commits October 4, 2022 15:48

code-review feedback

efb5008

code-review feedback

4464b82

code-review feedback

222f4e9

nguyenv self-requested a review October 4, 2022 20:59

nguyenv approved these changes Oct 4, 2022

nguyenv merged commit 54cc935 into dev Oct 5, 2022

nguyenv deleted the kerl/ascii-in-not-from-pandas-dfs branch October 5, 2022 14:05

nguyenv merged 4 commits into dev from kerl/ascii-in-not-from-pandas-dfs

2 participants
