Skip to content

Permit true-ASCII attributes in non-from-pandas dataframes#1337

Merged
nguyenv merged 4 commits intodevfrom
kerl/ascii-in-not-from-pandas-dfs
Oct 5, 2022
Merged

Permit true-ASCII attributes in non-from-pandas dataframes#1337
nguyenv merged 4 commits intodevfrom
kerl/ascii-in-not-from-pandas-dfs

Conversation

@johnkerl
Copy link
Copy Markdown
Contributor

@johnkerl johnkerl commented Oct 3, 2022

Context:

#!/usr/bin/env python

import tiledbsoma as t
import pyarrow as pa
import os, shutil

uri = 'foo'
if os.path.exists(uri):
    shutil.rmtree(uri)

sdf = t.SOMADataFrame(uri)
sdf.create(pa.schema([("A", pa.int32()), ("B", pa.string())]))

giving me

Traceback (most recent call last):
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/./foo.py", line 12, in <module>
    sdf.create(pa.schema([("A", pa.int32()), ("B", pa.string())]))
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/soma_dataframe.py", line 53, in create
    self._create_empty(schema)
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/soma_dataframe.py", line 86, in _create_empty
    attrs = [
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/soma_dataframe.py", line 87, in <listcomp>
    tiledb.Attr(
  File "tiledb/libtiledb.pyx", line 1574, in tiledb.libtiledb.Attr.__init__
TypeError: dtype is not compatible with var-length attribute

However, with this mod in place, the array-create succeeds as intended.

Copy link
Copy Markdown

@nguyenv nguyenv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you apply this diff? It modifies test_ascii_attribute to do better checking with var for dtype="ascii".

diff --git a/tiledb/tests/test_libtiledb.py b/tiledb/tests/test_libtiledb.py
index d322d77..98fa5a0 100644
--- a/tiledb/tests/test_libtiledb.py
+++ b/tiledb/tests/test_libtiledb.py
@@ -526,7 +526,15 @@ class AttributeTest(DiskTestCase):
         dom = tiledb.Domain(
             tiledb.Dim(name="d", domain=(1, 4), tile=1, dtype=np.uint32)
         )
-        attrs = [tiledb.Attr(name="A", dtype="ascii", var=True)]
+
+        with pytest.raises(TypeError) as exc_info:
+            tiledb.Attr(name="A", dtype="ascii", var=False)
+        assert (
+            str(exc_info.value) == "dtype is not compatible with var-length attribute"
+        )
+
+        attrs = [tiledb.Attr(name="A", dtype="ascii")]
+
         schema = tiledb.ArraySchema(domain=dom, attrs=attrs, sparse=sparse)
         tiledb.Array.create(path, schema)

@@ -547,6 +555,7 @@ class AttributeTest(DiskTestCase):
             assert A.schema.nattr == 1
             A.schema.dump()
             assert_captured(capfd, "Type: STRING_ASCII")
+            assert A.schema.attr("A").isvar
             assert A.schema.attr("A").dtype == np.bytes_
             assert A.schema.attr("A").isascii
             assert_array_equal(A[:]["A"], np.asarray(ascii_data, dtype=np.bytes_))

@nguyenv
Copy link
Copy Markdown

nguyenv commented Oct 4, 2022

Thank you so much for catching this. Just have some minor things to add.

@nguyenv
Copy link
Copy Markdown

nguyenv commented Oct 4, 2022

Oh, please also update this under Bug Fixes in HISTORY.md.

@johnkerl
Copy link
Copy Markdown
Contributor Author

johnkerl commented Oct 4, 2022

ok all ready for re-review @nguyenv -- thank you!! :)

@nguyenv nguyenv self-requested a review October 4, 2022 20:59
Copy link
Copy Markdown

@nguyenv nguyenv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - thank you!

@johnkerl
Copy link
Copy Markdown
Contributor Author

johnkerl commented Oct 5, 2022

@nguyenv can you please merge this PR? I lack permissions to do so in this repo

@nguyenv
Copy link
Copy Markdown

nguyenv commented Oct 5, 2022

@nguyenv can you please merge this PR? I lack permissions to do so in this repo

Yes will do so tomorrow morning.

@nguyenv nguyenv merged commit 54cc935 into dev Oct 5, 2022
@nguyenv nguyenv deleted the kerl/ascii-in-not-from-pandas-dfs branch October 5, 2022 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants