Conversation
@gspowley @bkmartinjr @Shelnutt2 everything seems to be working now, especially after some nice C++ mods included in #397, which was merged last evening. The remaining failure is in Windows CI, whereat we do not build Now that we have a hard dependency on Thoughts?
I vote we delay supporting Windows. I believe it's more important to focus on developing SOMA features on Linux and macOS now. We can revisit supporting Windows later.
I will check with our team and comment if the short-term deprioritization of Windows is of concern. Can I assume the proposal is to revisit and support by the time we release an "alpha" (i.e., feature complete) version? We definitely have users on Windows, and we will want to enable them by the time we release data in this format.
7aa4f6d to 7566501
27f6d43 to a832ee4
* temp double-dylib workaround
* title goes here
* fix ci
* code-review feedback
* Remove badly rebase bits of #400
8c98b2d to ad14bb0
        attrs=attr_names,
    )
    if ids is not None:
        sr.set_dim_points(SOMA_ROWID, util.ids_to_list(ids))
If ids is an Arrow array, we should pass the Arrow array to set_dim_points instead of converting it to a list. This will reduce memory usage and improve performance by avoiding creating a copy.
    for table in iterator:
        yield table
    if ids is not None:
        sr.set_dim_points(A.schema.domain.dim(0).name, util.ids_to_list(ids))
apis/python/src/tiledbsoma/util.py
Outdated
    """
    For the interface between ``SOMADataFrame::read`` et al. (Python) and ``SOMAReader`` (C++): the
    ``ids`` argument to the former can be slice or list; the argument to
    ``SOMAReader::set_dim_points`` must be a list.
When setting a slice for a SOMAReader query, we should use SOMAReader::set_dim_ranges instead of converting the slice to a list.
This test shows an example:
https://github.com/single-cell-data/TileDB-SOMA/blob/main/libtiledbsoma/test/test_soma_reader.py#L82
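To illustrate why a range is cheaper than a list of points, here is a minimal sketch of turning a slice into a single pair of bounds. The helper name `slice_to_closed_range` is hypothetical, and the sketch assumes the slice is read as a closed interval with both bounds set and a unit step. How the pair is then handed to SOMAReader::set_dim_ranges is left out, since the call signature is not shown in this thread.

```python
from typing import Tuple


def slice_to_closed_range(ids: slice) -> Tuple[int, int]:
    """Map a slice to one (low, high) pair, both ends included.

    Assumptions (not confirmed by this PR): both bounds are set, the slice
    is read as a closed interval, and only unit steps are supported.
    """
    if ids.start is None or ids.stop is None:
        raise ValueError("open-ended slices are not handled in this sketch")
    if ids.step not in (None, 1, -1):
        raise ValueError("only unit steps are handled in this sketch")
    # Sorting covers descending slices such as slice(5, 2, -1).
    low, high = sorted((ids.start, ids.stop))
    return (low, high)


# One pair stands in for what would otherwise be a very long list of points.
print(slice_to_closed_range(slice(2, 1_000_000)))  # (2, 1000000)
```

The memory cost of the pair is constant, whereas expanding the same slice with list(range(...)) grows with the number of IDs selected.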
apis/python/src/tiledbsoma/util.py
Outdated
    step = -1
    stop = ids.stop + step
    return pa.chunked_array(pa.array(list(range(ids.start, stop, step))))
    if isinstance(ids, pa.Array):
The intention of supporting Arrow arrays is captured in this test (currently in a PR):
https://github.com/single-cell-data/TileDB-SOMA/blob/gspowley/obs-slice-x-test/libtiledbsoma/test/test_soma_reader.py#L162
In this test, "ids" will be of type pa.ChunkedArray, so it will fall through the if isinstance(...) checks and raise an exception.
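The fall-through happens because pa.Array and pa.ChunkedArray are separate classes in pyarrow, so a check against one does not cover the other. A rough sketch of a dispatch that accepts both and leaves lists and Arrow data unconverted follows. The function name `plan_ids` and its return shape are hypothetical and are not part of this PR; the pyarrow import is guarded only so the sketch runs without pyarrow installed.

```python
from typing import Any, Tuple

try:
    import pyarrow as pa

    # pa.Array and pa.ChunkedArray are distinct classes, so list both.
    ARROW_TYPES: Tuple[type, ...] = (pa.Array, pa.ChunkedArray)
except ImportError:  # keeps the sketch importable without pyarrow
    ARROW_TYPES = ()


def plan_ids(ids: Any) -> Tuple[str, Any]:
    """Return (kind, ids), where kind is "ranges" or "points".

    Hypothetical helper: ids is returned unchanged in every branch, so no
    copy is made for lists or Arrow containers.
    """
    if isinstance(ids, slice):
        return ("ranges", ids)  # slices map to a range query, not a list
    if isinstance(ids, list):
        return ("points", ids)  # a list can remain a list
    if ARROW_TYPES and isinstance(ids, ARROW_TYPES):
        return ("points", ids)  # Arrow data is passed through as-is
    raise TypeError(f"unsupported ids type: {type(ids).__name__}")
```

With this shape, a pa.ChunkedArray lands in the Arrow branch instead of reaching the final raise.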
apis/python/src/tiledbsoma/util.py
Outdated
    ``SOMAReader::set_dim_points`` must be a list.
    """
    if isinstance(ids, list):
        return pa.chunked_array(pa.array(ids))
We don't need to convert a list to an Arrow array; it can remain a list.
Status
* SOMADataFrame and SOMAIndexedDataFrame are on this PR
* SOMASparseNdArray and SOMADenseNdArray will be on a separate PR
* soma_* column handling in Python API #397. This PR is significantly smaller since there was some code overlap with Refine soma_* column handling in Python API #397. However, alas, the typeguard errors persist:
PR context
This is the third in a group of three related PRs:
* read return pyarrow.Table not pyarrow.RecordBatch (as in an outdated version of that spec) -- now merged
* main-old which will truly have ASCII columns, obviating the need for our util_arrow.ascii_to_unicode_pyarrow_readback -- now merged
* read methods, which will go in cleanly now
  * pyarrow.Table and with the first PR our unit tests will be ready to go
  * pyarrow.LargeBinaryArray (needing decode) but when we are properly writing ASCII cells via the Python write path then the C++ code will read ASCII cells and return them as strings (no longer needing decoding)