Skip to content

[python/c++] Connect C++ reader for dataframes#400

Merged
johnkerl merged 7 commits intomainfrom
kerl/temp2
Oct 24, 2022
Merged

[python/c++] Connect C++ reader for dataframes#400
johnkerl merged 7 commits intomainfrom
kerl/temp2

Conversation

@johnkerl
Copy link
Copy Markdown
Contributor

@johnkerl johnkerl commented Oct 12, 2022

Status

RuntimeError: Static type (UINT64) does not match expected type (INT64)

PR context

This is the third in a group of three related PRs:

@johnkerl johnkerl requested a review from gspowley October 12, 2022 22:56
@johnkerl johnkerl changed the base branch from main to bkmartinjr/386-soma-columns October 12, 2022 22:57
Base automatically changed from bkmartinjr/386-soma-columns to main October 12, 2022 23:19
@johnkerl johnkerl marked this pull request as ready for review October 13, 2022 15:41
@johnkerl johnkerl changed the title Connect libtiledbsoma to tiledbsoma readers for dataframes (#360 re-do) [WIP] Connect libtiledbsoma to tiledbsoma readers for dataframes Oct 13, 2022
@johnkerl
Copy link
Copy Markdown
Contributor Author

johnkerl commented Oct 13, 2022

@gspowley @bkmartinjr @Shelnutt2 everything seems working now, especially after some nice C++ mods which were included in #397 which was merged last evening.

The remaining fail is in Windows CI, whereat we do not build libtiledbsoma for Windows. And (AFAICT) the early-abort in MacOS CI is just because Windows CI failed (I think but am not certain).

Now that we have a hard dependency on libtiledbsoma, and since we've made the decision to entirely remove these lines which do work on Windows (rather than using them as fallback when import tiledbsoma.libtiledbsoma raises ModuleNotFoundError) -- it seems we now have the decision to formally drop support for Windows, including taking it out of the CI job.

Thoughts?

@gspowley
Copy link
Copy Markdown
Member

it seems we now have the decision to formally drop support for Windows, including taking it out of the CI job.

Thoughts?

I vote we delay supporting Windows. I believe it's more important to focus on developing SOMA features on Linux and MacOS now. We can revisit supporting Windows later.

@bkmartinjr
Copy link
Copy Markdown
Member

bkmartinjr commented Oct 13, 2022

CC @ambrosejcarr @aaronwolen

I will check with our team and comment if the short-term deprioritization of Windows is of concern.

Can I assume the proposal is to revisit and support by the time we release an "alpha" (ie, feature complete) version? We definitely have users on Windows, and we will want to enable them by the time we release data in this format.

@johnkerl johnkerl force-pushed the kerl/temp2 branch 17 times, most recently from 7aa4f6d to 7566501 Compare October 15, 2022 22:13
johnkerl added a commit that referenced this pull request Oct 21, 2022
johnkerl added a commit that referenced this pull request Oct 21, 2022
@johnkerl johnkerl force-pushed the kerl/temp2 branch 3 times, most recently from 27f6d43 to a832ee4 Compare October 21, 2022 19:48
@johnkerl johnkerl marked this pull request as ready for review October 21, 2022 19:53
johnkerl added a commit that referenced this pull request Oct 21, 2022
* temp double-dylib workaround

* title goes here

* fix ci

* code-review feedback

* Remove badly rebase bits of #400
@johnkerl johnkerl force-pushed the kerl/temp2 branch 2 times, most recently from 8c98b2d to ad14bb0 Compare October 21, 2022 20:34
attrs=attr_names,
)
if ids is not None:
sr.set_dim_points(SOMA_ROWID, util.ids_to_list(ids))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ids is an Arrow array, we should pass the Arrow array to set_dim_points instead of converting it to a list. This will reduce memory usage and improve performance by avoiding creating a copy.

for table in iterator:
yield table
if ids is not None:
sr.set_dim_points(A.schema.domain.dim(0).name, util.ids_to_list(ids))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Copy link
Copy Markdown
Member

@gspowley gspowley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's sync up on Monday.

"""
For the interface between ``SOMADataFrame::read`` et al. (Python) and ``SOMAReader`` (C++): the
``ids`` argument to the former can be slice or list; the argument to
``SOMAReader::set_dim_points`` must be a list.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When setting a slice for a SOMAReader query, we should use SOMAReader::set_dim_ranges instead of converting the slice to a list.

This test shows an example:
https://github.com/single-cell-data/TileDB-SOMA/blob/main/libtiledbsoma/test/test_soma_reader.py#L82

step = -1
stop = ids.stop + step
return pa.chunked_array(pa.array(list(range(ids.start, stop, step))))
if isinstance(ids, pa.Array):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of supporting Arrow arrays is captured in this test (currently in a PR):
https://github.com/single-cell-data/TileDB-SOMA/blob/gspowley/obs-slice-x-test/libtiledbsoma/test/test_soma_reader.py#L162

In this test, the "ids" will be type pa.ChunkedArray, so it will fall through the if isintance(...) checks and raise an exception.

``SOMAReader::set_dim_points`` must be a list.
"""
if isinstance(ids, list):
return pa.chunked_array(pa.array(ids))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to convert a list to an Arrow array, it can remain a list.

Copy link
Copy Markdown
Member

@gspowley gspowley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@johnkerl johnkerl merged commit 293e4cd into main Oct 24, 2022
@johnkerl johnkerl deleted the kerl/temp2 branch October 24, 2022 16:03
@johnkerl johnkerl changed the title Connect C++ reader for dataframes [python/c++] Connect C++ reader for dataframes Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants