Arrow Expressions and Vortex DataSets #5801
paultiq asked this question in Issue Triage
-
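We need to support more Substrait conversions in Vortex. It would be better to use the Vortex duckdb extension:
from datetime import datetime
import duckdb
import pyarrow as pa
import vortex as vx
# tmp_path below is presumably pytest's tmp_path fixture (this snippet reads like a test body)
con = duckdb.connect()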
con.execute("INSTALL vortex; LOAD vortex;")
arr = pa.array([datetime(2024, 1, 1), datetime(2024, 6, 15), datetime(2024, 12, 31)])
table = pa.table({"ts": arr})
file_path = tmp_path / "test_timestamp.vortex"
vx.io.write(table, str(file_path))
# Use the vortex extension's vortex_scan function
result = con.execute(f"SELECT * FROM vortex_scan('{file_path}') WHERE ts > '2024-06-01'").fetchall()
assert len(result) == 2, f"Expected 2 rows, got {len(result)}: {result}"
# The results should be 2024-06-15 and 2024-12-31
timestamps = sorted([r[0] for r in result])
assert timestamps[0] == datetime(2024, 6, 15)
assert timestamps[1] == datetime(2024, 12, 31)
-
This is the same problem we have elsewhere internally. I think @danking was working on fixing it, but I'm not sure what he settled on there.
-
Issue Description
Vortex's compute differs significantly from Arrow compute in which predicates it supports. #5781 provides one list of things that Arrow supports and Vortex doesn't, and #5765 is an example where Vortex's compute supports things Arrow doesn't.
Assuming that #5781 will take some time to resolve (to achieve parity with Arrow), this leads to a number of problems, especially with duckdb, which will attempt to push down unsupported expressions. For example, the following query will raise an exception in duckdb:
I'm unclear on whether this is an Arrow problem, a Vortex problem or a duckdb problem:
* duckdb's CanPushdown assumes all Arrow sources support the same expressions.
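To make the CanPushdown concern concrete, here is a minimal, library-free sketch of what a per-source capability check could look like. Everything in it is hypothetical: `Source`, `Predicate`, `split_predicates`, and the operator names are invented for illustration and are not duckdb, Arrow, or Vortex APIs. The idea is only that a planner asks each source which operators it can evaluate, pushes those down, and applies the rest after the scan instead of raising.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical operator table; the names are illustrative only.
OPS = {
    "gt": lambda a, b: a > b,
    "eq": lambda a, b: a == b,
    "is_in": lambda a, b: a in b,
}


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str
    value: Any

    def matches(self, row):
        return OPS[self.op](row[self.column], self.value)


@dataclass
class Source:
    """A scan source that only has 'kernels' for some operators (assumed model)."""
    name: str
    rows: list
    supported_ops: frozenset

    def scan(self, pushed):
        # Mirrors the failure mode in this thread: an unsupported pushed-down
        # predicate raises instead of being handled gracefully.
        for p in pushed:
            if p.op not in self.supported_ops:
                raise NotImplementedError(f"{self.name} has no kernel for {p.op!r}")
        return [r for r in self.rows if all(p.matches(r) for p in pushed)]


def split_predicates(source, predicates):
    """Ask the source what it supports instead of assuming all sources are equal."""
    pushed = [p for p in predicates if p.op in source.supported_ops]
    residual = [p for p in predicates if p.op not in source.supported_ops]
    return pushed, residual


def query(source, predicates):
    pushed, residual = split_predicates(source, predicates)
    rows = source.scan(pushed)
    # Predicates the source cannot evaluate are applied after the scan.
    return [r for r in rows if all(p.matches(r) for p in residual)]


rows = [{"id": 1, "tag": "a"}, {"id": 2, "tag": "b"}, {"id": 3, "tag": "c"}]
limited = Source("limited", rows, frozenset({"gt", "eq"}))
preds = [Predicate("id", "gt", 1), Predicate("tag", "is_in", ("b", "x"))]
result = query(limited, preds)
```

Pushing `preds` straight into `limited.scan` raises `NotImplementedError`, while `query` returns the matching row because `is_in` is evaluated after the scan. Whether that split belongs in duckdb, in the Arrow dataset layer, or in Vortex is exactly the open question here.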
One thing Vortex can do is add test cases for:
DuckDB Example
Arrow Example
All this does is demonstrate that certain kernels aren't implemented. The point here is not that the kernel isn't implemented, but to demonstrate why the above (duckdb) example fails.
Parquet Works
Vortex Fails
Expected Behavior
.
Actual Behavior
.
Reproduction Steps
.
OS Version Information
Ubuntu 24.04