[Data] Migrate Daft dependency from getdaft to daft #64240
Code Review
This pull request updates the Daft integration by migrating from the deprecated `getdaft` package to `daft>=0.7.0`, updating documentation URLs to daft.ai, and removing the restriction that prevented `from_daft` from working with PyArrow 14 and later. Tests are also updated to handle dtype differences and tensor column comparisons. Feedback suggests adding a defensive version check in `from_daft` to ensure `daft>=0.7.0` is installed, preventing potential issues for users who still have older versions of the package.
```python
    Returns:
        A :class:`~ray.data.Dataset` holding rows read from the DataFrame.
    """
```

Since the deprecated `getdaft` package (version <= 0.5.0) also installs the `daft` module, users who still have the old package installed might encounter cryptic errors when using `from_daft` with PyArrow >= 14. To prevent this and ensure a smooth migration, we should add a defensive version check to verify that the installed `daft` package is indeed >= 0.7.0.
```python
    """
    import daft
    from packaging.version import parse as parse_version

    # A residual getdaft install also provides the `daft` module, so verify
    # that the module actually comes from daft >= 0.7.0.
    if parse_version(daft.__version__) < parse_version("0.7.0"):
        raise ImportError(
            f"ray.data.from_daft requires daft >= 0.7.0, but found {daft.__version__}. "
            "Please upgrade daft via 'pip install -U daft'."
        )
```
Thanks for the suggestion! I've added a defensive version check in … The check raises a clear …
Replace the deprecated `getdaft` package with the actively maintained `daft` package, and remove the now-obsolete PyArrow >= 14 version check in `from_daft()`.

- requirements: `getdaft==0.4.3` → `daft>=0.7.0`
- `read_api.py`: remove PyArrow version check and unused imports
- `test_daft.py`: remove error test and `skipif` decorator, adapt for Daft 0.7.x dtype changes
- docs: update getdaft.io URLs to daft.ai, remove PyArrow 14 warning

Fixes ray-project#64178

Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com>
Add a defensive version check to ensure `daft>=0.7.0` is installed, preventing cryptic errors for users with residual `getdaft<=0.5.0` (which also installs the `daft` module but lacks `PyExtensionType` guards). Addresses review feedback from gemini-code-assist on ray-project#64240.

Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com>
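Because `getdaft` and `daft` both provide the `daft` module, `daft.__version__` reports the module's version but not which distribution supplied it. A complementary sketch, not part of this PR, reads distribution metadata with the standard-library `importlib.metadata` instead; `installed_version` and `residual_getdaft_warning` are hypothetical helpers, and the lookup function is injectable so the logic can be exercised without either package installed.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def residual_getdaft_warning(
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Optional[str]:
    """Return a warning message if the deprecated getdaft distribution is installed."""
    old = lookup("getdaft")
    if old is None:
        # No residual getdaft distribution: nothing to warn about.
        return None
    new = lookup("daft")
    return (
        f"Deprecated 'getdaft' {old} is installed alongside 'daft' {new or '(not installed)'}; "
        "getdaft also provides the 'daft' module, so `import daft` may pick up old files."
    )
```

Passing a plain `dict.get` as `lookup` (for example `{"getdaft": "0.4.3"}.get`) simulates an environment without touching the real installed packages.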
f71b40b to c9cb063
dstrodtman left a comment:
LGTM for docs, assuming Ray Data team signs off on dependency updates and CI changes.
Description
Migrate the Daft dependency in Ray's CI from the deprecated `getdaft` package to the actively maintained `daft` package, and remove the now-obsolete PyArrow >= 14 check in `from_daft()`.

The `getdaft` package was deprecated (last release 0.5.0, Jan 2026). The Daft project moved to the `daft` package on PyPI (latest: 0.7.15). The PyArrow >= 14 check was needed because old Daft referenced `pa.PyExtensionType` directly, which was removed in PyArrow 14. Daft >= 0.7.0 fixed this with a `hasattr` guard.
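The `hasattr` guard described above checks that an attribute exists before accessing it. A minimal sketch of that pattern, assuming a hypothetical helper `is_py_extension_type` (this is not Daft's actual code); the PyArrow module is passed in as a parameter so the behavior can be shown with stand-in modules rather than a specific PyArrow install.

```python
import types


def is_py_extension_type(pa_module: types.ModuleType, arrow_type: object) -> bool:
    """Return True only if pa_module defines PyExtensionType and arrow_type is one."""
    # The hasattr check short-circuits before the attribute access, so a PyArrow
    # build without PyExtensionType yields False instead of raising AttributeError.
    return hasattr(pa_module, "PyExtensionType") and isinstance(
        arrow_type, pa_module.PyExtensionType
    )
```

With a real install, `pyarrow` itself (for example `import pyarrow as pa`) would be passed as `pa_module`.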
Related issues
Fixes #64178
Additional information
- `test_daft_round_trip` uses `check_dtype=False` because Ray returns Arrow-backed dtypes while Daft may return NumPy dtypes after a round trip.
- … (`requirements_compiled*.txt`) will be updated on the next compile.
- `daft>=0.7.0` uses a lower-bound constraint, consistent with other ML data dependencies in Ray.