[Data] Migrate Daft dependency from getdaft to daft #64240

Open
jiangxt2 wants to merge 5 commits into ray-project:master from jiangxt2:feat/daft-migration

@jiangxt2

Description

Migrate the Daft dependency in Ray's CI from the deprecated getdaft package
to the actively maintained daft package, and remove the now-obsolete
PyArrow >= 14 version check in from_daft().

The getdaft package was deprecated (last release 0.5.0, Jan 2026). The Daft
project moved to the daft package on PyPI (latest: 0.7.15). The PyArrow >= 14
check was needed because old Daft referenced pa.PyExtensionType directly,
which was removed in PyArrow 14. Daft >= 0.7.0 fixed this with a hasattr
guard.

Related issues

Fixes #64178

Additional information

  • test_daft_round_trip uses check_dtype=False because Ray returns
    Arrow-backed dtypes while Daft may return numpy dtypes after round-trip.
  • CI lock files (requirements_compiled*.txt) will be updated on next compile.
  • daft>=0.7.0 uses a lower-bound constraint, consistent with other ML data
    dependencies in Ray.
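
For readers unfamiliar with check_dtype=False, here is a small illustration of what the flag relaxes. It is a sketch, not the actual test_daft_round_trip body: the two frames are made-up data, and an int64/int32 mismatch stands in for the Arrow-backed versus numpy dtype difference described above.

```python
import numpy as np
import pandas as pd

# Same values, different integer widths, standing in for the dtype drift
# a round trip can introduce (illustrative data only).
before = pd.DataFrame({"a": np.array([1, 2, 3], dtype="int64")})
after = pd.DataFrame({"a": np.array([1, 2, 3], dtype="int32")})

# Passes: values are compared, column dtypes are not.
pd.testing.assert_frame_equal(before, after, check_dtype=False)

# The strict default rejects the same pair because the dtypes differ.
try:
    pd.testing.assert_frame_equal(before, after)
    strict_passed = True
except AssertionError:
    strict_passed = False
```

The trade-off is that a genuine dtype regression would also go unnoticed by such a test, so the flag is best kept to comparisons where the dtype difference is expected.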

@jiangxt2 jiangxt2 requested review from a team as code owners June 21, 2026 06:32

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the integration with Daft by migrating from the deprecated getdaft package to daft>=0.7.0, updating documentation URLs to daft.ai, and removing the restriction that prevented from_daft from working with PyArrow 14 and later. Additionally, tests are updated to handle dtype differences and tensor column comparisons. Feedback suggests adding a defensive version check in from_daft to ensure daft>=0.7.0 is installed, preventing potential issues for users with older versions of the package.

Returns:
A :class:`~ray.data.Dataset` holding rows read from the DataFrame.
"""

medium

Since the deprecated getdaft package (version <= 0.5.0) also installs the daft module, users who still have the old package installed might encounter cryptic errors when using from_daft with PyArrow >= 14. To prevent this and ensure a smooth migration, we should add a defensive version check to verify that the installed daft package is indeed >= 0.7.0.

Suggested change
    """
    """
    import daft
    from packaging.version import parse as parse_version

    if parse_version(daft.__version__) < parse_version("0.7.0"):
        raise ImportError(
            f"ray.data.from_daft requires daft >= 0.7.0, but found {daft.__version__}. "
            "Please upgrade daft via 'pip install -U daft'."
        )

@ray-gardener ray-gardener Bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Jun 21, 2026
jiangxt2 pushed a commit to jiangxt2/ray that referenced this pull request Jun 22, 2026
Add a defensive version check to ensure daft>=0.7.0 is installed,
preventing cryptic errors for users with residual getdaft<=0.5.0
(which also installs the daft module but lacks PyExtensionType guards).

Addresses review feedback from gemini-code-assist on ray-project#64240.
@jiangxt2

Thanks for the suggestion! I've added a defensive version check in from_daft() to ensure daft >= 0.7.0 is installed. This prevents a cryptic AttributeError for users with a residual getdaft<=0.5.0 install who haven't upgraded to the new package.

The check raises a clear ImportError with upgrade instructions if an older version is detected.
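
A testable variant of that check might look like the sketch below. It assumes the `packaging` library is available, as in the suggested change; `check_daft_version` and `MIN_DAFT_VERSION` are hypothetical names, and the version string is passed in as an argument so the logic can be exercised without Daft installed.

```python
from packaging.version import parse as parse_version

MIN_DAFT_VERSION = "0.7.0"  # lower bound taken from this PR


def check_daft_version(installed: str, minimum: str = MIN_DAFT_VERSION) -> None:
    """Raise ImportError when `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"ray.data.from_daft requires daft >= {minimum}, but found {installed}. "
            "Please upgrade daft via 'pip install -U daft'."
        )


# A version at or above the bound passes silently.
check_daft_version("0.7.15")

# A version from the deprecated getdaft line is rejected.
try:
    check_daft_version("0.5.0")
    rejected = False
except ImportError as exc:
    rejected = True
    message = str(exc)
```

Inside from_daft() the argument would be daft.__version__; keeping the comparison in a small function makes it straightforward to cover both branches in a unit test.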

jiangxt2 and others added 2 commits June 22, 2026 16:18
Replace the deprecated `getdaft` package with the actively maintained
`daft` package, and remove the now-obsolete PyArrow >= 14 version check
in `from_daft()`.

- requirements: getdaft==0.4.3 → daft>=0.7.0
- read_api.py: remove PyArrow version check and unused imports
- test_daft.py: remove error test and skipif decorator, adapt for
  Daft 0.7.x dtype changes
- docs: update getdaft.io URLs to daft.ai, remove PyArrow 14 warning

Fixes ray-project#64178

Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com>

Add a defensive version check to ensure daft>=0.7.0 is installed,
preventing cryptic errors for users with residual getdaft<=0.5.0
(which also installs the daft module but lacks PyExtensionType guards).

Addresses review feedback from gemini-code-assist on ray-project#64240.

Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com>
@jiangxt2 jiangxt2 force-pushed the feat/daft-migration branch from f71b40b to c9cb063 Compare June 22, 2026 09:29

@dstrodtman dstrodtman left a comment

LGTM for docs, assuming Ray Data team signs off on dependency updates and CI changes.
