
Documentation Issue with train_test_split and blockwise #999

@christhorn2


Describe the issue:

The API documentation of dask-ml's train_test_split states that blockwise=False is supported for Dask Arrays:
"For Dask Arrays, set blockwise=False to shuffle data between blocks as well."
https://ml.dask.org/modules/generated/dask_ml.model_selection.train_test_split.html#dask_ml.model_selection.train_test_split
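For reference, the difference the documentation describes can be illustrated with plain NumPy. This is a minimal sketch of the concept, not dask-ml's implementation: blockwise_shuffle and global_shuffle are hypothetical helpers, and an 8-element index array with blocks of 4 is assumed to mirror da.arange(8, chunks=4).

```python
import numpy as np


def blockwise_shuffle(indices, chunk_size, rng):
    """Permute indices within each block only; no row leaves its block."""
    blocks = [indices[i:i + chunk_size] for i in range(0, len(indices), chunk_size)]
    return np.concatenate([rng.permutation(b) for b in blocks])


def global_shuffle(indices, rng):
    """Permute indices across the whole array, ignoring block boundaries."""
    return rng.permutation(indices)


rng = np.random.default_rng(0)
idx = np.arange(8)
bw = blockwise_shuffle(idx, chunk_size=4, rng=rng)  # rows 0-3 stay in the first half
gl = global_shuffle(idx, rng)                       # any row can land anywhere
```

With blockwise shuffling the first four positions always hold rows 0 to 3, so a split taken from it never mixes rows across blocks; the global shuffle has no such constraint, which is what blockwise=False is documented to provide.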

I believe the code intends this too: when every input is a Dask Array, it delegates the split to ShuffleSplit:

elif all(isinstance(arr, da.Array) for arr in arrays):

However, ShuffleSplit does not implement blockwise=False; its _split method raises NotImplementedError instead:

def _split(self, X):

Minimal Complete Verifiable Example:

from dask_ml.model_selection import train_test_split
import dask.array as da

x = da.arange(8, chunks=4)  # two blocks of 4 elements each
train_test_split(x, blockwise=False)
....
NotImplementedError: ShuffleSplit with blockwise=False has not been implemented yet.
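Until blockwise=False is implemented, one possible workaround is to draw a single permutation over the whole first axis and use it to index the arrays. This is a sketch under assumptions: global_train_test_split is a hypothetical helper, not part of dask-ml, and it assumes all arrays share a known length along axis 0 (no NaN chunk sizes).

```python
import numpy as np
import dask.array as da


def global_train_test_split(*arrays, test_size=0.25, random_state=None):
    """Hypothetical helper: split Dask Arrays along axis 0 with a global shuffle.

    Assumes every array has the same, known length along axis 0.
    """
    n = arrays[0].shape[0]
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n)             # one permutation spanning all blocks
    n_test = int(np.ceil(test_size * n))  # round the test set size up
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    result = []
    for arr in arrays:
        # Fancy indexing with a NumPy integer array gathers rows from any block.
        result.extend([arr[train_idx], arr[test_idx]])
    return result


x = da.arange(8, chunks=4)
y = da.arange(8, chunks=4) * 10
x_train, x_test, y_train, y_test = global_train_test_split(x, y, random_state=0)
```

Because the same index arrays are applied to x and y, paired rows stay aligned. For large arrays this gather touches every block, so it is likely to be costlier than the blockwise split.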

Environment:

  • Dask version: 2024.4.4
  • Python version: 3.9.18
  • Operating System:
  • Install method (conda, pip, source): pip
