Partition distribution across GPUs when using dask_cudf #18004

Open
preorat-sion opened this issue Feb 13, 2025 · 1 comment
@preorat-sion
When reading data from a CSV file, I need to distribute the partitions across different GPUs, but calling the `to_device` method raises an error:

```python
df = dask_cudf.read_csv('./*.csv')
df = df.map_partitions(lambda part, i: part.to_device(i % 2))
```

```
KeyError: 'to_device'

During handling of the above exception, another exception occurred:

AttributeError: DataFrame object has no attribute to_device
```

Could you please help me understand what causes this error? Or is a different approach needed?

@TomAugspurger
Contributor

With dask-cudf (dask.dataframe using the cudf backend), the in-memory partition objects are instances of `cudf.DataFrame`, which doesn't have a `to_device` method.

> it is necessary to distribute the partitions across different GPUs

Could you share a bit more about why you want to do that explicitly? With the cudf backend, your tasks will automatically run on the device. Unless there's a strong reason not to, you might be better off letting Dask choose which GPU to run particular tasks on.
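As a hedged sketch of that automatic approach: the usual way to spread dask-cudf partitions across multiple GPUs is to start a `LocalCUDACluster` (from the separate dask-cuda package), which launches one worker per visible GPU and lets the scheduler place tasks. This assumes dask-cuda is installed and at least two GPUs are visible; it is not a definitive recipe:

```python
def start_gpu_cluster():
    """Start one Dask worker per visible GPU and return a client.

    Requires the dask-cuda package and visible CUDA devices
    (e.g. CUDA_VISIBLE_DEVICES="0,1"). Imports are deferred so this
    sketch can be defined on machines without GPUs.
    """
    from dask_cuda import LocalCUDACluster  # one worker per GPU
    from dask.distributed import Client

    cluster = LocalCUDACluster()
    return Client(cluster)

# With such a cluster attached, dask_cudf.read_csv('./*.csv') produces
# partitions whose tasks run on the cluster's GPU workers without any
# manual to_device calls.
```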
