Partition distribution across GPUs when using dask_cudf #18004

Open
preorat-sion opened this issue Feb 13, 2025 · 1 comment
@preorat-sion
When reading data from a CSV file, I need to distribute the partitions across different GPUs, but calling the `to_device` method raises an error:

```python
df = dask_cudf.read_csv('./*.csv')
df = df.map_partitions(lambda part, i: part.to_device(i % 2))
```

```
KeyError: 'to_device'

During handling of the above exception, another exception occurred:

AttributeError: DataFrame object has no attribute to_device
```

Could you please help me understand what causes this error? Or is a different approach needed?

@TomAugspurger
Contributor

With dask-cudf (dask.dataframe using the cudf backend), the in-memory partition objects are instances of `cudf.DataFrame`, which doesn't have a `to_device` method.

> it is necessary to distribute the partitions across different GPUs

Could you share a bit more about why you want to do that explicitly? With the cudf backend, your tasks will automatically run on the device. Unless there's a strong reason not to, you might be better off letting Dask choose which GPU to run particular tasks on.
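As a hedged sketch of that automatic approach: the usual way to spread dask-cudf partitions across multiple GPUs is to start a `LocalCUDACluster` (from the separate dask-cuda package), which launches one worker per visible GPU and lets the scheduler place tasks. This assumes dask-cuda is installed and at least two GPUs are visible; it is not a definitive recipe:

```python
def start_gpu_cluster():
    """Start one Dask worker per visible GPU and return a client.

    Requires the dask-cuda package and visible CUDA devices
    (e.g. CUDA_VISIBLE_DEVICES="0,1"). Imports are deferred so this
    sketch can be defined on machines without GPUs.
    """
    from dask_cuda import LocalCUDACluster  # one worker per GPU
    from dask.distributed import Client

    cluster = LocalCUDACluster()
    return Client(cluster)

# With such a cluster attached, dask_cudf.read_csv('./*.csv') produces
# partitions whose tasks run on the cluster's GPU workers without any
# manual to_device calls.
```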
