Expected behavior
I would expect the parquet file to be scanned, given that it can be read both by polars without RAPIDS and by cudf.pandas. Failing that, a more descriptive error message would help.
Environment overview (please complete the following information)
I ran this locally on an RTX Ada 6000. Details are in the gist.
davidwendt changed the title from "[BUG]" to "[BUG] polars[gpu] is unable to scan_parquet() a file that is able to be read by cudf.pandas and polars[cpu]." on Feb 21, 2025
In cudf-polars we check that the file we read produces the same schema that polars expects. pylibcudf reads the datetime columns correctly at microsecond (us) resolution, which does not match that expected schema, so the check fails.
If I rewrite the table with pyarrow:
import pyarrow.parquet as pq
table = pq.read_table("fhvhv_tripdata_2024-01.parquet")
pq.write_table(table, "rewritten.parquet")
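To make the schema check described above concrete, here is a minimal sketch in plain Python of comparing an expected schema against the one actually read and reporting each mismatch by column. This is not the cudf-polars implementation. The function name `find_schema_mismatches`, the column names, and the dtype strings are all hypothetical. In particular, the `ns` unit on the expected side is made up for illustration, since the thread only says the file is read as `us`.

```python
def find_schema_mismatches(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return one readable message per column whose dtype differs or is absent."""
    problems = []
    for name, want in expected.items():
        if name not in actual:
            problems.append(f"column {name!r}: expected {want}, but missing from file")
        elif actual[name] != want:
            problems.append(f"column {name!r}: expected {want}, file has {actual[name]}")
    # Columns present in the file but absent from the expected schema.
    for name, got in actual.items():
        if name not in expected:
            problems.append(f"column {name!r}: unexpected column of type {got}")
    return problems


# Hypothetical schemas: names and dtype strings are illustrative only.
expected = {"pickup_time": "datetime[ns]", "distance": "float64"}
actual = {"pickup_time": "datetime[us]", "distance": "float64"}

for message in find_schema_mismatches(expected, actual):
    print(message)
```

Reporting the column name together with both dtypes is one way to get the more descriptive error message asked for in the issue.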
Bug Description
polars[gpu] is unable to scan_parquet() a file that is able to be read by cudf.pandas and polars[cpu].
Steps/Code to reproduce bug
Here is a gist with a Colab link to reproduce the error:
https://gist.github.com/will-hill/24200d675b10537027bda96772dce277