Error on dataframe count using arrow dataset #800
Comments
I cannot reproduce this in the …
I tracked down the error. It occurs when attempting to convert the … (see datafusion-python/src/dataset_exec.rs, lines 61 to 65 at commit ae7470e).
However, the …
Describe the bug
When you use a pyarrow.dataset as the source and call count() on the resulting DataFrame, you get an error.

To Reproduce
You can point the snippet below at any Parquet file.
This generates the following output. The show() call is only there to demonstrate that the file is read correctly.

Expected behavior
count() should return the number of rows in this dataset.

A workaround is to aggregate and count instead.
Additional context
In my investigation, I found that we register Arrow datasets by creating a TableProvider in src/dataset.rs, and the execution calls then happen in src/dataset_exec.rs.