arrow::compute::concat should merge dictionary type when concatenating list of dictionaries (#6888)
Comments
It's not really a bug.
Related: #3558
While fixing this I saw this line in concat: arrow-rs/arrow-select/src/dictionary.rs, line 130 at 123045c. Having more values than keys is really unlikely when you merge dictionaries that contain the same value multiple times (which is the whole point of a dictionary). Can you please explain the reason behind this check?
See #3558, it can occur as a result of the way a single dictionary may be shared across multiple arrays, or as a result of previous processing. The intent is to optimise the dictionary when we know the performance cost will pay off.
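The trade-off described above can be paraphrased as a simple heuristic. The sketch below is plain Rust and purely illustrative; `should_merge` and its condition are assumptions about the intent of the check, not the actual arrow-rs code at that line:

```rust
// Hypothetical heuristic (not the arrow-rs implementation): merging
// dictionaries deduplicates values, which costs work up front. It only
// pays off when the naively concatenated value buffers would be larger
// than the key buffers that reference them.
fn should_merge(total_values: usize, total_keys: usize) -> bool {
    total_values > total_keys
}

fn main() {
    // Typical case: few distinct values shared by many keys.
    // Merging would buy little, so skip it.
    assert!(!should_merge(4, 100));

    // Pathological case like #3558: a large dictionary shared across
    // arrays but referenced by only a handful of keys. Merging pays off.
    assert!(should_merge(1_000, 8));

    println!("ok");
}
```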
Describe the bug
When concatenating lists of dictionaries, the new dictionary type contains duplicate values.
To Reproduce
Expected behavior
When concatenating lists of dictionaries, it should merge the dictionaries rather than blindly concatenating their value buffers as well.
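The difference between the two behaviours can be modelled in plain Rust. The `Dict` type, `naive_concat`, and `merged_concat` below are illustrative stand-ins, not arrow-rs types: a dictionary array is a list of distinct values plus integer keys indexing into it, and merging deduplicates the values while remapping the keys.

```rust
use std::collections::HashMap;

// Toy model of a dictionary-encoded array (not an arrow-rs type).
struct Dict {
    values: Vec<String>,
    keys: Vec<usize>,
}

// Blind concatenation: append both value lists, shifting the second
// array's keys. Shared values end up duplicated (the reported bug).
fn naive_concat(a: &Dict, b: &Dict) -> Dict {
    let mut values = a.values.clone();
    values.extend(b.values.iter().cloned());
    let mut keys = a.keys.clone();
    keys.extend(b.keys.iter().map(|k| k + a.values.len()));
    Dict { values, keys }
}

// Merged concatenation: keep each distinct value once and remap keys.
fn merged_concat(a: &Dict, b: &Dict) -> Dict {
    let mut values: Vec<String> = Vec::new();
    let mut index: HashMap<String, usize> = HashMap::new();
    let mut remap = |d: &Dict| -> Vec<usize> {
        d.keys
            .iter()
            .map(|&k| {
                let v = &d.values[k];
                *index.entry(v.clone()).or_insert_with(|| {
                    values.push(v.clone());
                    values.len() - 1
                })
            })
            .collect()
    };
    let mut keys = remap(a);
    keys.extend(remap(b));
    Dict { values, keys }
}

fn main() {
    let a = Dict { values: vec!["x".into(), "y".into()], keys: vec![0, 1, 0] };
    let b = Dict { values: vec!["y".into(), "z".into()], keys: vec![0, 1] };

    // Blind concat duplicates the shared value "y".
    let naive = naive_concat(&a, &b);
    assert_eq!(naive.values, vec!["x", "y", "y", "z"]);

    // Merging keeps each value once and remaps the keys.
    let merged = merged_concat(&a, &b);
    assert_eq!(merged.values, vec!["x", "y", "z"]);
    assert_eq!(merged.keys, vec![0, 1, 0, 1, 2]);
    println!("ok");
}
```

Both results decode to the same logical sequence (`x, y, x, y, z`); only the merged form avoids carrying redundant dictionary values.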
Additional context
This happened to me when creating an aggregate expression in DataFusion, which calls
ScalarValue::iter_to_array(results);
which uses concat underneath. According to the spec, duplicate values are valid: