-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
All array functions should represent NULL
as an element
#7142
Comments
To be clear, to be consistent with the rest of the datafusion type system, I think we should stop using DataType::Null to represent null constants in array functions and instead move to using So after type coercion I would expect
|
@alamb, it looks reasonable for me. But what we would do with pure null arrays (see the ticket: #7144). Should it be |
Good question @izveigor -- I left a response here #7144 (comment) |
It seems only these two functions have issues with it?
|
Apparently |
What is the expected behavior for array concat with nulls and empty arrays?
|
@alamb Should we introduce another Expr::ArrayFunction that differs from other Expr::ScalarFunctions? |
I don't think null handling has to be unique to array functions, and it is a complicated enough topic that adding a new abstraction I think will simply make the code even more complicated. In this case, I think the intention is to treat a NULL constant as a one element list with a single NULL I think:
So for the example given by @izveigor above:
I would expect this to be parsed like
Then I would expect type coercion to coerce the
And then the function implementation only handles The type coercion for function arguments is here: https://github.com/apache/arrow-datafusion/blob/2185842be22b695cf00e615db68b373f86fd162b/datafusion/expr/src/type_coercion/functions.rs#L33-L72 Does that make sense @jayzhan211 ? |
@alamb Do you mean we should do null conversion in In
|
I haven't had a chance to review #7580 in depth However, I agree one core problem is there is no current signature that can express something like "the first agument is a ListArray and the subsequent arguments must be the same element type of that list" Perhaps we can either:
Something like trait SignatureComputation {
/// Returns a Vec of all possible valid argument types for this signature.
fn get_valid_types(
&self,
signature: &TypeSignature,
current_types: &[DataType],
) -> Result<Vec<Vec<DataType>>>; And then enum TypeSignature {
...
Computation(Arc<dyn SignatureComputation),
...
} Which would allow arbitrarily complex type signatures |
Take a note that array literal |
@alamb Do we have to deal with nested list like duckdb? If we naively convert null to i32. We got If we just find the highest dimension one and convert null to it. We got I found current approach is hard to deal with if we consider dimension. |
I think it would help in the situation above to keep the types and values separate and I think the logic should be able to handle it. So I think from the parser
So the code needs to determine the output type ONLY based on the input types:
Which should be And then the coercion code needs to try and cast each value to |
Enhance nulls and empty array handling for each array functions
|
When working on task #9108, I found handling |
Is your feature request related to a problem or challenge?
Follow on to #6662
We have important problem related to using array functions with
NULL
statements. As @alamb said in PR (see below for full details) DataFusion casting system should be adapted for it, otherwise it will always throw errors.The current implementation:
Should be:
Describe the solution you'd like
@alamb statement about the issue:
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: