Add support for Substrait List/EmptyList literals #10615
Adds support for converting from DataFusion List/LargeList ScalarValues into Substrait List/EmptyList Literals and back
@@ -1138,7 +1139,7 @@ fn from_substrait_type(dt: &substrait::proto::Type) -> Result<DataType> {
     from_substrait_type(list.r#type.as_ref().ok_or_else(|| {
         substrait_datafusion_err!("List type must have inner type")
     })?)?;
- let field = Arc::new(Field::new("list_item", inner_type, true));
+ let field = Arc::new(Field::new_list_field(inner_type, true));
This is a breaking change in the sense that the new field name is just "item", to align with the Arrow default.
);
match l.type_variation_reference {
    DEFAULT_CONTAINER_TYPE_REF => Ok(ScalarValue::List(Arc::new(
        GenericListArray::new_null(field.into(), 1),
Is this the correct way to create null lists, or is there something better? The list-of-lists structure that ScalarValue::List uses is a bit confusing to me.
I think it is correct.
@jonahgao here's a first part of the split: adding support for List types.
    .iter()
    .map(|el| from_substrait_literal(el))
    .collect::<Result<Vec<_>>>()?;
let element_type = elements[0].data_type();
Should we check if elements are empty and report an error? The literal input might come from systems other than DataFusion, and they might not be properly implemented.
Yup, done: cdc525c
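For readers following the thread, the guard discussed above can be sketched with the standard library alone. This is a minimal illustration, not the code from the commit: `Literal`, `DataType`, `literal_type`, and the error text are hypothetical stand-ins for the Substrait and DataFusion types. The point is only that `first()` returns `None` on an empty slice, whereas `elements[0]` panics.

```rust
// Hypothetical stand-ins; these are NOT the DataFusion or Substrait types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    I32(i32),
    Str(String),
    List(Vec<Literal>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    List(Box<DataType>),
}

/// Infers the type of a literal. A list takes its element type from its
/// first element, so an empty list is rejected instead of being indexed.
fn literal_type(lit: &Literal) -> Result<DataType, String> {
    match lit {
        Literal::I32(_) => Ok(DataType::Int32),
        Literal::Str(_) => Ok(DataType::Utf8),
        Literal::List(elements) => {
            // `first()` yields None for an empty slice; `elements[0]` would panic.
            let first = elements
                .first()
                .ok_or_else(|| "list literal has no elements to infer a type from".to_string())?;
            Ok(DataType::List(Box::new(literal_type(first)?)))
        }
    }
}
```

With this shape, a producer that sends an empty List literal gets an error back rather than crashing the consumer, which is the concern raised about non-DataFusion systems.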
    }
}
Some(LiteralType::EmptyList(l)) => {
    let element_type = from_substrait_type(l.r#type.clone().unwrap().as_ref())?;
If we can remove the unwrap, it can become more robust. 🤔
Hm, without the type specified we don't know what it should be. I guess we could default to NullType (which I think is what DataFusion does if you just do a "SELECT [] FROM .."). Do you think that'd make sense?
I feel like Substrait probably intends this field to always exist, though I'm not sure, but e.g. in the Java library they have it as required: https://github.com/substrait-io/substrait-java/blob/79decd20e85d6a1a5623890042ebcf1415cf784a/core/src/main/java/io/substrait/expression/Expression.java#L451
I think we can return an error like "invalid parameter", but it may not be necessary to do so. Let's keep it as it is for now until someone requests this behavior.
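If someone does request that behavior later, the "return an error" option could look roughly like the sketch below. It assumes nothing about the real Substrait message types: `EmptyListLiteral`, `ElementType`, `empty_list_element_type`, and the error text are all hypothetical. It only shows the general pattern of replacing `unwrap()` on an optional field with `ok_or_else`.

```rust
// Hypothetical stand-ins; these are NOT the Substrait protobuf types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ElementType {
    Int32,
    Utf8,
}

/// Models a decoded message whose element type is optional on the wire.
#[derive(Debug, Clone, Default)]
struct EmptyListLiteral {
    element_type: Option<ElementType>,
}

/// Returns the element type, or an error when the producer omitted it.
/// Calling `unwrap()` here would panic on the same input.
fn empty_list_element_type(lit: &EmptyListLiteral) -> Result<ElementType, String> {
    lit.element_type
        .clone()
        .ok_or_else(|| "EmptyList literal is missing its element type".to_string())
}
```

Defaulting to a null type instead, as floated earlier in the thread, would replace the `ok_or_else` call with `unwrap_or`; which of the two is preferable depends on whether Substrait treats the field as required.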
Looks good to me. Thank you for your contribution, @Blizzara.
* Add support for Substrait List/EmptyList literals
  Adds support for converting from DataFusion List/LargeList ScalarValues into Substrait List/EmptyList Literals and back
* cleanup
* fix test, add literal roundtrip tests for lists, and fix creating null large lists
* add unit testing for type roundtrips
* fix clippy
* better error if a substrait literal list is empty
Which issue does this PR close?
Closes #.
Extracted from #10531. It is not a necessary part of that PR, but it is somewhat related.
Rationale for this change
What changes are included in this PR?
Adds support for converting from DataFusion List/LargeList ScalarValues into Substrait List/EmptyList Literals and back
Are these changes tested?
Adds a round-trip unit test
Are there any user-facing changes?
More things are now supported, but I don't think Substrait support status is covered by documentation currently?