Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return null for date_trunc(null) instead of panic #6723

Merged
merged 7 commits into from
Jun 21, 2023
Merged

Return null for date_trunc(null) instead of panic #6723

merged 7 commits into from
Jun 21, 2023

Conversation

BryanEmond
Copy link
Contributor

@BryanEmond BryanEmond commented Jun 19, 2023

Which issue does this PR close?

Closes #6701.

Rationale for this change

see #6701

What changes are included in this PR?

Returns None if scalar input to date_trunc is null value.

Are these changes tested?

yes

Are there any user-facing changes?

Yes. Previously date_trunc(null) panics, now after this patch it returns null.

@github-actions github-actions bot added core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jun 19, 2023
Comment on lines 840 to 843
query P
SELECT DATE_TRUNC('year', NULL);
----
1970-01-01T00:00:00
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think @alamb added the expected behavior in #6701, which is null should be returned instead of this default value.

@@ -287,31 +287,31 @@ pub fn date_trunc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
"minute" => {
// trunc to minute
let second = ScalarValue::TimestampNanosecond(
Some(nano.unwrap() / 1_000_000_000 * 1_000_000_000),
Some(nano.unwrap_or_default() / 1_000_000_000 * 1_000_000_000),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently in date_trunc, nano is assumed to be Some. To return null for null input instead of unwrap on it, we can simply return a null timestamp scalar if nano is None.

@viirya viirya changed the title Panic date trunc Return null for date_trunc(null) instead of panic Jun 19, 2023
@BryanEmond
Copy link
Contributor Author

BryanEmond commented Jun 20, 2023

I know this push doesn't work, but I really don't get it. Now the unwrap() error is gone, but I get this error.

Running "timestamps.slt"
External error: query failed: DataFusion error: simplify_expressions
caused by
Internal error: Optimizer rule 'simplify_expressions' failed, due to generate a different schema,
original schema:
DFSchema { fields: [DFField { qualifier: None, field: Field { name: "date_trunc(Utf8(\"year\"),NULL)", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} },
new schema: 
DFSchema { fields: [DFField { qualifier: None, field: Field { name: "date_trunc(Utf8(\"year\"),NULL)", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }. 
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] SELECT DATE_TRUNC('year', NULL);
at tests/sqllogictests/test_files/timestamps.slt:840

I've looked into ScalarValue to find a null timestamp, but it doesn't exist. I'm not sure if Null should work in this case or if TimestampNull needs to be implemented?

Thank you for the help, but I will let some one else complete it. I don't think I have the skills to complete this issue.

@@ -283,6 +283,11 @@ pub fn date_trunc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
Ok(match array {
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(v, tz_opt)) => {
let nano = (f)(*v)?;

if nano.is_none() {
return Ok(ColumnarValue::Scalar(ScalarValue::Null));
Copy link
Member

@viirya viirya Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is because you use ScalarValue::Null here.

You can take a look of enum ScalarValue. ScalarValue::Null is for DataType::Null. That's why there is an error for a different schema.

For other types, they can take a Some(value) or a None which represents a null value for the type.

For TimestampNanosecond type, you just need to put ScalarValue::TimestampNanosecond with None here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how TimestampNanosecond is defined in enum ScalarValue:

/// Timestamp Nanoseconds
TimestampNanosecond(Option<i64>, Option<Arc<str>>),

The first argument is Option<i64>. If it is a None, meaning that is a null TimestampNanosecond scalar value.

@viirya
Copy link
Member

viirya commented Jun 20, 2023

You are pretty close to finish this.

@viirya
Copy link
Member

viirya commented Jun 20, 2023

Just a format issue:

             if nano.is_none() {
-                return Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(None, tz_opt.clone())));
+                return Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
+                    None,
+                    tz_opt.clone(),
+                )));
             }

@viirya
Copy link
Member

viirya commented Jun 20, 2023

You can run cargo fmt locally or use it in your editor plugin so the code will be formatted automatically.

@BryanEmond
Copy link
Contributor Author

Thank you for your help, I really appreciate it

@viirya viirya merged commit c42f5ef into apache:main Jun 21, 2023
@viirya
Copy link
Member

viirya commented Jun 21, 2023

Thank you @BryanEmond

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

date_trunc(null) results in a panic
2 participants