Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add interval arithmetic for timestamp types #7758

Merged
merged 8 commits into from
Oct 9, 2023
Merged

Conversation

mhilton
Copy link
Contributor

@mhilton mhilton commented Oct 6, 2023

Which issue does this PR close?

Closes #7756.

Rationale for this change

It will be useful for the interval calculating feature to be able to successfully determine the intervals for timestamp columns.

What changes are included in this PR?

The new_zero method for a ScalarValue with a Duration type now returns a 0 value, rather than a NULL.

Are these changes tested?

Yes

Are there any user-facing changes?

The new_zero function will now return a 0 Duration rather than NULL.

Timestamp types have custom arithmetic and need special handling
when attempting to determine potential interval ranges. Change the
processing of comparison operator propagation to convert timestamp
intervals into int64 intervals for processing. The results are
converted back the the correct datatype at the end of the process.
@github-actions github-actions bot added the physical-expr Physical Expressions label Oct 6, 2023
right_child: &Interval,
left_type: &DataType,
right_type: &DataType,
) -> Result<(Option<Interval>, Option<Interval>)> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this function have some additional constraint checks on the two timestamp types? Perhaps to ensure the two timestamps are actually comparable? I couldn't think of any to add, but maybe someone else can.

@berkaysynnada
Copy link
Contributor

I've observed the problem is in new_zero() function while defining the target node for timestamps.

https://github.com/apache/arrow-datafusion/blob/3d1b23a04bdc04c526e2dcb06e0cf1995707587d/datafusion/common/src/scalar.rs#L836-L845

x>y is reformed as x-y>0, and (x-y) node is assigned with a new interval of [0, inf). However, new_zero() of ScalarValue::Duration's are set as None, I don't know why these are left in that way but I believe they need to be set as Some(0), similar to other types.

            DataType::Duration(TimeUnit::Second) => ScalarValue::DurationSecond(None),
            DataType::Duration(TimeUnit::Millisecond) => {
                ScalarValue::DurationMillisecond(Some(0))
            }
            DataType::Duration(TimeUnit::Microsecond) => {
                ScalarValue::DurationMicrosecond(Some(0))
            }
            DataType::Duration(TimeUnit::Nanosecond) => {
                ScalarValue::DurationNanosecond(Some(0))
            }

I think no additional implementation is needed to handle timestamp cases. When I applied these changes, the test mentioned in the issue passed successfully.

The review from @berkaysynnada showed that it was not necessary to
have special handling for timestamp types, but to make sure the
new_zero function for scalars of a duration type return 0 rather
than null values. Apply the suggested change, leaving the test to
ensure the functionality doesn't break in the future.
@mhilton
Copy link
Contributor Author

mhilton commented Oct 9, 2023

Thanks for the suggestion @berkaysynnada I have reverted my changes and made your suggested change, keeping the test, and it works great.

Apply a number of changes suggested by clippy.
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate labels Oct 9, 2023
@github-actions github-actions bot removed logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate labels Oct 9, 2023
@github-actions github-actions bot removed the sql SQL Planner label Oct 9, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mhilton and @berkaysynnada -- this change makes a lot of sense to me

@@ -833,15 +833,15 @@ impl ScalarValue {
DataType::Interval(IntervalUnit::MonthDayNano) => {
ScalarValue::IntervalMonthDayNano(Some(0))
}
DataType::Duration(TimeUnit::Second) => ScalarValue::DurationSecond(None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@alamb
Copy link
Contributor

alamb commented Oct 9, 2023

I took the liberty of merging the comments to this PR and merging from main.

@alamb alamb merged commit 96561a4 into apache:main Oct 9, 2023
@andygrove andygrove added the enhancement New feature or request label Nov 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interval arithmetic broken for Timestamp(..) types using comparison operators
4 participants