Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: date_add and date_sub functions #910

Merged
merged 18 commits into from
Sep 16, 2024
Merged

Conversation

mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Sep 4, 2024

Which issue does this PR close?

Does not close an individual PR, but addresses an item in #858.

Rationale for this change

What changes are included in this PR?

New DateAdd and DateSub functions in QueryPlanSerde. They have two arguments: a date and an int32 representing the number of days. DataFusion does not fully support date_add, so this PR introduces functions using the scalar UDF interface. These functions use DataFusion interfaces to apply Arrow kernels to perform the computation.

I also had to generate new q72 reference plans because it uses date_add.

How are these changes tested?

New test cases for date_add and date_sub, both scalar and array arguments. fix: ReusedExchangeExec + CometShuffleExchangeExec under QueryStageExec + should be CometRoot in CometExecSuite also exercises date_add. q72 also exercises date_add as well.

@mbutrovich mbutrovich marked this pull request as draft September 4, 2024 21:05
@mbutrovich mbutrovich changed the title date_add expression feat: date_add expression Sep 4, 2024
@mbutrovich
Copy link
Contributor Author

I'll drive into this test failure tomorrow. My assumption of the second expression for date_add always being a Literal does not appear sound.

Cause: org.apache.comet.CometNativeException: General execution error with reason: Not implemented: DateAdd { left: Some(Expr { expr_struct: Some(Literal(Literal { datatype: Some(DataType { type_id: Date, type_info: None }), is_null: false, value: Some(IntVal(0)) })) }), right: Some(Expr { expr_struct: Some(Cast(Cast { child: Some(Expr { expr_struct: Some(Remainder(Remainder { left: Some(Expr { expr_struct: Some(Bound(BoundReference { index: 0, datatype: Some(DataType { type_id: Int64, type_info: None }) })) }), right: Some(Expr { expr_struct: Some(If(IfExpr { if_expr: Some(Expr { expr_struct: Some(Eq(Equal { left: Some(Expr { expr_struct: Some(Literal(Literal { datatype: Some(DataType { type_id: Int64, type_info: None }), is_null: false, value: Some(LongVal(4)) })) }), right: Some(Expr { expr_struct: Some(Literal(Literal { datatype: Some(DataType { type_id: Int64, type_info: None }), is_null: false, value: Some(LongVal(0)) })) }) })) }), true_expr: Some(Expr { expr_struct: Some(Literal(Literal { datatype: Some(DataType { type_id: Int64, type_info: None }), is_null: true, value: None })) }), false_expr: Some(Expr { expr_struct: Some(Literal(Literal { datatype: Some(DataType { type_id: Int64, type_info: None }), is_null: false, value: Some(LongVal(4)) })) }) })) }), fail_on_error: false, return_type: Some(DataType { type_id: Int64, type_info: None }) })) }), datatype: Some(DataType { type_id: Int32, type_info: None }), timezone: "America/Los_Angeles", eval_mode: Legacy, allow_incompat: false })) }) }.

@mbutrovich mbutrovich marked this pull request as ready for review September 5, 2024 21:01
@mbutrovich mbutrovich changed the title feat: date_add expression feat: date_add function Sep 5, 2024
@mbutrovich
Copy link
Contributor Author

q72 comparison (only SF 1 locally), last two rows of each (using Comet Exec) are relevant. main branch:

TPCDS Snappy:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q72                                                7047           7848        1133          2.2         459.1       1.0X
q72: Comet (Scan)                                  6824           7291         660          2.2         444.6       1.0X
q72: Comet (Scan, Exec)                           10816          10896         112          1.4         704.7       0.7X
q72: Comet (Exec)                                 10811          10959         209          1.4         704.4       0.7X

This PR:

TPCDS Snappy:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q72                                                6848           6992         204          2.2         446.2       1.0X
q72: Comet (Scan)                                  6814           6819           7          2.3         444.0       1.0X
q72: Comet (Scan, Exec)                           10376          10442          93          1.5         676.1       0.7X
q72: Comet (Exec)                                 10472          10513          58          1.5         682.3       0.7X

@mbutrovich mbutrovich changed the title feat: date_add function feat: date_add and date_sub functions Sep 6, 2024
@mbutrovich mbutrovich requested a review from andygrove September 6, 2024 18:13
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit. Thanks @mbutrovich!

@mbutrovich
Copy link
Contributor Author

mbutrovich commented Sep 6, 2024

I added support for Int8 and Int16. I don't love this control flow. @viirya suggested macros, but I'm open to other ways to approach this.

@mbutrovich mbutrovich requested a review from viirya September 7, 2024 11:32
Comment on lines +561 to +562
}
macro_rules! array_date_arithmetic {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: style

Suggested change
}
macro_rules! array_date_arithmetic {
}
macro_rules! array_date_arithmetic {

@viirya viirya merged commit 4032129 into apache:main Sep 16, 2024
74 checks passed
@viirya
Copy link
Member

viirya commented Sep 16, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants