Evaluate expressions after type coercion #3444

Merged
Dandandan merged 12 commits into apache:master on Sep 12, 2022

Contributor

@Dandandan Dandandan commented Sep 11, 2022

Which issue does this PR close?

Closes #3431

Rationale for this change

See issue

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Sep 11, 2022
@github-actions github-actions bot added the core Core DataFusion crate label Sep 11, 2022
@andygrove
Member

Thanks @Dandandan. This looks like a great improvement.

@codecov-commenter

codecov-commenter commented Sep 11, 2022

Codecov Report

Merging #3444 (96334db) into master (c5c1dae) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3444      +/-   ##
==========================================
+ Coverage   85.68%   85.69%   +0.01%     
==========================================
  Files         298      298              
  Lines       54645    54667      +22     
==========================================
+ Hits        46820    46846      +26     
+ Misses       7825     7821       -4     
Impacted Files | Coverage Δ
--- | ---
datafusion/core/tests/sql/aggregates.rs | 99.37% <ø> (ø)
datafusion/core/tests/sql/decimal.rs | 100.00% <ø> (ø)
datafusion/core/tests/sql/explain_analyze.rs | 83.87% <ø> (ø)
datafusion/core/tests/sql/subqueries.rs | 94.95% <ø> (ø)
datafusion/optimizer/src/type_coercion.rs | 99.04% <100.00%> (+0.07%) ⬆️
datafusion/core/src/physical_plan/metrics/value.rs | 86.93% <0.00%> (-0.51%) ⬇️
datafusion/core/tests/sql/select.rs | 99.78% <0.00%> (+<0.01%) ⬆️
datafusion/sql/src/planner.rs | 80.94% <0.00%> (+0.05%) ⬆️
datafusion/common/src/scalar.rs | 85.12% <0.00%> (+0.06%) ⬆️

... and 2 more

"+--------------+-------------------------+-------------------------+-------------------------+",
"| 1.5 | 2.5 | 3.5 | 2.5 |",
"+--------------+-------------------------+-------------------------+-------------------------+",
"+--------------+---------------------------+---------------------------+---------------------------+",
Contributor

I just have a comment about the header of the expression.
The input SQL is AGG(C1) + 1, where 1 has the Int64 data type, but the header shows it converted to a float after the cast.

Contributor

Do we have a way to keep the header consistent, so that it does not change whenever the optimizer changes the plan?
cc @andygrove @alamb

Contributor Author

This is a concern I have also had for a long time, and I once had a PR open for it.

One approach would be to add an alias for every unnamed expression based on the original query SQL or expression.
This would avoid having the column names changed by the optimizers.
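
To make the aliasing idea concrete, here is a minimal sketch using a toy expression type. It is not DataFusion's `Expr`; the names `alias_unnamed` and `coerce_to_float` are hypothetical and exist only for this illustration. It assumes the header of an expression is derived from its display name, which is why a rewrite of a literal changes the header unless an alias was pinned first.

```rust
// Toy expression type for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    Float64(f64),
    Add(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Display name, used here as the output column header.
    fn name(&self) -> String {
        match self {
            Expr::Column(c) => c.clone(),
            Expr::Int64(v) => format!("Int64({})", v),
            Expr::Float64(v) => format!("Float64({})", v),
            Expr::Add(l, r) => format!("{} + {}", l.name(), r.name()),
            // An alias hides whatever the inner expression looks like.
            Expr::Alias(_, n) => n.clone(),
        }
    }
}

/// Hypothetical initial pass: pin the header of every projected expression
/// that has no explicit alias, using its name before any rewrite runs.
fn alias_unnamed(exprs: Vec<Expr>) -> Vec<Expr> {
    exprs
        .into_iter()
        .map(|e| match e {
            // Plain columns and explicit aliases already have stable names.
            Expr::Alias(..) | Expr::Column(_) => e,
            other => {
                let n = other.name();
                Expr::Alias(Box::new(other), n)
            }
        })
        .collect()
}

/// Stand-in for an optimizer rewrite: turns Int64 literals into Float64.
fn coerce_to_float(e: Expr) -> Expr {
    match e {
        Expr::Int64(v) => Expr::Float64(v as f64),
        Expr::Add(l, r) => Expr::Add(
            Box::new(coerce_to_float(*l)),
            Box::new(coerce_to_float(*r)),
        ),
        // Rewrite under the alias but keep the pinned name.
        Expr::Alias(inner, n) => Expr::Alias(Box::new(coerce_to_float(*inner)), n),
        other => other,
    }
}
```

In this sketch, running `coerce_to_float` on an unaliased `c1 + Int64(1)` changes its name, while running it on the output of `alias_unnamed` leaves the name as it was written.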

Contributor

I really like the idea of adding an alias once (maybe as the initial optimizer pass?)

I am not sure how valuable adding the types in the column names is in general, to be honest. I wouldn't mind if rather than Int(1) this was simply rendered 1

Contributor

I really like the idea of adding an alias once (maybe as the initial optimizer pass?)

I am not sure how valuable adding the types in the column names is in general, to be honest. I wouldn't mind if rather than Int(1) this was simply rendered 1

Do you have a plan or a draft PR for that? @Dandandan

Contributor

Perhaps @Dandandan was referring to #280 / #279

Contributor Author

Yes indeed, we can give those a second life 🎉

I had some concerns with the PR, but I believe it is still a big improvement over the current state of things.

Contributor

@alamb alamb left a comment

I think it looks like a good improvement to me. Perhaps we can file a follow on ticket for the column renaming?

@@ -653,7 +653,7 @@ order by
 let expected = "\
 Sort: #revenue DESC NULLS FIRST\
 \n Projection: #customer.c_custkey, #customer.c_name, #SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount) AS revenue, #customer.c_acctbal, #nation.n_name, #customer.c_address, #customer.c_phone, #customer.c_comment\
-\n Aggregate: groupBy=[[#customer.c_custkey, #customer.c_name, #customer.c_acctbal, #customer.c_phone, #nation.n_name, #customer.c_address, #customer.c_comment]], aggr=[[SUM(#lineitem.l_extendedprice * CAST(Int64(1) AS Float64) - #lineitem.l_discount)]]\
+\n Aggregate: groupBy=[[#customer.c_custkey, #customer.c_name, #customer.c_acctbal, #customer.c_phone, #nation.n_name, #customer.c_address, #customer.c_comment]], aggr=[[SUM(#lineitem.l_extendedprice * Float64(1) - #lineitem.l_discount)]]\
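
The plan change above, where `CAST(Int64(1) AS Float64)` becomes `Float64(1)`, can be illustrated with a small two-step sketch. This uses a toy expression type rather than DataFusion's own, and the function names `coerce` and `fold_casts` are hypothetical. It only assumes that coercion wraps a mismatched literal in a cast and that a later step evaluates casts of literals.

```rust
// Toy types for illustration only; these are NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    Int64(i64),
    Float64(f64),
    Cast(Box<Expr>, DataType),
    Sub(Box<Expr>, Box<Expr>),
}

/// Result type of an expression in this toy model.
fn data_type(e: &Expr) -> DataType {
    match e {
        Expr::Column(_, t) | Expr::Cast(_, t) => t.clone(),
        Expr::Int64(_) => DataType::Int64,
        Expr::Float64(_) => DataType::Float64,
        Expr::Sub(l, _) => data_type(l),
    }
}

fn cast_to_float(e: Expr) -> Box<Expr> {
    Box::new(Expr::Cast(Box::new(e), DataType::Float64))
}

/// Step 1 (type coercion): wrap the Int64 side of a mixed subtraction in a cast.
fn coerce(e: Expr) -> Expr {
    match e {
        Expr::Sub(l, r) => {
            let (l, r) = (coerce(*l), coerce(*r));
            match (data_type(&l), data_type(&r)) {
                (DataType::Int64, DataType::Float64) => Expr::Sub(cast_to_float(l), Box::new(r)),
                (DataType::Float64, DataType::Int64) => Expr::Sub(Box::new(l), cast_to_float(r)),
                _ => Expr::Sub(Box::new(l), Box::new(r)),
            }
        }
        other => other,
    }
}

/// Step 2 (evaluation): replace a cast of a literal with the converted literal.
fn fold_casts(e: Expr) -> Expr {
    match e {
        Expr::Cast(inner, to) => match (fold_casts(*inner), to) {
            (Expr::Int64(v), DataType::Float64) => Expr::Float64(v as f64),
            // Casts of non-literals (for example columns) are left in place.
            (other, to) => Expr::Cast(Box::new(other), to),
        },
        Expr::Sub(l, r) => Expr::Sub(Box::new(fold_casts(*l)), Box::new(fold_casts(*r))),
        other => other,
    }
}
```

Applying `coerce` and then `fold_casts` to `Int64(1) - l_discount`, with `l_discount` a Float64 column, first introduces the cast and then replaces it with a Float64 literal, so no cast remains to be evaluated per row.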
Contributor

very nice

@alamb
Contributor

alamb commented Sep 12, 2022

This PR appears to have some conflicts now

@andygrove
Member

@Dandandan This now needs a rebase

@Dandandan Dandandan merged commit f48a997 into apache:master Sep 12, 2022
@ursabot

ursabot commented Sep 12, 2022

Benchmark runs are scheduled for baseline = 97b3a4b and contender = f48a997. f48a997 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
