Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

approx_percentile_cont_with_weight result type changed from datafusion v39 to v40 #11874

Closed
Michael-J-Ward opened this issue Aug 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Michael-J-Ward
Copy link
Contributor

Describe the bug

In datafusion:v39.0.0, calling approx_percentile_cont_with_weight on an integer column produced an integer result.

In datafusion:v40.0.0, calling approx_percentile_cont_with_weight on an integer column produces a float result.

To Reproduce

v39.0.0 behavior

Setup

git checkout 40.0.0
git log --oneline -n 1
cargo run
6a4a280e3 (HEAD, tag: 39.0.0-rc1, tag: 39.0.0, upstream/branch-39) remove unused file
DataFusion CLI v39.0.0
create table foo(x int);
insert into foo values (1), (2), (3);
select arrow_typeof(approx_percentile_cont_with_weight("x", 0.6, 0.5)), approx_percentile_cont_with_weight("x", 0.6, 0.5) from foo;

Output

+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| arrow_typeof(APPROX_PERCENTILE_CONT_WITH_WEIGHT(foo.x,Float64(0.6),Float64(0.5))) | APPROX_PERCENTILE_CONT_WITH_WEIGHT(foo.x,Float64(0.6),Float64(0.5)) |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| Int32                                                                             | 3                                                                   |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.010 seconds.

v40.0.0 behavior

Setup

git checkout 40.0.0
git log --oneline -n 1
cargo run
4cae81363 (HEAD, tag: 40.0.0-rc1, tag: 40.0.0, upstream/branch-40) manually remove a reverted PR from the breaking change section
DataFusion CLI v40.0.0
create table foo(x int);
insert into foo values (1), (2), (3);
select arrow_typeof(approx_percentile_cont_with_weight("x", 0.6, 0.5)), approx_percentile_cont_with_weight("x", 0.6, 0.5) from foo;

Output

+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| arrow_typeof(approx_percentile_cont_with_weight(foo.x,Float64(0.6),Float64(0.5))) | approx_percentile_cont_with_weight(foo.x,Float64(0.6),Float64(0.5)) |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
| Float64                                                                           | 3.0                                                                 |
+-----------------------------------------------------------------------------------+---------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.011 seconds.

Expected behavior

I would have expected the return type to remain consistent, though I am unsure what datafusion uses as the canonical implementation. SQL Server does look like it should return a float

Additional context

approx_percentile_cont_with_weight was converted to a UDAF between v39 and v40.

#10917

@Michael-J-Ward Michael-J-Ward added the bug Something isn't working label Aug 7, 2024
@jayzhan211
Copy link
Contributor

SQL Server does look like it should #10917

It seems like float is the correct type? If that, I don't think there is any further change required

@Michael-J-Ward
Copy link
Contributor Author

I would have expected the return type to remain consistent EDIT between datafusion v39 and v40

I only filed the issue because the behavior changed between datafusion releases as it was migrated to an UDAF.

If datafusion uses SQL Server's implementation as canonical, then I agree that v40 is correct and no action required.

@alamb
Copy link
Contributor

alamb commented Aug 8, 2024

I agree we can close this issue as "working as designed" -- thanks @Michael-J-Ward and @jayzhan211

@alamb alamb closed this as completed Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants