Utf8View column produced incorrect result in a natural join query (SQLancer-NoREC) #12468

Closed
2010YOUY01 opened this issue Sep 15, 2024 · 3 comments · Fixed by #13115

@2010YOUY01
Contributor

Describe the bug

See the reproducer in datafusion-cli (compiled from the latest main, commit 468a5a8)

table1 and table1_stringview hold logically equivalent data but use different physical in-memory representations for the string column. For any query, swapping the tables (table1 <-> table1_stringview) should therefore produce the same result. The following inconsistency was found:

DataFusion CLI v42.0.0
> create table table1(v1 string);
insert into table1 values('foo'), (NULL);
create table table1_stringview as select arrow_cast(v1, 'Utf8View') as v1 from table1;
0 row(s) fetched.
Elapsed 0.068 seconds.

+-------+
| count |
+-------+
| 2     |
+-------+
1 row(s) fetched.
Elapsed 0.044 seconds.

0 row(s) fetched.
Elapsed 0.014 seconds.

> select * from table1 as t1 natural join table1 as t2;
+-----+
| v1  |
+-----+
| foo |
+-----+
1 row(s) fetched.
Elapsed 0.035 seconds.

> select * from table1 as t1 natural join table1_stringview as t2;
+-----+
| v1  |
+-----+
| foo |
|     |
| foo |
|     |
+-----+
4 row(s) fetched.
Elapsed 0.006 seconds.
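
The check described above (swap the table, then compare results) can be sketched as an order-insensitive comparison of the two result sets. This is only an illustration: `same_result` is a hypothetical helper, not part of SQLancer or DataFusion, and the blank rows in the second output are assumed to be NULL.

```python
from collections import Counter


def same_result(rows_a, rows_b):
    """Compare two query results as multisets, ignoring row order."""
    return Counter(rows_a) == Counter(rows_b)


# Rows copied from the two outputs above; None stands for SQL NULL
# (assumption: the CLI prints NULL as an empty cell).
baseline = [("foo",)]
mutated = [("foo",), (None,), ("foo",), (None,)]

print(same_result(baseline, mutated))  # False
```

A multiset comparison is used because SQL does not guarantee row order without ORDER BY, so comparing the raw lists could report false mismatches.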

To Reproduce

No response

Expected behavior

No response

Additional context

Found by SQLancer #11030

2010YOUY01 added the bug label Sep 15, 2024
@demetribu
Contributor

take

@demetribu
Contributor

demetribu commented Oct 21, 2024

Problem still exists. Resuming work on this.

JoinConstraint::Natural => {

@demetribu
Contributor

The query analyzer coerces types as expected.

Projection: t1.v1
  Inner Join: Using CAST(t1.v1 AS Utf8View) = t2.v1
    SubqueryAlias: t1
      TableScan: table1 projection=[v1]
    SubqueryAlias: t2
      TableScan: table1_stringview projection=[v1]

However, the EliminateCrossJoin optimizer rule then converts this inner join into a cross join, for reasons not yet identified, which appears to be incorrect.

impl OptimizerRule for EliminateCrossJoin {
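
For reference, here is a minimal model of why a cross join would explain the 4-row output. It is a sketch in plain Python, not DataFusion code: `sql_eq`, `inner_join`, and `cross_join` are hypothetical names, and only the left column is kept to mirror the `Projection: t1.v1` in the plan above.

```python
def sql_eq(a, b):
    """SQL three-valued equality: unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def inner_join(left, right):
    # Keep a pair only when the predicate is true (not false, not unknown),
    # then project the left value.
    return [l for l in left for r in right if sql_eq(l, r) is True]


def cross_join(left, right):
    # No predicate: every pair survives; project the left value.
    return [l for l in left for r in right]


t1 = ["foo", None]  # table1.v1
t2 = ["foo", None]  # table1_stringview.v1 (same logical data)

print(inner_join(t1, t2))  # ['foo']
print(cross_join(t1, t2))  # ['foo', 'foo', None, None]
```

Under these assumptions the inner join yields the single `foo` row seen for table1 joined with itself, while the cross join yields two `foo` rows and two NULL rows, the same multiset as the incorrect result reported in the issue.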
