Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: array list values are leaked on nested unnest operators #10689

Merged
merged 2 commits into from
May 28, 2024

Conversation

duongcongtoai
Copy link
Contributor

Which issue does this PR close?

Closes #10672 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@duongcongtoai duongcongtoai changed the title Fix: array list values are leaked on nested unnest operators Fix: array list values are leaked on nested unnest operators May 27, 2024
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label May 27, 2024
// If there is only one list column to unnest and it doesn't contain any NULL lists,
// we can return the values array directly without any copying.
if typed_arrays.len() == 1 && typed_arrays[0].null_count() == 0 {
Ok(vec![typed_arrays[0].values().clone()])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will return the underlying low level array, which include the row indices filtered out by limit/offset

Copy link
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch ! Thanks @duongcongtoai

Another simple way to reproduce it is

select unnest(column1) from (select * from (values([1,2,3]), ([4,5,6])) limit 1);

I'm not sure if this is a bug in arrow-rs or by design. It only slices the offset data of the ListArray, without slicing the child data.

@duongcongtoai
Copy link
Contributor Author

Nice catch ! Thanks @duongcongtoai

Another simple way to reproduce it is

select unnest(column1) from (select * from (values([1,2,3]), ([4,5,6])) limit 1);

I'm not sure if this is a bug in arrow-rs or by design. It only slices the offset data of the ListArray, without slicing the child data.

I think current impl is correct, because the offset values after slicing is still offsetted based on original array. Else we have to do shifting for each offset elements like

Current buffers
            undearneath value: [1,?,3,?,5,6]
            nulls: [1,0,1,0,1,1]
            offset: [0,1,2,4,6]
            logical value: [1] [null] [3,null] [5,6]

Slice(2,2)
            logical values -> [3,null] [5,6]
            offset -> [2,4,6] << 2 = [0,2,4] 
            unearneath_value.slice(2:) = [3,?,5,6]
            nulls.slice(2:) = [1,0,1,1]

@jonahgao
Copy link
Member

I think current impl is correct, because the offset values after slicing is still offsetted based on original array. Else we have to do shifting for each offset elements like

I think the shifting operation should be done internally in arrow-rs if needed, as it depends on what the data in the values looks like. Moreover, we don't need to do shifting for FixedSizeListArray.

The current solution looks good to me.

@alamb alamb merged commit 9fba34d into apache:main May 28, 2024
23 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…e#10689)

* chore: fix issue + add test

* simplify test case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
3 participants