-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(7181): provide slicing of CursorValues #7912
Conversation
cdcdf6f
to
6cd1b82
Compare
…ests to add CursorValue implementations
I plan to review this tomorrow morning |
For any questions on the big picture design, referred to the diagrams here. Note that^^ is a drafted followup PR. |
I wonder if you've thought about storing the slice in the BatchCursor, as opposed to pushing it down to this level. That way the logic would only exist once, as opposed to once for every CursorValues? https://github.com/apache/arrow-datafusion/pull/7842/files#diff-1eb407bf7bf6bcc6115e66c6a0197bba129c90328b9d6cd00a5e1eb5e74daa85R103 It could be a stupid idea though... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wiedld -- I think this looks really nice. It is both well documented as well as very thoroughly tested.
I have some suggestions on improvements to the documentation and tests that I think we should consider prior to merging, but I don't think they are strictly necessary which is why I am approving now.
Thanks again
} | ||
|
||
#[test] | ||
fn test_slice_primitive_nulls_first() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for also testing with nulls_first
/ nulls_last
? I didn't see that the semantics of slice
differ based on the that setting and thus I don't think these tests add much/any new coverage
I suggest either making a loop in test_slice_primitive
that uses the different SortOptions
ensuring they produce the same result or just removing these two tests
// hello == hello | ||
assert_eq!(a.cmp(&b), Ordering::Equal); | ||
|
||
// advance A. slice B full length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for these comments 👨🍳 👌
We should probably confirm this doesn't regress the current sort performance, e.g. by introducing an additional memory indirection when comparing cursors, but otherwise if @alamb is happy lets just move forwards with this |
Run with the application of this slicing code (a single composable merge node) branch here. gcp c3d-standard-8-lssd debian-11 ![]() There are further confirmation steps, as well as hypotheses, as to why this^^ is found. I'll post updates after. |
I think we are waiting on some more measurement confirmation that this PR doesn't regress performance. Marking it as draft as we do that so we don't accidentally merge this |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
I think we should close this @alamb , since it's not a priority at the moment. And whenever we circle back, there will be a very large diff (due to changes). Ok to close? |
Always good with me to close PRs (we can always reopen them later if needed) Thanks @wiedld |
Which issue does this PR close?
Part of #7181 .
Rationale for this change
In the cascaded merge tree, partially sorted record batches are yielded per merge node.
Therefore, we need the ability to yield a partial of these representative values (a.k.a. the
CursorValues
).What changes are included in this PR?
Slicing the
CursorValues
.Add tests to demonstrate use with
Cursor::new(sliced_cursor_values)
.Have test coverage include different data types.
Are these changes tested?
Yes, tests are added.
Are there any user-facing changes?
No.
CursorValues
are currently an internal construct.