Non-naive implementation of VecDeque.append
#52553
Conversation
r? @shepmaster (rust_highfive has picked a reviewer for you, use r? to override)

Which benchmarks did you use? Are there some existing or did you write your own? If you wrote new ones, it seems like those should be added (but I'm not 100% sure on that). Either way, it'd be good to see the benchmarks (and the raw data) to check that they are valid.
src/liballoc/tests/vec_deque.rs (outdated)

```rust
a.drain(a_pop_front..);
b.drain(..b_pop_back);
b.drain(b_pop_front..);
let checked = a.iter().chain(b.iter()).map(|&x| x).collect::<Vec<usize>>();
```
`.map(|&x| x)` -> `.cloned()`
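For reference, a minimal, runnable sketch of the suggested change (the deque contents here are invented for illustration; only the iterator adapters mirror the test):

```rust
use std::collections::VecDeque;

fn main() {
    let a: VecDeque<usize> = (0..4).collect();
    let b: VecDeque<usize> = (4..8).collect();

    // Form currently in the test:
    let with_map: Vec<usize> = a.iter().chain(b.iter()).map(|&x| x).collect();
    // Suggested form; for iterators over references to `Clone` items the two are equivalent:
    let with_cloned: Vec<usize> = a.iter().chain(b.iter()).cloned().collect();

    assert_eq!(with_map, with_cloned);
}
```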
src/liballoc/tests/vec_deque.rs (outdated)

```rust
@@ -928,6 +928,60 @@ fn test_append() {
    assert_eq!(a.iter().cloned().collect::<Vec<_>>(), []);
}

#[test]
fn test_append_advanced() {
    fn check(
```
Check what? Tests are a form of documentation. All tests "check" something, that's rather their purpose.
src/liballoc/tests/vec_deque.rs (outdated)

```rust
@@ -928,6 +928,60 @@ fn test_append() {
    assert_eq!(a.iter().cloned().collect::<Vec<_>>(), []);
}

#[test]
fn test_append_advanced() {
```
"advanced" isn't really a great descriptor. What makes it advanced?
src/liballoc/tests/vec_deque.rs (outdated)

```rust
a_push_front: usize,
a_pop_front: usize,
b_push_front: usize,
b_pop_front: usize
```
This type of argument list is about at the top of the list for "easiest to pass the wrong arguments". There's a huge mass of arguments, subtly intertwined, all of the same type.
Instead, WDYT about

```rust
#[derive(Debug, Default)]
struct SomeUsefulName {
    a_push_back: usize,
    a_pop_back: usize,
    b_push_back: usize,
    b_pop_back: usize,
    a_push_front: usize,
    a_pop_front: usize,
    b_push_front: usize,
    b_pop_front: usize,
}

check(SomeUsefulName { a_push, a_pop, b_push, b_pop, ..SomeUsefulName::default() });
```
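A runnable variant of that pattern, with placeholder names (`AppendCase`, `check`) standing in for whatever the test would actually use:

```rust
// Hypothetical names for illustration; the point is the `..Default::default()` pattern.
#[derive(Debug, Default)]
struct AppendCase {
    a_push_back: usize,
    a_pop_back: usize,
    b_push_back: usize,
    b_pop_back: usize,
    a_push_front: usize,
    a_pop_front: usize,
    b_push_front: usize,
    b_pop_front: usize,
}

fn check(case: AppendCase) {
    // A real test would drive two `VecDeque`s from these counts.
    println!("{:?}", case);
}

fn main() {
    // Unmentioned fields fall back to zero, and every argument is named at the
    // call site, which avoids the "huge mass of same-typed arguments" problem.
    check(AppendCase { a_push_back: 8, b_push_back: 4, ..Default::default() });
}
```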
src/liballoc/tests/vec_deque.rs (outdated)

```rust
    assert_eq!(a, checked);
    assert!(b.is_empty());
}
for a_push in 0..17 {
```
Why 17? Preferably there's a constant with an evocative name.
Maybe even `0..=SOME_CONSTANT` if 16 happens to be the reason...
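A hedged sketch of what that could look like (the constant name and the value 16 are assumptions for illustration):

```rust
// Name the bound once instead of scattering a magic 17/16 through the loops.
const MAX_OPS: usize = 16;

fn main() {
    for a_push in 0..=MAX_OPS {
        for b_push in 0..=MAX_OPS {
            // A real test would push `a_push` / `b_push` elements here.
            let _ = (a_push, b_push);
        }
    }
}
```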
```rust
// When minimizing the amount of calls to `copy_part`, there are
// 6 different cases to handle. Whether src and/or dst wrap are 4
// combinations and there are 3 distinct cases when they both wrap.
```
Reading this the first time, I assumed you mathed wrong. Having the number 4 and then the number 3 made me think that there would be 7 cases. On rereading it, I see my mistake. I don't have a suggestion for how to reword, but maybe someone will.
That being said, what benefit does the math in this comment bring?
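To spell out the arithmetic being discussed: src and dst each either wrap or not (2 × 2 = 4 combinations), and the both-wrap combination splits into 3 sub-cases, so 1 + 1 + 1 + 3 = 6. A runnable sketch of that shape, with made-up sub-case labels rather than the PR's exact distinctions:

```rust
// Illustrative only: shows how 4 wrap combinations become 6 handled cases.
fn case_label(src_wraps: bool, dst_wraps: bool, both_wrap_subcase: u8) -> &'static str {
    match (src_wraps, dst_wraps) {
        (false, false) => "case 1: neither wraps",
        (true, false) => "case 2: only src wraps",
        (false, true) => "case 3: only dst wraps",
        // The both-wrap combination needs three distinct copy strategies.
        (true, true) => match both_wrap_subcase {
            0 => "case 4: both wrap (sub-case A)",
            1 => "case 5: both wrap (sub-case B)",
            _ => "case 6: both wrap (sub-case C)",
        },
    }
}

fn main() {
    assert_eq!(case_label(false, false, 0), "case 1: neither wraps");
    assert_eq!(case_label(true, true, 2), "case 6: both wrap (sub-case C)");
}
```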
```rust
// 3 3 H 1 1 1 2
let src_2 = dst_before_wrap - src_before_wrap;
let dst_start_2 = dst_start_1 + src_before_wrap;
let src_3 = src_total - dst_before_wrap;
```
Not a fan of the `_1`, `_2`, `_3` suffixes. How do I know to use `_1` vs `_2`? It's nice that there's a picture, but it's a shame that the code cannot stand on its own without it, especially knowing how code and comments frequently diverge.
```rust
self.extend(other.drain(..));
// Copy from src[i1..i1 + len] to dst[i2..i2 + len].
// Does not check if the ranges are valid.
unsafe fn copy_part<T>(i1: usize, i2: usize, len: usize, src: &[T], dst: &mut [T]) {
```
I'm missing why this part needs to be unsafe. Can't we build slices and use `[T]::copy_from_slice`?
Ah, because `T` doesn't necessarily implement `Copy`, of course. However, it feels odd to "reimplement" a half-slice on top of a slice due to the `i1` / `i2`.

`i1` and `i2` are actually... offsets(?) into `src` and `dst` respectively? Those should be renamed to at least have `src` and `dst` in there.
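For illustration, a version of the helper with the renaming the review asks for might look like the following. This is a sketch, not the PR's code; the parameter names are assumptions:

```rust
use std::ptr;

/// Copies `len` values from `src[src_offset..]` to `dst[dst_offset..]`.
/// Unsafe because `T` need not be `Copy`: the caller must ensure the ranges
/// are in bounds and that the duplicated values are never dropped twice.
unsafe fn copy_part<T>(src: &[T], src_offset: usize, dst: &mut [T], dst_offset: usize, len: usize) {
    ptr::copy_nonoverlapping(
        src.as_ptr().add(src_offset),
        dst.as_mut_ptr().add(dst_offset),
        len,
    );
}

fn main() {
    let src = [1u32, 2, 3, 4];
    let mut dst = [0u32; 4];
    // `u32: Copy`, so having the values in both arrays afterwards is fine.
    unsafe { copy_part(&src, 1, &mut dst, 0, 2) };
    assert_eq!(dst, [2, 3, 0, 0]);
}
```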
```rust
// 6 different cases to handle. Whether src and/or dst wrap are 4
// combinations and there are 3 distinct cases when they both wrap.
// 6 = 3 + 1 + 1 + 1
match (src_wraps, dst_wraps) {
```
Pure speculation, but I wonder if there could be a method to encapsulate this logic, something along the lines of

```rust
fn source_parts(&self) -> (&[T], &[T]); // might exist as `as_slices`
unsafe fn destination_parts(&mut self) -> (&mut [T], &mut [T]);
```

This code will still need to exist and handle the same cases to minimize the copies.
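`as_slices` does exist on `VecDeque` and already exposes the two halves of the ring buffer, which is the kind of accessor being alluded to; a quick demonstration with invented values:

```rust
use std::collections::VecDeque;

fn main() {
    let mut d: VecDeque<u32> = VecDeque::new();
    d.push_back(1);
    d.push_back(2);
    d.push_front(0); // typically makes the internal buffer wrap

    // The two returned slices together are the deque's contents in order.
    let (front, back) = d.as_slices();
    let rebuilt: Vec<u32> = front.iter().chain(back).copied().collect();
    assert_eq!(rebuilt, [0, 1, 2]);
}
```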
```rust
        }
    }
};
other.clear();
```
Presumably, until this point we are in a Bad State where a given element actually exists twice, once in both collections, right? I believe that means the panic safety of this function needs to be documented to help prevent the next person touching it from shooting themselves in the foot.

If that's important, there should be comments in the test showcasing which part(s) of the test exercise which conditions, to avoid losing that coverage.
OK, I've pushed some changes that address most things you commented on. Instead of working with offsets in `copy_part`, the code now copies between slices directly.
This one. I didn't use the standard bench harness as the VecDeques have to be cloned in the loop and that would force the bench to measure the cloning as well.
Ping from triage @shepmaster! This PR needs your review.

Thanks for putting up with my feedback! Now that I've done what I can, I'll pass it off to a better member of the team to actually make a decision! r? @SimonSapin
I'll need more time than I have right now to properly review the code (especially because of the `unsafe` code), but in the meantime please add the benchmark to the repository. There is an existing section like this:

```toml
[[bench]]
name = "collectionsbenches"
path = "../liballoc/benches/lib.rs"
```

Below it, add this:

```toml
[[bench]]
name = "vec_deque_append_bench"
path = "../liballoc/benches/vec_deque_append.rs"
harness = false
```

(`harness = false` since the benchmark doesn't use the standard `#[bench]` harness.) Then you should be able to run it with something like `./x.py bench`.

@rust-lang/infra Is there any automation that parses the output of benchmarks?
Nothing today parses the benchmark output.

@SimonSapin The benchmark has been added. The only changes I made were to make it more similar to the default bench harness (use the median instead of the mean and force the use of nanoseconds).
```rust
    let (before, after) = buf.split_at_mut(head);
    (after, before)
} else {
    RingSlices::ring_slices(buf, tail, head)
```
This is worth a comment explaining why using `ring_slices` with the arguments swapped provides the desired behavior.
```rust
// naive impl
self.extend(other.drain(..));
// Copies all values from `src_slice` to the start of `dst_slice`.
unsafe fn copy_whole_slice<T>(src_slice: &[T], dst_slice: &mut [T]) {
```
So this is like `[T]::copy_from_slice` but without `T: Copy`? Then it is up to callers to be careful to not "duplicate" items and have for example some heap memory be double-freed.

… in fact this code ends up calling `other.clear()` which will incorrectly drop the items that have just been copied. Instead it should probably do something like `other.tail = 0; other.head = 0` or `other.tail = other.head`.

Consider modifying the test to have not just `T = usize`, but a type with a `Drop` impl that does something non-trivial, ideally detect double drops. Perhaps `T = Box<usize>` is enough, if you can verify that CI runs tests with ASAN.
A new test that checks for double drops similar to the Vec tests has been added.
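For illustration, a drop-counting element type along these lines (not the PR's actual test code) is one way such a test can catch double drops or leaks:

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

// Each dropped `Counted` bumps the counter exactly once.
#[allow(dead_code)]
struct Counted(usize);

impl Drop for Counted {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let mut a: VecDeque<Counted> = (0..4).map(Counted).collect();
    let mut b: VecDeque<Counted> = (4..8).map(Counted).collect();

    a.append(&mut b);
    drop(a);
    drop(b);

    // Every element must be dropped exactly once. An `append` that copies the
    // elements of `b` and then also drops them (e.g. via `clear`) would push
    // this count above 8; a leak would leave it below 8.
    assert_eq!(DROPS.load(Ordering::SeqCst), 8);
}
```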
```rust
// 6. `src` and `dst` are discontiguous
//    + dst_high is the same size as src_high
let src_contiguous = src_low.is_empty();
let dst_contiguous = dst_high.len() >= src_total;
```
This took me a while to figure out.

The wording of the comment above suggests that "`dst` is contiguous" is a property of `dst` alone (namely that the pair of empty/unused slices are in fact one), similar to "`src` is contiguous". However the definition of this `dst_contiguous` value is not that. Instead, it is "we will only need to write into a contiguous portion of `dst`". This does not match the comment or the name of the variable. Consider rewording both.
```rust
// it is important we clear the old values from `other`...
other.clear();
// and that we update `head` as the last step, making the values accessible in `self`.
self.head = new_head;
```
With `clear()` replaced per the above comment there is no need to delay the write to `self.head`, since there is no arbitrary destructor code being called, only shallow copies (moves). Inlining it in each of the 6 cases might be clearer. (The reset of `other` can also be moved before the copies.)
The assignment of `self.head` and the reset of `other` both need to be done after `dst` and `src` are retrieved, and the borrow checker complains if I do the assignments while they are in scope, as `self` and `other` are borrowed.
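A minimal, standalone illustration of that borrow-checker constraint (nothing here is the PR's code; `Ring` and `parts` are invented stand-ins for the deque and its slice accessors):

```rust
struct Ring {
    buf: Vec<u32>,
    head: usize,
}

impl Ring {
    // Borrows the whole `Ring` mutably, like the deque's internal accessors do.
    fn parts(&mut self) -> (&mut [u32], &mut [u32]) {
        self.buf.split_at_mut(self.head)
    }
}

fn main() {
    let mut ring = Ring { buf: vec![1, 2, 3, 4], head: 1 };

    let new_head;
    {
        let (low, high) = ring.parts();
        // While `low`/`high` are alive, all of `ring` counts as borrowed, so a
        // `ring.head = ...` assignment here would be rejected by the borrow checker.
        high[0] += low[0];
        new_head = low.len() + high.len();
    } // Borrow ends here.

    ring.head = new_head;
    assert_eq!(ring.head, 4);
    assert_eq!(ring.buf, [1, 3, 3, 4]);
}
```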
💔 Test failed - status-travis

Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.
@bors retry

⌛ Testing commit b063bd4 with merge 4ceb23dfca0457fb399a72b3f7526b54a83b7476...

💔 Test failed - status-travis
Non-naive implementation of `VecDeque.append`

Replaces the old, simple implementation with a more manual (and **unsafe** 😱) one. I've added 1 more test and verified that it covers all 6 code paths in the function. This new implementation was about 60% faster than the old naive one when I tried benchmarking it.
☀️ Test successful - status-appveyor, status-travis
…onSapin" This partially reverts commit d5b6b95, reversing changes made to 6b1ff19. Fixes rust-lang#53529. Cc: rust-lang#53564.
Reoptimize VecDeque::append

~Unfortunately, I don't know if these changes fix the unsoundness mentioned in #53529, so it is still a WIP. This is also completely untested. The VecDeque code contains other unsound code: one example is [reading uninitialized memory](https://play.rust-lang.org/?gist=6ff47551769af61fd8adc45c44010887&version=nightly&mode=release&edition=2015) (detected by MIRI), so I think this code will need a bigger refactor to make it clearer and safer.~ Note: this is based on #53571. r? @SimonSapin Cc: #53529 #52553 @yorickpeterse @jonas-schievink @Pazzaz @shepmaster.
Optimize VecDeque::append

Optimize `VecDeque::append` to do an unsafe copy rather than iterating through each element. On my `Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz`, the benchmark shows 37% improvements:

```
Master:
custom-bench vec_deque_append 583164 ns/iter
custom-bench vec_deque_append 550040 ns/iter

Patched:
custom-bench vec_deque_append 349204 ns/iter
custom-bench vec_deque_append 368164 ns/iter
```

Additional notes on the context: this is the third attempt to implement a non-trivial version of `VecDeque::append`; the last two were reverted due to unsoundness or regression, see:

- rust-lang#52553, reverted in rust-lang#53571
- rust-lang#53564, reverted in rust-lang#54851

Both cases are covered by existing tests.

Signed-off-by: tabokie <[email protected]>