-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Display of integers without raw pointers and without overflowing_literals #135265
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Noratrieb (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
r? tgross35 since I'm doing some work in this area anyway You should |
No problem if this one gets rejected in favour of anything else @tgross35. I was just pleased to see the compiler doing this well. |
Actually ignore this, I was thinking this was in
No conflict :) Replacing unsafe code is always nice if there are no drawbacks. |
It looks like there may be some small regressions here, the new code doesn't elide a couple calls to Soon it will be possible to |
What a gem that Compiler Explorer is @tgross35. I think the assembly looks good now. B.t.w., I tried bit shifting entries from the lookup table into a 64-bit register and write per 8 digits at once. The compiler "optimizes" such intermediate register away, and it goes for 16-bit copies instead. Maybe those small writes are actually faster. I don't know. 😂 |
I found a way without compiler hinting which is even faster than the unsafe code as is. It requires buffers of an even byte-size. Needs a little more work for all integer sizes. |
I am by no means a perf expert but I still can't imagine chasing down optimizations without it :)
Are you planning to update the PR with something like this? At a surface level the patch seems pretty reasonable but I didn't look too deep yet, I'll hold off on reviewing if you have something faster coming. |
Yes, will give it a try @tgross35. Same surprise here as you're having. 🙂 |
@rustbot author (just comment |
The speed gain is because (1) the loop decision no longer depends on the digit calculation, (2) it produces fewer instructions, and (3) it no longer needs a check for special case zero, which has a "significant" leading zero. Unfortunately the results are less optimal for the two benchmark cases in the core library: either zero or maximum digits in a loop. Apparently the branch prediction benefit does not outweight the per-4-digit tweak in such scenario. So now we have the explicit assertions as hint assertions too. Can you have a look @tgross35? @rustbot review |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
c8893fa
to
f0a5e85
Compare
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
821bc70
to
e863cbd
Compare
e863cbd
to
ebeaf2e
Compare
@bors r=tgross35,ChrisDenton |
☀️ Test successful - checks-actions |
Finished benchmarking commit (019fc4d): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (secondary 2.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 1.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.2%, secondary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 778.131s -> 778.858s (0.09%) |
- Regression for delayed UB for slices: rust-lang/rust#135265
- Regression for delayed UB for slices: rust-lang/rust#135265
Upgrade toolchain to 2/10. I **highly recommend** reviewing this PR commit-by-commit. The description in each commit message links to the upstream PRs that prompted those particular changes. ## Callouts - 2/1 had a lot of formatting changes. I split the commits for that day into formatting changes and functionality changes accordingly. - 2/5 introduced a regression in our delayed UB instrumentation, so I made a new fixme test. See #3881 for details. ## Culprit PRs: rust-lang/rust#134424 rust-lang/rust#130514 rust-lang/rust#135748 rust-lang/rust#136590 rust-lang/rust#135318 rust-lang/rust#135265 rust-lang/rust@bcb8565 rust-lang/rust#136471 rust-lang/rust#136645 Resolves #3863 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
The benchmarks as is measure formatting speed of literals. The first commit
black_box
-es input to simulate runtime speed instead.The second commit replaces
unsafe
pointer optimizations with plain array indices. The performance is equivalent on Apple M1. Needs peer review on Intel.Happy to do the 128-bit version too if such change is welcome.