Display of integers without raw pointers and without overflowing_literals #135265

pascaldekloe · 2025-01-08T18:58:01Z

The benchmarks as is measure formatting speed of literals. The first commit black_box-es input to simulate runtime speed instead.

The second commit replaces unsafe pointer optimizations with plain array indices. The performance is equivalent on Apple M1. Needs peer review on Intel.

Happy to do the 128-bit version too if such change is welcome.

rustbot · 2025-01-08T18:58:10Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Noratrieb (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

tgross35 · 2025-01-08T21:43:13Z

r? tgross35 since I'm doing some work in this area anyway

You should cd src/etc/test-float-parse and run cargo +stage0 run --release, those tests are technically for parsing but do invoke the print routines quite a bit. I'll take a closer look at the changes soon.

pascaldekloe · 2025-01-08T21:44:37Z

No problem if this one gets rejected in favour of anything else @tgross35. I was just pleased to see the compiler doing this well.

tgross35 · 2025-01-08T21:46:49Z

You should cd src/etc/test-float-parse and run cargo +stage0 run --release, those tests are technically for parsing but do invoke the print routines quite a bit. I'll take a closer look at the changes soon.

Actually ignore this, I was thinking this was in flt2dec and not integers.

No problem if this one gets rejected in favour of anything else @tgross35. I was just pleased to see the compiler doing this well.

No conflict :) Replacing unsafe code is always nice if there are no drawbacks.

tgross35 · 2025-01-08T22:32:26Z

It looks like there may be some small regressions here, the new code doesn't elide a couple calls to panic_bounds_check https://rust.godbolt.org/z/hjqhqsnbb. You could probably play around on godbolt a bit to figure out where this is coming from, a call to hint::assert_unchecked in the right place would likely make this disappear.

Soon it will be possible to write_copy_of_slice on the MaybeUninit slices, I don't expect this to do anything for optimization but it would allow keeping the "write two bytes at a time" feature from the original #129259.

pascaldekloe · 2025-01-11T14:58:19Z

What a gem that Compiler Explorer is @tgross35. I think the assembly looks good now.

B.t.w., I tried bit shifting entries from the lookup table into a 64-bit register and write per 8 digits at once. The compiler "optimizes" such intermediate register away, and it goes for 16-bit copies instead. Maybe those small writes are actually faster. I don't know. 😂

pascaldekloe · 2025-01-12T19:33:58Z

I found a way without compiler hinting which is even faster than the unsafe code as is. It requires buffers of an even byte-size. Needs a little more work for all integer sizes.

https://github.com/pascaldekloe/b10/blob/e7d9813c1f74dc9137216f1b271303fb056e90ad/src/lib.rs#L1723

tgross35 · 2025-01-12T23:30:35Z

What a gem that Compiler Explorer is @tgross35.

I am by no means a perf expert but I still can't imagine chasing down optimizations without it :)

I found a way without compiler hinting which is even faster than the unsafe code as is. It requires buffers of an even byte-size. Needs a little more work for all integer sizes.

https://github.com/pascaldekloe/b10/blob/e7d9813c1f74dc9137216f1b271303fb056e90ad/src/lib.rs#L1723

Are you planning to update the PR with something like this? At a surface level the patch seems pretty reasonable but I didn't look too deep yet, I'll hold off on reviewing if you have something faster coming.

pascaldekloe · 2025-01-13T00:50:14Z

Yes, will give it a try @tgross35. Same surprise here as you're having. 🙂

tgross35 · 2025-01-13T06:24:23Z

@rustbot author

(just comment @rustbot review to update the labels when this is ready for a look)

pascaldekloe · 2025-01-13T18:13:30Z

The speed gain is because (1) the loop decision no longer depends on the digit calculation, (2) it produces fewer instructions, and (3) it no longer needs a check for special case zero, which has a "significant" leading zero.

Unfortunately the results are less optimal for the two benchmark cases in the core library: either zero or maximum digits in a loop. Apparently the branch prediction benefit does not outweight the per-4-digit tweak in such scenario.

So now we have the explicit assertions as hint assertions too. Can you have a look @tgross35? @rustbot review

https://rust.godbolt.org/z/h97MacKcc

bors · 2025-02-03T04:47:00Z

⌛ Testing commit 821bc70 with merge 60f48c4...

bors · 2025-02-03T06:34:30Z

💔 Test failed - checks-actions

library/core/src/fmt/num.rs

…gers

tgross35 · 2025-02-03T20:00:39Z

@bors r=tgross35,ChrisDenton

bors · 2025-02-03T20:00:42Z

📌 Commit ebeaf2e has been approved by tgross35,ChrisDenton

It is now in the queue for this repository.

bors · 2025-02-04T09:15:57Z

⌛ Testing commit ebeaf2e with merge 019fc4d...

bors · 2025-02-04T12:00:42Z

☀️ Test successful - checks-actions
Approved by: tgross35,ChrisDenton
Pushing 019fc4d to master...

rust-timer · 2025-02-04T13:17:46Z

Finished benchmarking commit (019fc4d): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.5%	[-0.6%, -0.5%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.5%	[-0.6%, -0.5%]	2

Max RSS (memory usage)

Results (secondary 2.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.7%	[2.7%, 2.7%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

Results (primary 1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[1.0%, 1.0%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.0%	[1.0%, 1.0%]	1

Binary size

Results (primary -0.2%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.4%, -0.0%]	12
Improvements ✅ (secondary)	-0.1%	[-0.1%, -0.1%]	38
All ❌✅ (primary)	-0.2%	[-0.4%, -0.0%]	12

Bootstrap: 778.131s -> 778.858s (0.09%)
Artifact size: 328.78 MiB -> 328.82 MiB (0.01%)

- Regression for delayed UB for slices: rust-lang/rust#135265

Upgrade toolchain to 2/10. I **highly recommend** reviewing this PR commit-by-commit. The description in each commit message links to the upstream PRs that prompted those particular changes. ## Callouts - 2/1 had a lot of formatting changes. I split the commits for that day into formatting changes and functionality changes accordingly. - 2/5 introduced a regression in our delayed UB instrumentation, so I made a new fixme test. See #3881 for details. ## Culprit PRs: rust-lang/rust#134424 rust-lang/rust#130514 rust-lang/rust#135748 rust-lang/rust#136590 rust-lang/rust#135318 rust-lang/rust#135265 rust-lang/rust@bcb8565 rust-lang/rust#136471 rust-lang/rust#136645 Resolves #3863 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

rustbot assigned Noratrieb Jan 8, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 8, 2025

pascaldekloe changed the title ~~Fmt int speed~~ Display of integers without unsafe pointer Jan 8, 2025

rustbot assigned tgross35 and unassigned Noratrieb Jan 8, 2025

tgross35 changed the title ~~Display of integers without unsafe pointer~~ Display of integers without raw pointer Jan 10, 2025

tgross35 changed the title ~~Display of integers without raw pointer~~ Display of integers without raw pointers Jan 10, 2025

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 13, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 13, 2025

This comment has been minimized.

Sign in to view

rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 14, 2025

This comment has been minimized.

Sign in to view

pascaldekloe force-pushed the fmt-int-speed branch from c8893fa to f0a5e85 Compare January 14, 2025 14:34

rustbot removed has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 14, 2025

This comment has been minimized.

Sign in to view

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 3, 2025

tgross35 requested changes Feb 3, 2025

View reviewed changes

library/core/src/fmt/num.rs Outdated Show resolved Hide resolved

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 3, 2025

pascaldekloe force-pushed the fmt-int-speed branch from 821bc70 to e863cbd Compare February 3, 2025 14:43

no unsafe pointer and no overflowing_literals in fmt::Display of inte…

ebeaf2e

…gers

pascaldekloe force-pushed the fmt-int-speed branch from e863cbd to ebeaf2e Compare February 3, 2025 16:44

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 3, 2025

workingjubilee mentioned this pull request Feb 3, 2025

Tracking Issue for spurious mingw CI failures #133958

Open

workingjubilee added the CI-spurious-fail-mingw CI spurious failure: target env mingw label Feb 3, 2025

bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 4, 2025

bors merged commit 019fc4d into rust-lang:master Feb 4, 2025
7 checks passed

rustbot added this to the 1.86.0 milestone Feb 4, 2025

bors mentioned this pull request Feb 4, 2025

Optimize ToString implementation for integers #136264

Open

pascaldekloe mentioned this pull request Feb 5, 2025

Faster fmt::Display of 128-bit integers, without unsafe pointer #136594

Open

carolynzech mentioned this pull request Feb 10, 2025

Delayed UB instrumentation regression: slices model-checking/kani#3881

Open

carolynzech added a commit to carolynzech/kani that referenced this pull request Feb 10, 2025

Update toolchain to 2/5

09e2283

- Regression for delayed UB for slices: rust-lang/rust#135265

carolynzech mentioned this pull request Feb 10, 2025

Upgrade toolchain to 2/10 model-checking/kani#3883

Merged

carolynzech added a commit to carolynzech/kani that referenced this pull request Feb 10, 2025

Update toolchain to 2/5

318a4e6

- Regression for delayed UB for slices: rust-lang/rust#135265

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Display of integers without raw pointers and without overflowing_literals #135265

Display of integers without raw pointers and without overflowing_literals #135265

pascaldekloe commented Jan 8, 2025 •

edited by workingjubilee

Loading

rustbot commented Jan 8, 2025

tgross35 commented Jan 8, 2025

pascaldekloe commented Jan 8, 2025

tgross35 commented Jan 8, 2025

tgross35 commented Jan 8, 2025

pascaldekloe commented Jan 11, 2025

pascaldekloe commented Jan 12, 2025

tgross35 commented Jan 12, 2025

pascaldekloe commented Jan 13, 2025

tgross35 commented Jan 13, 2025

pascaldekloe commented Jan 13, 2025

This comment has been minimized.

This comment has been minimized.

bors commented Feb 3, 2025

This comment has been minimized.

bors commented Feb 3, 2025

tgross35 commented Feb 3, 2025

bors commented Feb 3, 2025

bors commented Feb 4, 2025

bors commented Feb 4, 2025

rust-timer commented Feb 4, 2025

Display of integers without raw pointers and without overflowing_literals #135265

Display of integers without raw pointers and without overflowing_literals #135265

Conversation

pascaldekloe commented Jan 8, 2025 • edited by workingjubilee Loading

rustbot commented Jan 8, 2025

tgross35 commented Jan 8, 2025

pascaldekloe commented Jan 8, 2025

tgross35 commented Jan 8, 2025

tgross35 commented Jan 8, 2025

pascaldekloe commented Jan 11, 2025

pascaldekloe commented Jan 12, 2025

tgross35 commented Jan 12, 2025

pascaldekloe commented Jan 13, 2025

tgross35 commented Jan 13, 2025

pascaldekloe commented Jan 13, 2025

This comment has been minimized.

This comment has been minimized.

bors commented Feb 3, 2025

This comment has been minimized.

bors commented Feb 3, 2025

tgross35 commented Feb 3, 2025

bors commented Feb 3, 2025

bors commented Feb 4, 2025

bors commented Feb 4, 2025

rust-timer commented Feb 4, 2025

Overall result: ✅ improvements - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Binary size

pascaldekloe commented Jan 8, 2025 •

edited by workingjubilee

Loading