Add fmt size_hints #583

seanmonstar · 2015-01-14T06:56:02Z

Add a size_hint method to each of the fmt traits, allowing a buffer to allocate with the correct size before writing.

Rendered

cc @alexcrichton

erickt · 2015-01-14T18:30:20Z

I'm all for adding a hint for formatting. It would be nice in this RFC to mention unifying this with iterator size hints.

cc-ing @kballard and @thestinger, who I believe collaborated on adding Iterator::size_hint.

Does anyone know of anything using the Iterator::size_hint upper bound? I did a scan through the rust code base, and the only users I could find was other size_hint methods and tests. While semantically it'd be nice to have the lower and upper bounds of a collection, it's a bit of a shame we might be doing work that no one ever uses. Perhaps we should split things into a size_hint(&self) -> uint and a size_bounds(&self) -> (uint, Option<uint>)?

#46 is somewhat related to this RFC.

lilyball · 2015-01-14T19:10:23Z

I don't know offhand of anyone using the Iterator::size_hint upper bound. I have a recollection a discussion a while back in which someone claimed that, after looking through the rust source, there were no users of it. At the time I defended its existence with theoretical use-cases that wouldn't be found in the standard libraries but might be found in third-party code, such as providing progress estimates for an operation that processes the results of an iterator chain. Or perhaps more applicable, there may be some use in having an Extend implementation that allocates more than the lower bound ahead of time, with the expectation that the "true" iterator count lies somewhere between the lower and upper bounds.

All that said, if the size_hint for fmt is just used for the allocation created by fmt itself, then if it's only using the lower bound, there doesn't seem to be much use in having the upper bound. Or is there a use-case for calling this size_hint in third-party code? Alternatively, would the fmt module perhaps want to experiment with pre-allocating some size in between the lower and upper bounds? I'm thinking some simple rule like allocating min(lower_bound * 1.5, upper_bound).

lilyball · 2015-01-14T19:12:05Z

A drawback to this proposal: all manual implementations of fmt traits now have a second method to implement that must be kept in sync with the fmt() method.

Something to consider: A macro like write_size_hint!() that takes a format string and arguments, just like write!(), does, and generates a SizeHint from it. You'd need to keep the format string in sync with the write!() call but it would make implementing size_hint() easier in most cases.

lilyball · 2015-01-14T19:14:04Z

text/0000-fmt-size-hint.md

+        SizeHint {
+            min: self.min + other.min,
+            max: match (self.max, other.max) {
+                (Some(left), Some(right)) => Some(left + right),


This implementation doesn't handle overflow. It needs to look something like

SizeHint { min: self.min.saturating_add(other.min), max: if let (Some(left), Some(right)) = (self.max, other.max) { Some(left.checked_add(right)) } else { None } }

Ah yes, will adjust.

lilyball · 2015-01-14T19:15:35Z

Offhand, I am in favor of making formatting strings more efficient, especially if that makes "foo".to_string() more efficient.

seanmonstar · 2015-01-14T19:19:50Z

That's exactly what I had thought with the range: if the range is large, it
might perform faster picking somewhere in the middle.

I looked at Iters size_hint for design, but they're a little different.
Iters hint changes depending on the position of the Iter. Also, fmt can
have nested calls, so those must be added.

@kballard perhaps debug_asserts could be added to ensure at the end that
the string is within the bounds, to catch when you forget to change both?

On Wed, Jan 14, 2015, 11:12 AM Kevin Ballard [email protected]
wrote:

A drawback to this proposal: all manual implementations of fmt traits now
have a second method to implement that must be kept in sync with the fmt()
method.

Something to consider: A macro like write_size_hint!() that takes a
format string and arguments, just like write!(), does, and generates a
SizeHint from it. You'd need to keep the format string in sync with the
write!() call but it would make implementing size_hint() easier in most
cases.

—
Reply to this email directly or view it on GitHub
#583 (comment).

alexcrichton · 2015-01-14T20:00:02Z

text/0000-fmt-size-hint.md

+}
+```
+
+Add a `SizeHint` type, with named properties, instead of using tuple indexing. Include an `Add` implementation for `SizeHint`, so they can be easily added together from nested properties.


This is an interesting deviation from the size hints of iterators. I would personally expect the two to have the same signature, and this may want to at least discuss the discrepancy between the two return values.

It started because I wanted to implement Add, and it felt odd to impl on such a generic tuple.

lilyball · 2015-01-15T09:02:32Z

I've got a different approach that does not change the API of Show or String (or anything else that third-party code is expected to implement). It's focused around optimizing the case of "foo".to_string(). The approach here is to add a parameter to std::fmt::Writer::write_str() called more_coming that is true if the caller believes it will be calling write_str() again. The flag is tracked in the Formatter struct so calls to write!() inside an implementation of Show/String don't incorrectly claim there will be no writes if their caller still has more to write. Ultimately, the flag is used in the implementation of fmt::Writer on String to use reserve_exact() if nothing more is expected to be written.

With this approach I'm seeing similar numbers for your to_string() benchmarks that I get with your code (once I re-enable the call to shrink_to_fit() that you commented out in your code). The downside is the "nested" benchmark ends up about 3% slower with my code compared to the stdlib version, although I haven't tried to figure out why yet.

test test::bench_long          ... bench:        64 ns/iter (+/- 3)
test test::bench_long_memcpy   ... bench:        36 ns/iter (+/- 3)
test test::bench_long_stdlib   ... bench:       110 ns/iter (+/- 5)
test test::bench_med           ... bench:        53 ns/iter (+/- 3)
test test::bench_med_memcpy    ... bench:        27 ns/iter (+/- 2)
test test::bench_med_stdlib    ... bench:        96 ns/iter (+/- 5)
test test::bench_nested        ... bench:       211 ns/iter (+/- 11)
test test::bench_nested_stdlib ... bench:       204 ns/iter (+/- 21)
test test::bench_short         ... bench:        53 ns/iter (+/- 3)
test test::bench_short_memcpy  ... bench:        28 ns/iter (+/- 1)
test test::bench_short_stdlib  ... bench:        76 ns/iter (+/- 1)

…nd an alternative

seanmonstar · 2015-01-15T20:25:30Z

@kballard @alexcrichton I've updated the RFC with your comments.

mahkoh · 2015-01-16T15:26:20Z

Please benchmark number formatting. rust-lang/rust#19218

alexcrichton · 2015-01-16T18:40:51Z

text/0000-fmt-size-hint.md

+    value: &'a Void,
+    formatter: fn(&Void, &mut Formatter) -> Result,
+    hint: fn(&Void) -> SizeHint,
+}


It may be worth profiling the impact of this representation as it's inflating the size from 2 words to 3 words. Formatting is normally not necessarily on hot code paths, but it does affect code size and stack size and we've had to optimize it in the past. In theory an Argument is just a trait object so the formatter/hint pair could point to a "vtable" which could just be constructed manually instead of as a trait object itself.

Regardless though, it's a pretty minor point, just something to think about :)

I was thinking about this as well. Currently, any function can be passed here, but if we didn't want that flexibility, we could create an enum Format { Show, String, LowerHex, ... } and change Argument to be struct Argument { formatter: Format, value: &Void }.

At some point we do need to force rustc to monomorphize an implementation of Show as how would you actually call the fmt function given just formatter and value? (e.g. a function pointer needs to be somewhere)

I imagined being able to look them up from the enum:

impl Format { fn get_fns(&self) -> (fn(&Void, &mut Formatter) -> Result, fn(&Void) -> SizeHint) { match *self { Format::Show => (Show::fmt, Show::size_hint), // ... } } }

You'll need type inference to drive inference for the Self type of Show::fmt:

use std::fmt::Show; fn main() { let f = Show::fmt; }

foo.rs:4:13: 4:22 error: type annotations required: cannot resolve `_ : core::fmt::Show` foo.rs:4 let f = Show::fmt; ^~~~~~~~~ foo.rs:4:13: 4:22 note: required by `core::fmt::Show::fmt` foo.rs:4 let f = Show::fmt; ^~~~~~~~~ error: aborting due to previous error

alexcrichton · 2015-01-16T18:43:27Z

Thanks for the additions @seanmonstar! Could you also add a part explicitly saying that everything will be #[unstable] to start out with? In theory that should allow us to immediately improve "foo".to_string() while preventing locking us in to SizeHint immediately as defining your own size_hint would be unstable by default.

reem · 2015-01-16T20:07:20Z

I've also wondered if we could just add length hinting methods to fmt::Formatter, like hint_size(&mut self, usize) which avoids changing the trait at all.

seanmonstar · 2015-01-16T20:30:33Z

@reem By the time you can call that, the buffer has already been allocated. And if the object has properties that are also going to be formatted, those properties can't tell the Formatter their hint until part way through the formatting.

seanmonstar · 2015-01-16T20:42:15Z

@alexcrichton I added unstable attributes to the struct and method.

seanmonstar · 2015-01-17T21:34:21Z

text/0000-fmt-size-hint.md

+```rust
+impl<T: fmt::String> ToString for T {
+    fn to_string(&self) -> String {
+        format!("{}", self)


I started implementing this, and ran into that the format macro is defined in libstd, and this trait is in libcollections. The reason it's in std is because it needs String to use as a buffer.

What if I moved this macro lib collections, and had it call String::format instead of std::fmt::format?

I think it's fine to define the format! macro in libcollections and then use macro_reexport in the standard library (like it does for vec! already)

That is exactly my plan, c.f. rust-lang/rust#21912 and #810.

pnkfelix · 2015-02-05T13:00:58Z

@seanmonstar

Could you add "all manual implementations of fmt traits now have a second method to implement that must be kept in sync with the fmt() method." to the Drawbacks section, as mentioned by @kballard above? (At least, as I understand it, your choice is either to inherit the default impl, which is always correct but maximally inefficient, or to maintain this second method.)
Would it be possible for you to add number formatting to your benchmark suite, as requested by @mahkoh above? (See also Formatting numbers is slow rust#19218 )

Though really, I don't think either of these modifications really needs to block this RFC. I'll try to ping @alexcrichton about getting it in front of the core team.

seanmonstar · 2015-02-06T00:12:29Z

@pnkfelix I added the drawback. Regarding numbers, I imagine a minor improvement, since this reduces the number of allocations for the buffer. However, it seems the slowness in formatting numbers has more to do with that implementation, no?

pnkfelix · 2015-02-06T00:49:17Z

@seanmonstar yeah probably

aturon · 2015-03-05T06:16:26Z

ping @alexcrichton

alexcrichton · 2015-03-10T02:11:29Z

Hm I've been thinking about this recently, and I'm somewhat worried about the code size implications here. For example the size_hint on an iterator is almost never a concern because it's rarely used in a trait object so we can eliminate those that are never used.

With formatting, however, trait objects are its bread and butter, so we're forced to keep all size_hint implementations as well as have a little extra code to move them into place during format_args!. Could you do some analysis about the code size implications here? Some specific questions I have are:

What's the impact on the format_args! macro for moving size_hint onto the stack and passing it?
What's the average impact for a crate that just uses #[derive(Debug)]? Presumably these size_hint implementations cannot be optimized away and it's quite a common trait to implement.

I don't think that this will turn up any showstoppers, but I'd like to have a handle on what we're getting into. Otherwise I think that this is basically good to go so I'll see what we can do to get it merged soon afterwards.

aturon · 2015-04-03T22:03:30Z

ping @seanmonstar @alexcrichton

rprichard · 2015-04-12T03:59:18Z

The RFC's current size_hint function does not know the formatting parameters, so presumably it must assume they are default. For a str, the width and precision can truncate or extend the output.

If there are format parameters, then the size_hint function for Arguments defined in the RFC can produce an incorrect hint, AFAICT. Either the lower or upper bound can be wrong. It can be fixed by returning (0, None) if Arguments::fmt is Some(..).

I wonder if size_hint should have a parameter giving it the format parameters. It could have &Formatter type or perhaps some new struct type.

I'm inclined to think the size hint should be Option<usize>, because as the RFC mentions, it's not clear how to use both bounds.

pczarn · 2015-04-12T10:40:25Z

I tried to sum size hints in the expansion of format_args!. Most libraries were bloated by a few %. I think doing it with dynamic dispatch could be a slowdown with less bloat. Implementing size_hint for Arguments is not worth it either way. Take a look at rust-lang/rust#16544.

aturon · 2015-04-15T02:30:55Z

ping @alexcrichton

alexcrichton · 2015-04-15T18:30:37Z

@seanmonstar do you have any updates on the impact on code size here?

seanmonstar · 2015-04-15T19:57:40Z

I do not. The branch I started is several months old, and I haven't had time to rebase. @rprichard raises some points worth considering, though.

Gankra · 2015-05-27T22:26:00Z

Note: this RFC entered its Final Comment Period as of Yesterday; 6 days remain before a final decision will be made.

bluss · 2015-05-30T09:40:19Z

Don't make the decision until the discussion about size_hint deprecation for iterators is done. It would be silly to tag along on a deprecated name, if it goes that way.

Gankra · 2015-05-30T19:43:21Z

Accepting this RFC does not require setting the name in stone. Though we all love to bikeshed names, I think we can all agree they're not really the most important part of this proposal.

Implementing this of course takes time, and landing it in stable even more.

seanmonstar · 2015-06-01T18:49:09Z

Another alternative would be instead of including this method, for fmt to use a thread-local LruCache of sizes when formatting types. This would result in the first case not getting the speed benefit, but would mean that all implementations of a fmt trait can use a size hint, without requiring the size_hint method to be kept in sync with the implementation.

alexcrichton · 2015-06-02T21:30:35Z

Having now gone through the final comment period, it's time to make a decision on this RFC. For now I'm going to close this RFC for the following reasons:

It's not clear what impact this has on large programs
Code size is a big worry here, but there are currently no statistics and/or analysis for this RFC.
There are lingering questions about size_hint on iterators in terms of naming.
It's somewhat unclear whether this API should diverge from the Iterator::size_hint API in terms of return values

Overall it seems best to start this RFC fresh again with a new look beyond microbenchmarks to the statistics measured here, and also perhaps await the outcome of #1034. Thanks regardless for the RFC @seanmonstar!

bluss · 2015-06-02T22:01:40Z

like gankro said, the name is a non-issue.

Add fmt size_hints

53b4ae0

lilyball reviewed Jan 14, 2015
View reviewed changes

alexcrichton reviewed Jan 14, 2015
View reviewed changes

seanmonstar added 2 commits January 15, 2015 10:28

update Add implementation to handle overflows

8407126

add details about SizeHint, implementation inside std::fmt::format, a…

5e0ad26

…nd an alternative

alexcrichton reviewed Jan 16, 2015
View reviewed changes

add unstable attributes to SizeHint and size_hint

c93d35c

seanmonstar reviewed Jan 17, 2015
View reviewed changes

nrc assigned pnkfelix Jan 22, 2015

alexcrichton mentioned this pull request Jan 28, 2015

std: Stabilize the std::fmt module rust-lang/rust#21713

Merged

seanmonstar mentioned this pull request Feb 4, 2015

RFC: Deprecate std::fmt::format in favor of String::format #810

Closed

add drawback regarding keeping methods in sync

732de9c

aturon added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 22, 2015

nikomatsakis added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label May 26, 2015

alexcrichton closed this Jun 2, 2015

alexcrichton mentioned this pull request Jun 2, 2015

Investigate size hints for formatting #1145

Open

Add fmt size_hints #583

Add fmt size_hints #583

Conversation

seanmonstar commented Jan 14, 2015

erickt commented Jan 14, 2015

lilyball commented Jan 14, 2015

lilyball commented Jan 14, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lilyball commented Jan 14, 2015

seanmonstar commented Jan 14, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lilyball commented Jan 15, 2015

seanmonstar commented Jan 15, 2015

mahkoh commented Jan 16, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Jan 16, 2015

reem commented Jan 16, 2015

seanmonstar commented Jan 16, 2015

seanmonstar commented Jan 16, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pnkfelix commented Feb 5, 2015

seanmonstar commented Feb 6, 2015

pnkfelix commented Feb 6, 2015

aturon commented Mar 5, 2015

alexcrichton commented Mar 10, 2015

aturon commented Apr 3, 2015

rprichard commented Apr 12, 2015

pczarn commented Apr 12, 2015

aturon commented Apr 15, 2015

alexcrichton commented Apr 15, 2015

seanmonstar commented Apr 15, 2015

Gankra commented May 27, 2015

bluss commented May 30, 2015

Gankra commented May 30, 2015

seanmonstar commented Jun 1, 2015

alexcrichton commented Jun 2, 2015

bluss commented Jun 2, 2015