Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fmt size_hints #583

Closed
wants to merge 5 commits into from
Closed

Conversation

seanmonstar
Copy link
Contributor

Add a size_hint method to each of the fmt traits, allowing a buffer to allocate with the correct size before writing.

Rendered

cc @alexcrichton

@erickt
Copy link

erickt commented Jan 14, 2015

I'm all for adding a hint for formatting. It would be nice in this RFC to mention unifying this with iterator size hints.

cc-ing @kballard and @thestinger, who I believe collaborated on adding Iterator::size_hint.

Does anyone know of anything using the Iterator::size_hint upper bound? I did a scan through the rust code base, and the only users I could find was other size_hint methods and tests. While semantically it'd be nice to have the lower and upper bounds of a collection, it's a bit of a shame we might be doing work that no one ever uses. Perhaps we should split things into a size_hint(&self) -> uint and a size_bounds(&self) -> (uint, Option<uint>)?

#46 is somewhat related to this RFC.

@lilyball
Copy link
Contributor

I don't know offhand of anyone using the Iterator::size_hint upper bound. I have a recollection a discussion a while back in which someone claimed that, after looking through the rust source, there were no users of it. At the time I defended its existence with theoretical use-cases that wouldn't be found in the standard libraries but might be found in third-party code, such as providing progress estimates for an operation that processes the results of an iterator chain. Or perhaps more applicable, there may be some use in having an Extend implementation that allocates more than the lower bound ahead of time, with the expectation that the "true" iterator count lies somewhere between the lower and upper bounds.

All that said, if the size_hint for fmt is just used for the allocation created by fmt itself, then if it's only using the lower bound, there doesn't seem to be much use in having the upper bound. Or is there a use-case for calling this size_hint in third-party code? Alternatively, would the fmt module perhaps want to experiment with pre-allocating some size in between the lower and upper bounds? I'm thinking some simple rule like allocating min(lower_bound * 1.5, upper_bound).

@lilyball
Copy link
Contributor

A drawback to this proposal: all manual implementations of fmt traits now have a second method to implement that must be kept in sync with the fmt() method.

Something to consider: A macro like write_size_hint!() that takes a format string and arguments, just like write!(), does, and generates a SizeHint from it. You'd need to keep the format string in sync with the write!() call but it would make implementing size_hint() easier in most cases.

SizeHint {
min: self.min + other.min,
max: match (self.max, other.max) {
(Some(left), Some(right)) => Some(left + right),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation doesn't handle overflow. It needs to look something like

SizeHint {
    min: self.min.saturating_add(other.min),
    max: if let (Some(left), Some(right)) = (self.max, other.max) {
        Some(left.checked_add(right))
    } else {
        None
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, will adjust.

@lilyball
Copy link
Contributor

Offhand, I am in favor of making formatting strings more efficient, especially if that makes "foo".to_string() more efficient.

@seanmonstar
Copy link
Contributor Author

That's exactly what I had thought with the range: if the range is large, it
might perform faster picking somewhere in the middle.

I looked at Iters size_hint for design, but they're a little different.
Iters hint changes depending on the position of the Iter. Also, fmt can
have nested calls, so those must be added.

@kballard perhaps debug_asserts could be added to ensure at the end that
the string is within the bounds, to catch when you forget to change both?

On Wed, Jan 14, 2015, 11:12 AM Kevin Ballard [email protected]
wrote:

A drawback to this proposal: all manual implementations of fmt traits now
have a second method to implement that must be kept in sync with the fmt()
method.

Something to consider: A macro like write_size_hint!() that takes a
format string and arguments, just like write!(), does, and generates a
SizeHint from it. You'd need to keep the format string in sync with the
write!() call but it would make implementing size_hint() easier in most
cases.


Reply to this email directly or view it on GitHub
#583 (comment).

}
```

Add a `SizeHint` type, with named properties, instead of using tuple indexing. Include an `Add` implementation for `SizeHint`, so they can be easily added together from nested properties.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting deviation from the size hints of iterators. I would personally expect the two to have the same signature, and this may want to at least discuss the discrepancy between the two return values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It started because I wanted to implement Add, and it felt odd to impl on such a generic tuple.

@lilyball
Copy link
Contributor

I've got a different approach that does not change the API of Show or String (or anything else that third-party code is expected to implement). It's focused around optimizing the case of "foo".to_string(). The approach here is to add a parameter to std::fmt::Writer::write_str() called more_coming that is true if the caller believes it will be calling write_str() again. The flag is tracked in the Formatter struct so calls to write!() inside an implementation of Show/String don't incorrectly claim there will be no writes if their caller still has more to write. Ultimately, the flag is used in the implementation of fmt::Writer on String to use reserve_exact() if nothing more is expected to be written.

With this approach I'm seeing similar numbers for your to_string() benchmarks that I get with your code (once I re-enable the call to shrink_to_fit() that you commented out in your code). The downside is the "nested" benchmark ends up about 3% slower with my code compared to the stdlib version, although I haven't tried to figure out why yet.

test test::bench_long          ... bench:        64 ns/iter (+/- 3)
test test::bench_long_memcpy   ... bench:        36 ns/iter (+/- 3)
test test::bench_long_stdlib   ... bench:       110 ns/iter (+/- 5)
test test::bench_med           ... bench:        53 ns/iter (+/- 3)
test test::bench_med_memcpy    ... bench:        27 ns/iter (+/- 2)
test test::bench_med_stdlib    ... bench:        96 ns/iter (+/- 5)
test test::bench_nested        ... bench:       211 ns/iter (+/- 11)
test test::bench_nested_stdlib ... bench:       204 ns/iter (+/- 21)
test test::bench_short         ... bench:        53 ns/iter (+/- 3)
test test::bench_short_memcpy  ... bench:        28 ns/iter (+/- 1)
test test::bench_short_stdlib  ... bench:        76 ns/iter (+/- 1)

@seanmonstar
Copy link
Contributor Author

@kballard @alexcrichton I've updated the RFC with your comments.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 16, 2015

Please benchmark number formatting. rust-lang/rust#19218

value: &'a Void,
formatter: fn(&Void, &mut Formatter) -> Result,
hint: fn(&Void) -> SizeHint,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth profiling the impact of this representation as it's inflating the size from 2 words to 3 words. Formatting is normally not necessarily on hot code paths, but it does affect code size and stack size and we've had to optimize it in the past. In theory an Argument is just a trait object so the formatter/hint pair could point to a "vtable" which could just be constructed manually instead of as a trait object itself.

Regardless though, it's a pretty minor point, just something to think about :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this as well. Currently, any function can be passed here, but if we didn't want that flexibility, we could create an enum Format { Show, String, LowerHex, ... } and change Argument to be struct Argument { formatter: Format, value: &Void }.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we do need to force rustc to monomorphize an implementation of Show as how would you actually call the fmt function given just formatter and value? (e.g. a function pointer needs to be somewhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagined being able to look them up from the enum:

impl Format {
    fn get_fns(&self) -> (fn(&Void, &mut Formatter) -> Result, fn(&Void) -> SizeHint) {
        match *self {
            Format::Show => (Show::fmt, Show::size_hint),
            // ...
        }
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need type inference to drive inference for the Self type of Show::fmt:

use std::fmt::Show;

fn main() {
    let f = Show::fmt;
}
foo.rs:4:13: 4:22 error: type annotations required: cannot resolve `_ : core::fmt::Show`
foo.rs:4     let f = Show::fmt;
                     ^~~~~~~~~
foo.rs:4:13: 4:22 note: required by `core::fmt::Show::fmt`
foo.rs:4     let f = Show::fmt;
                     ^~~~~~~~~
error: aborting due to previous error

@alexcrichton
Copy link
Member

Thanks for the additions @seanmonstar! Could you also add a part explicitly saying that everything will be #[unstable] to start out with? In theory that should allow us to immediately improve "foo".to_string() while preventing locking us in to SizeHint immediately as defining your own size_hint would be unstable by default.

@reem
Copy link

reem commented Jan 16, 2015

I've also wondered if we could just add length hinting methods to fmt::Formatter, like hint_size(&mut self, usize) which avoids changing the trait at all.

@seanmonstar
Copy link
Contributor Author

@reem By the time you can call that, the buffer has already been allocated. And if the object has properties that are also going to be formatted, those properties can't tell the Formatter their hint until part way through the formatting.

@seanmonstar
Copy link
Contributor Author

@alexcrichton I added unstable attributes to the struct and method.

```rust
impl<T: fmt::String> ToString for T {
fn to_string(&self) -> String {
format!("{}", self)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started implementing this, and ran into that the format macro is defined in libstd, and this trait is in libcollections. The reason it's in std is because it needs String to use as a buffer.

What if I moved this macro lib collections, and had it call String::format instead of std::fmt::format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to define the format! macro in libcollections and then use macro_reexport in the standard library (like it does for vec! already)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is exactly my plan, c.f. rust-lang/rust#21912 and #810.

@pnkfelix
Copy link
Member

pnkfelix commented Feb 5, 2015

@seanmonstar

  1. Could you add "all manual implementations of fmt traits now have a second method to implement that must be kept in sync with the fmt() method." to the Drawbacks section, as mentioned by @kballard above? (At least, as I understand it, your choice is either to inherit the default impl, which is always correct but maximally inefficient, or to maintain this second method.)
  2. Would it be possible for you to add number formatting to your benchmark suite, as requested by @mahkoh above? (See also Formatting numbers is slow rust#19218 )

Though really, I don't think either of these modifications really needs to block this RFC. I'll try to ping @alexcrichton about getting it in front of the core team.

@seanmonstar
Copy link
Contributor Author

@pnkfelix I added the drawback. Regarding numbers, I imagine a minor improvement, since this reduces the number of allocations for the buffer. However, it seems the slowness in formatting numbers has more to do with that implementation, no?

@pnkfelix
Copy link
Member

pnkfelix commented Feb 6, 2015

@seanmonstar yeah probably

@aturon
Copy link
Member

aturon commented Mar 5, 2015

ping @alexcrichton

@alexcrichton
Copy link
Member

Hm I've been thinking about this recently, and I'm somewhat worried about the code size implications here. For example the size_hint on an iterator is almost never a concern because it's rarely used in a trait object so we can eliminate those that are never used.

With formatting, however, trait objects are its bread and butter, so we're forced to keep all size_hint implementations as well as have a little extra code to move them into place during format_args!. Could you do some analysis about the code size implications here? Some specific questions I have are:

  • What's the impact on the format_args! macro for moving size_hint onto the stack and passing it?
  • What's the average impact for a crate that just uses #[derive(Debug)]? Presumably these size_hint implementations cannot be optimized away and it's quite a common trait to implement.

I don't think that this will turn up any showstoppers, but I'd like to have a handle on what we're getting into. Otherwise I think that this is basically good to go so I'll see what we can do to get it merged soon afterwards.

@aturon
Copy link
Member

aturon commented Apr 3, 2015

ping @seanmonstar @alexcrichton

@rprichard
Copy link

The RFC's current size_hint function does not know the formatting parameters, so presumably it must assume they are default. For a str, the width and precision can truncate or extend the output.

If there are format parameters, then the size_hint function for Arguments defined in the RFC can produce an incorrect hint, AFAICT. Either the lower or upper bound can be wrong. It can be fixed by returning (0, None) if Arguments::fmt is Some(..).

I wonder if size_hint should have a parameter giving it the format parameters. It could have &Formatter type or perhaps some new struct type.

I'm inclined to think the size hint should be Option<usize>, because as the RFC mentions, it's not clear how to use both bounds.

@pczarn
Copy link

pczarn commented Apr 12, 2015

I tried to sum size hints in the expansion of format_args!. Most libraries were bloated by a few %. I think doing it with dynamic dispatch could be a slowdown with less bloat. Implementing size_hint for Arguments is not worth it either way. Take a look at rust-lang/rust#16544.

@aturon
Copy link
Member

aturon commented Apr 15, 2015

ping @alexcrichton

@alexcrichton
Copy link
Member

@seanmonstar do you have any updates on the impact on code size here?

@seanmonstar
Copy link
Contributor Author

I do not. The branch I started is several months old, and I haven't had time to rebase. @rprichard raises some points worth considering, though.

@aturon aturon added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 22, 2015
@nikomatsakis nikomatsakis added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label May 26, 2015
@Gankra
Copy link
Contributor

Gankra commented May 27, 2015

Note: this RFC entered its Final Comment Period as of Yesterday; 6 days remain before a final decision will be made.

@bluss
Copy link
Member

bluss commented May 30, 2015

Don't make the decision until the discussion about size_hint deprecation for iterators is done. It would be silly to tag along on a deprecated name, if it goes that way.

@Gankra
Copy link
Contributor

Gankra commented May 30, 2015

Accepting this RFC does not require setting the name in stone. Though we all love to bikeshed names, I think we can all agree they're not really the most important part of this proposal.

Implementing this of course takes time, and landing it in stable even more.

@seanmonstar
Copy link
Contributor Author

Another alternative would be instead of including this method, for fmt to use a thread-local LruCache of sizes when formatting types. This would result in the first case not getting the speed benefit, but would mean that all implementations of a fmt trait can use a size hint, without requiring the size_hint method to be kept in sync with the implementation.

@alexcrichton
Copy link
Member

Having now gone through the final comment period, it's time to make a decision on this RFC. For now I'm going to close this RFC for the following reasons:

Overall it seems best to start this RFC fresh again with a new look beyond microbenchmarks to the statistics measured here, and also perhaps await the outcome of #1034. Thanks regardless for the RFC @seanmonstar!

@bluss
Copy link
Member

bluss commented Jun 2, 2015

like gankro said, the name is a non-issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.