-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Use Encoding.Preamble in StreamReader/Writer #23321
Conversation
if (preamble.Length > 0) | ||
{ | ||
_stream.Write(preamble, 0, preamble.Length); | ||
_stream.Write(preamble); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this is just a temporary issue, but I've/we've not yet overridden the span-based methods on FileStream, so temporarily this won't be as efficient as it could be. Not something to block on, though.
@@ -39,7 +32,6 @@ public class StreamReader : TextReader | |||
private Decoder _decoder; | |||
private byte[] _byteBuffer; | |||
private char[] _charBuffer; | |||
private byte[] _preamble; // Encoding's preamble, which identifies this encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the fact that this was being cached mean it's used multiple times? I've not reviewed all of the StreamReader code, but it looks like we might now be accessing Encoding.Preamble multiple times. In theory, could some custom encoding have a preamble and this change could now cause multiple calls to GetPreamble (via Preamble) where previously there was only one? It's probably such a remote case that it's not worth caring about, but just wanted to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from its use in some Debug.Asserts
(which we shouldn't care about), it is used in two places: Init
and IsPreamble
, and IsPreamble
can be called multiple times. So it's possible that a custom encoding that hasn't provided an efficient implementation of Preamble
could result in more allocations now. I lean towards not caring about it. The upside of this change is that all the built-in encodings won't allocate now (where previously only Encoding.UTF8
avoided the allocation). I'm guessing the built-in encodings are going to be used the vast majority of the time vs. a custom encoding. I'm open to other ideas, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with whatever @stephentoub thinks on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably fine. I'll sleep on it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be possible to remove the call in Init
, leaving a single call in IsPreamble
(which could still be called multiple times, but I suspect multiple calls are rare). I'll play around with it tomorrow and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that the 99.999% case will be the built-in encodings, and for those this change makes sense. Let's do it.
Thanks for making all of these Preamble changes :) |
Thank you for driving all the |
Use the new
Encoding.Preamble
property to avoid unnecessarybyte[]
allocations for encodings that return cached instances fromPreamble
(e.g. all of the built-in encodings that have preambles).Reference: dotnet/coreclr#13269
cc: @JeremyKuhne, @ianhays, @stephentoub