Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Use Encoding.Preamble in StreamReader/Writer #23321

Merged
merged 1 commit into from
Aug 17, 2017

Conversation

justinvp
Copy link
Contributor

Use the new Encoding.Preamble property to avoid unnecessary byte[] allocations for encodings that return cached instances from Preamble (e.g. all of the built-in encodings that have preambles).

Reference: dotnet/coreclr#13269

cc: @JeremyKuhne, @ianhays, @stephentoub

if (preamble.Length > 0)
{
_stream.Write(preamble, 0, preamble.Length);
_stream.Write(preamble);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this is just a temporary issue, but I've/we've not yet overridden the span-based methods on FileStream, so temporarily this won't be as efficient as it could be. Not something to block on, though.

@@ -39,7 +32,6 @@ public class StreamReader : TextReader
private Decoder _decoder;
private byte[] _byteBuffer;
private char[] _charBuffer;
private byte[] _preamble; // Encoding's preamble, which identifies this encoding.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the fact that this was being cached mean it's used multiple times? I've not reviewed all of the StreamReader code, but it looks like we might now be accessing Encoding.Preamble multiple times. In theory, could some custom encoding have a preamble and this change could now cause multiple calls to GetPreamble (via Preamble) where previously there was only one? It's probably such a remote case that it's not worth caring about, but just wanted to check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from its use in some Debug.Asserts (which we shouldn't care about), it is used in two places: Init and IsPreamble, and IsPreamble can be called multiple times. So it's possible that a custom encoding that hasn't provided an efficient implementation of Preamble could result in more allocations now. I lean towards not caring about it. The upside of this change is that all the built-in encodings won't allocate now (where previously only Encoding.UTF8 avoided the allocation). I'm guessing the built-in encodings are going to be used the vast majority of the time vs. a custom encoding. I'm open to other ideas, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with whatever @stephentoub thinks on this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably fine. I'll sleep on it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be possible to remove the call in Init, leaving a single call in IsPreamble (which could still be called multiple times, but I suspect multiple calls are rare). I'll play around with it tomorrow and see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that the 99.999% case will be the built-in encodings, and for those this change makes sense. Let's do it.

@stephentoub
Copy link
Member

Thanks for making all of these Preamble changes :)

@justinvp
Copy link
Contributor Author

Thank you for driving all the Span<T> API additions throughout :)

@stephentoub stephentoub merged commit e0c1f20 into dotnet:master Aug 17, 2017
@justinvp justinvp deleted the preamble branch August 17, 2017 17:23
@karelz karelz modified the milestone: 2.1.0 Aug 20, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants