Figure out if anything is needed for better HTML integration #128
I think I'd personally be okay if the standard just said that you had to wait for 1024 bytes before decoding, with implementations free to optimize around that. The difference should only be observable performance-wise, which seems acceptable, and we can encourage implementations to do the fast thing. I think that remains true if we add encoding sniffing. Rewriting the specifications to have the proper abstractions would obviously be somewhat nicer, but seems like a lot more effort. Note that we still have to change "decode" to also return the chosen encoding to the caller (and adjust any callers as appropriate).
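To illustrate that last point, here is a minimal Python sketch, assuming a hypothetical `decode` that returns both the decoded text and the encoding it actually used. A BOM overrides the caller's fallback, so a caller such as HTML can record the BOM-selected encoding instead of the one it passed in. The three BOMs are the ones the Encoding Standard recognizes; the Python codec names and `errors="replace"` only approximate the standard's decoders.

```python
# The three byte order marks recognized by the Encoding Standard,
# mapped to (approximately) equivalent Python codec names.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def decode(data: bytes, fallback: str) -> tuple[str, str]:
    """Hypothetical decode hook that also reports the encoding it used."""
    encoding = fallback
    for bom, name in BOMS:
        if data.startswith(bom):
            # A BOM wins over the fallback and is stripped from the output.
            encoding = name
            data = data[len(bom):]
            break
    # Replacement-mode decoding, roughly matching the standard's "decode".
    return data.decode(encoding, errors="replace"), encoding
```

With this shape, a caller can write `text, used = decode(body, "windows-1252")` and set the document's character encoding from `used` rather than from the fallback it guessed.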
I'm reopening the discussion about this feature in whatwg/html#1077 (comment).
This change moves the BOM splitting part of the decode hook into a separate hook which does not consume any bytes of the token stream. This will allow fixing a long-standing issue in the HTML encoding sniffing algorithm, where the document's character encoding is set to the wrong value when there is a BOM: whatwg/html#1077. Closes #128.
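A rough sketch of what "a separate hook which does not consume any bytes" means in practice, assuming a hypothetical `ByteQueue` as a stand-in for the Encoding Standard's I/O queue and a `bom_sniff` function modelled loosely on the split-out hook. The sniff only peeks, so the bytes are still there for the later decode step, while the caller (here a hypothetical `document_encoding`) can already set the document's encoding from the BOM.

```python
from typing import Optional


class ByteQueue:
    """Illustrative stand-in for an I/O queue of bytes (not a real API)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def peek(self, n: int) -> bytes:
        # Look at up to n bytes without advancing the read position.
        return self._data[self._pos:self._pos + n]

    def read(self, n: int) -> bytes:
        # Consume up to n bytes.
        chunk = self.peek(n)
        self._pos += len(chunk)
        return chunk


def bom_sniff(queue: ByteQueue) -> Optional[str]:
    """Return the encoding a leading BOM indicates, or None; consumes nothing."""
    head = queue.peek(3)
    if head == b"\xef\xbb\xbf":
        return "UTF-8"
    if head[:2] == b"\xfe\xff":
        return "UTF-16BE"
    if head[:2] == b"\xff\xfe":
        return "UTF-16LE"
    return None


def document_encoding(queue: ByteQueue, sniffed: str) -> str:
    """Hypothetical caller: a BOM overrides whatever other sniffing found."""
    return bom_sniff(queue) or sniffed
```

Because `bom_sniff` leaves the queue untouched, the subsequent decode still sees (and strips) the BOM, and the encoding reported for the document matches the one actually used.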
@hsivonen in whatwg/html#1077 (comment) raised a number of issues with the current integration points. They are insufficient for CSS, HTML, and presumably XML.
This might require some substantive changes to the hooks and perhaps other parts of the Encoding Standard, as well as standards that depend on the Encoding Standard (of which there are quite a few, so tread carefully).
Belatedly filing this to keep better track of it.