XML Files with a Byte Order Mark should work #165
Closed
Hello,
I was also affected by bug #155: xml-rs fails with an error when it parses any XML document that begins with a byte order mark (BOM), even though such a document is valid UTF-8 XML.
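Here is why the parser trips. In UTF-8 the BOM is the three bytes `EF BB BF`, which decode to the single character U+FEFF. XML's whitespace production (`S`) only covers space, tab, CR and LF, so U+FEFF looks like non-whitespace text in front of the root element. The standalone Rust sketch below checks this. `UTF8_BOM` and `is_xml_whitespace` are illustrative names, not xml-rs APIs.

```rust
/// UTF-8 encoding of the byte order mark, U+FEFF (illustrative constant).
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// XML's `S` production: only space, tab, CR and LF count as whitespace.
/// (Illustrative helper, not part of xml-rs.)
fn is_xml_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\r' | '\n')
}

fn main() {
    let doc = "\u{FEFF}<?xml version=\"1.0\"?><root/>";

    // The BOM occupies the first three bytes of the UTF-8 encoding.
    assert_eq!(&doc.as_bytes()[..3], &UTF8_BOM);

    // Decoded, it is the single char U+FEFF, which is not XML whitespace,
    // so a parser that only skips whitespace before markup sees stray text.
    let first = doc.chars().next().unwrap();
    assert_eq!(first, '\u{FEFF}');
    assert!(!is_xml_whitespace(first));

    println!("first char is U+{:04X}", first as u32);
}
```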
This PR is my attempt at a fix. If the approach is not good, please let me know how to improve it.
The fix is small. In outside_tag.rs, xml-rs now checks for the BOM in the edge case where it finds non-whitespace characters before the root tag.
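Below is a minimal sketch of the idea behind the fix. It uses a hypothetical, heavily simplified `find_markup_start` function and `PrologError` type, not the actual code in outside_tag.rs. The function skips whitespace before the first `<`, tolerates a U+FEFF at byte offset 0, and reports any other character as an error.

```rust
/// Error cases for the simplified prolog scan (hypothetical type).
#[derive(Debug, PartialEq)]
enum PrologError {
    /// A character other than XML whitespace appeared before the first `<`.
    UnexpectedChar { pos: usize, ch: char },
    /// The input ended without any markup.
    NoMarkup,
}

/// Returns the byte offset of the first `<`, skipping XML whitespace and
/// tolerating a BOM (U+FEFF) only as the very first character.
/// (Hypothetical helper for illustration.)
fn find_markup_start(doc: &str) -> Result<usize, PrologError> {
    for (pos, ch) in doc.char_indices() {
        match ch {
            '<' => return Ok(pos),
            ' ' | '\t' | '\r' | '\n' => continue,
            // The edge case: a BOM at offset 0 is not an error.
            '\u{FEFF}' if pos == 0 => continue,
            _ => return Err(PrologError::UnexpectedChar { pos, ch }),
        }
    }
    Err(PrologError::NoMarkup)
}

fn main() {
    // BOM-prefixed document: the BOM is 3 bytes, so markup starts at offset 3.
    assert_eq!(find_markup_start("\u{FEFF}<root/>"), Ok(3));
    // Leading whitespace is still accepted.
    assert_eq!(find_markup_start("\n  <root/>"), Ok(3));
    // Genuine stray text is still rejected.
    assert_eq!(
        find_markup_start("x<root/>"),
        Err(PrologError::UnexpectedChar { pos: 0, ch: 'x' })
    );
    // A U+FEFF after other characters is not treated as a BOM.
    assert_eq!(
        find_markup_start(" \u{FEFF}<root/>"),
        Err(PrologError::UnexpectedChar { pos: 1, ch: '\u{FEFF}' })
    );
    println!("all prolog checks passed");
}
```

This sketch accepts U+FEFF only as the very first character of the input. Anywhere else it no longer acts as a byte order mark, so it is treated as stray text.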
I have also added a .gitattributes file, because the tests fail on Windows machines: they expect Unix (LF) line endings.
Finally, I changed the fourth sample XML file so that it starts with a UTF-8 byte order mark. This serves as the test.
I did not handle any other BOMs (e.g. the UTF-16 ones), because xml-rs only supports UTF-8.
Thank you!