Is this crate aiming to parse HTML? #238
You can parse a lot of HTML (disabling end tag checks and using the
I have created a PR at #239
You may run into problems with raw text elements, which may contain '<' characters in their content. You can work around that by deconstructing the An opt-in option to handle these cases could probably be added to Edit:
I think there could be a case made for skipping over subtrees even in XML, purely on performance grounds.
You have the
Thanks for mentioning that method, I had missed it. My point regarding the "opt-in" was that something like: So here:
the entire content of the script tag would be treated as a I'm not sure if that would be something worth adding. It would be nice to have a version of So that using
This is definitely something worth adding, at least behind a feature flag.
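To make the raw text idea concrete, here is a minimal sketch in plain Rust (standard library only) of how the content of an element such as `script` could be taken as one opaque slice instead of being tokenized. `split_raw_text` is a hypothetical helper written for illustration; it is not part of quick-xml's API. The end-tag rule it uses (tag name followed by whitespace, `/` or `>`) is a simplification of what HTML tokenizers do.

```rust
/// Hypothetical helper, not part of quick-xml: given the input that follows
/// the opening tag of a raw text element, return the raw content and the
/// remainder of the input, starting at the matching end tag.
fn split_raw_text<'a>(after_open: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    let needle = format!("</{}", tag.to_ascii_lowercase());
    // ASCII lowercasing keeps byte offsets identical to the original input,
    // so positions found in `haystack` can be used to slice `after_open`.
    let haystack = after_open.to_ascii_lowercase();
    let mut from = 0;
    while let Some(rel) = haystack[from..].find(&needle) {
        let start = from + rel;
        let end = start + needle.len();
        // The tag name must be followed by whitespace, '/' or '>' so that
        // `</scripts>` does not close `<script>`.
        match haystack.as_bytes().get(end) {
            Some(b'>') | Some(b'/') | Some(b' ') | Some(b'\t') | Some(b'\n') | Some(b'\r') => {
                return Some((&after_open[..start], &after_open[start..]));
            }
            _ => from = end,
        }
    }
    // No end tag found: the element is unterminated.
    None
}

fn main() {
    // Input as it would look right after `<script>` has been consumed.
    let input = "if (a < b) { render(\"<p>\"); }</script><p>done</p>";
    let (raw, rest) = split_raw_text(input, "script").expect("no end tag found");
    println!("raw text:  {}", raw);
    println!("remainder: {}", rest);
}
```

The caller would hand `raw` to the user as a single text chunk and resume normal parsing at `rest`. Returning borrowed slices keeps the sketch allocation-free apart from the lowercased search copy.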
I think with #208 it could be a single method that returns the slice, as constructing and discarding a slice is not too bad. I wonder if
Just for context: I don't like breaking changes, but I am definitely not against them as long as there is a good reason. Cargo makes handling breaking changes quite easy, and this is not a central part of the lib.
FWIW, quick-xml didn't work out particularly well in the end, so I created my own HTML parser: https://github.com/untitaker/html5gum
I've successfully used quick-xml to parse HTML; however, I just noticed that quick-xml does not unescape things like
. So I am not entirely sure whether quick-xml is meant to handle HTML in general, or only partially, so that we can build our own parsers/unescapers on top.
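A custom unescaper layered on top, as the last comment suggests, might look roughly like the sketch below. XML predefines only five entities (`lt`, `gt`, `amp`, `apos`, `quot`), while HTML defines many more named character references. `unescape_html_extras` is a hypothetical helper, and the names in its table (`nbsp`, `copy`, `reg`) are illustrative assumptions rather than anything taken from this thread. It uses only the standard library and leaves unknown references untouched so that an XML layer can still handle them.

```rust
/// Hypothetical helper: replace a few named HTML character references
/// that are not among XML's five predefined entities.
fn unescape_html_extras(input: &str) -> String {
    // Illustrative subset only; HTML defines far more names than these.
    const TABLE: [(&str, &str); 3] = [
        ("nbsp", "\u{a0}"),
        ("copy", "\u{a9}"),
        ("reg", "\u{ae}"),
    ];
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        // Copy everything before the '&' unchanged.
        out.push_str(&rest[..amp]);
        let tail = &rest[amp + 1..];
        // A reference has the form `&name;`, so look for the semicolon
        // and check whether the name in between is in the table.
        let replaced = tail.find(';').and_then(|semi| {
            let name = &tail[..semi];
            TABLE.iter().find(|(n, _)| *n == name).map(|(_, v)| (*v, semi))
        });
        match replaced {
            Some((value, semi)) => {
                out.push_str(value);
                rest = &tail[semi + 1..];
            }
            None => {
                // Unknown or malformed reference: keep the '&' verbatim.
                out.push('&');
                rest = tail;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let text = "Fish&nbsp;&amp;&nbsp;Chips &copy; 2020";
    println!("{:?}", unescape_html_extras(text));
}
```

Running this over text that has not yet been XML-unescaped keeps `&amp;` and the other predefined entities intact for the XML layer to decode afterwards.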