I'm using PyQuery, and the encoding is detected incorrectly for this page:
http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/resultado_busca.html?nomeArq=0148.html
The problem is that the HTML has this meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
But the page is actually utf-8.
I get this info from the response headers:
That's how the browser (Chrome) is able to guess the right encoding and display the page correctly. I work at a place that has to deal with many different kinds of pages, and I can tell this is far from a rare case (especially on Brazilian Portuguese websites), so it would be nice to fix this in crawley.
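To make the header-versus-meta conflict concrete, here is a minimal sketch of the precedence described above: trust the charset in the HTTP Content-Type header first, and only fall back to the meta tag when the header names none. It uses only the standard library. The names `charset_from_header`, `pick_encoding`, and `META_CHARSET` are hypothetical, and the header value in the usage lines is an assumption, since the actual response headers are not reproduced here. This is not how crawley or PyQuery currently behave.

```python
import re
from email.message import Message

# Hypothetical helper pattern: finds a charset declared inside a <meta> tag.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased charset, or None


def pick_encoding(content_type, body, default="utf-8"):
    """Prefer the HTTP header charset, then the meta tag, then a default."""
    header_charset = charset_from_header(content_type)
    if header_charset:
        return header_charset
    match = META_CHARSET.search(body[:2048])  # only sniff the start of the page
    if match:
        return match.group(1).decode("ascii").lower()
    return default


# Assumed example: the header says utf-8 while the meta tag says iso-8859-1.
body = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=iso-8859-1"></head></html>'
)
encoding_with_header = pick_encoding("text/html; charset=UTF-8", body)
encoding_without_header = pick_encoding("text/html", body)
```

With the header present the sketch picks utf-8. Without it, the misleading meta tag wins, which is the failure mode reported in this issue.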
So far I have seen two solutions, as proposed in this answer on SO: using the chardet module or UnicodeDammit (from BeautifulSoup).
I've developed these two alternatives locally and tested them with PyQuery, and both seem to fix the problem.
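For reference, a rough sketch of what the UnicodeDammit route could look like. This is not the code I tested locally, only an illustration. The function name `decode_html` and the candidate list are hypothetical. The sketch assumes `bs4` may or may not be installed and falls back to plain strict decoding if it is missing. The chardet route would use `chardet.detect(raw)["encoding"]` instead and is not shown.

```python
def decode_html(raw, candidates=("utf-8", "iso-8859-1")):
    """Decode raw page bytes, trying the candidate encodings in order.

    Returns a (text, encoding) tuple. utf-8 is tried first because bytes
    that are not really utf-8 almost never decode cleanly as utf-8.
    """
    try:
        from bs4 import UnicodeDammit  # optional dependency (BeautifulSoup 4)
    except ImportError:
        UnicodeDammit = None

    if UnicodeDammit is not None:
        # The second positional argument lists encodings to try first.
        dammit = UnicodeDammit(raw, list(candidates))
        if dammit.unicode_markup is not None:
            return dammit.unicode_markup, dammit.original_encoding

    # Fallback without bs4: strict decoding with each candidate in turn.
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace"), "utf-8"


# A page that declares iso-8859-1 in its meta tag but is really utf-8.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    "<p>Comunicação</p>"
).encode("utf-8")
text, detected = decode_html(page)
```

The decoded text could then be passed to PyQuery as a string, so that the misleading meta tag no longer affects decoding.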
I would like to hear your opinion on this issue and, if you want, I can submit one of those solutions.
BTW, good work on building crawley, I'm having a very nice time using it! I hope I can contribute somehow.