This repository has been archived by the owner on Jul 5, 2023. It is now read-only.

Issue with string length using Latin-1 encoding #65

Closed
Lucas-C opened this issue Nov 26, 2018 · 3 comments

Comments

@Lucas-C

Lucas-C commented Nov 26, 2018

Hello.

I'm working on a bug affecting pylint, which uses astroid, which uses typed_ast:
pylint-dev/pylint#2610

What troubles me is the .col_offset value of a Str node containing a non-ASCII character when the source uses Latin-1 encoding.

Here is a minimal code example:

# coding: latin_1
from typed_ast import ast3
print(ast3.parse("'a'+'A'").body[0].value.right.col_offset)  # print: 4
print(ast3.parse("'à'+'A'").body[0].value.right.col_offset)  # print: 5

From my understanding, the "à" character has a length of 1 in Latin-1:
https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout

Hence, could someone help me understand why the .col_offset is higher in that case, please?
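A likely explanation, offered as an assumption rather than a confirmed diagnosis: CPython's parser (which typed_ast is derived from) works on UTF-8 internally, and the Python 3.8+ `ast` documentation states that `col_offset` is a UTF-8 byte offset. The `# coding: latin_1` declaration only controls how the file itself is decoded; a `str` handed to `parse` is processed as UTF-8. The snippet below compares the byte counts involved:

```python
# Byte length of "à" under each encoding.
print(len("à".encode("latin-1")))  # 1
print(len("à".encode("utf-8")))    # 2

# Offset of the right-hand operand "'A'" in "'à'+'A'".
prefix = "'à'+"
print(len(prefix))                  # 4 characters
print(len(prefix.encode("utf-8")))  # 5 UTF-8 bytes, matching the reported col_offset
```

So the value 5 would be consistent with a byte offset into UTF-8, not a character offset and not a Latin-1 byte offset.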

@gvanrossum
Member

What does CPython's ast module give in this case? (Preferably from Python 3.6, since that's what typed_ast is currently derived from -- we're trying to find time to work on Python 3.7 support.) If the same problem occurs with CPython's ast module, the problem is there and should be fixed there before we can fix it here.
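That question can be checked with the standard library alone. A minimal sketch using CPython's own `ast` module on the same two expressions (the report used typed_ast's `ast3`; the results below are what recent CPython versions give, where `col_offset` is documented as a UTF-8 byte offset):

```python
import ast

offsets = {}
for src in ("'a'+'A'", "'à'+'A'"):
    # body[0] is the Expr statement, .value the BinOp, .right the 'A' literal.
    right = ast.parse(src).body[0].value.right
    offsets[src] = right.col_offset
    print(repr(src), right.col_offset)
```

If this prints 4 and 5, CPython's `ast` behaves the same way as typed_ast here.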

@emmatyping
Contributor

I just checked on 3.7 and it seems that typed_ast has the same behavior as the ast module, so I presume it is the same for 3.6 (thus this seems to be an issue for upstream).

@gvanrossum
Member

OK, then let's close this as won't fix -- but feel free to report this in the CPython bug tracker (bugs.python.org).
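Since the behavior comes from upstream, a tool such as pylint that needs a character column would have to convert the offset itself. A minimal sketch, assuming `source` is the exact string passed to `ast.parse`; the helper name `char_col_offset` is hypothetical and not part of any library:

```python
import ast


def char_col_offset(source: str, node: ast.expr) -> int:
    """Convert node.col_offset (a UTF-8 byte offset) to a character offset.

    Hypothetical helper. Assumes `source` is the exact text given to
    ast.parse. str.splitlines() is used for simplicity, although it also
    splits on a few characters (e.g. form feed) that the parser does not
    treat as line breaks.
    """
    line = source.splitlines()[node.lineno - 1]
    # Take the first col_offset bytes of the UTF-8-encoded line, decode them
    # back to text, and count the resulting characters.
    prefix = line.encode("utf-8")[: node.col_offset]
    return len(prefix.decode("utf-8"))


src = "'à'+'A'"
node = ast.parse(src).body[0].value.right
print(node.col_offset, char_col_offset(src, node))  # 5 4
```

On Python 3.8+, `ast.get_source_segment(src, node)` does the equivalent byte-to-character handling internally when extracting a node's source text, so it may be preferable when only the text is needed.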

tbbharaj pushed a commit to tbbharaj/typed_ast that referenced this issue Dec 6, 2021

3 participants