Possible bug in 'standard' lexer: the longest token is matched incorrectly #370
Comments
Exactly. The only real fix would be to implement regexp analysis, which isn't worth the trouble.
Then I have two questions:
kodo-pp said:
> Then I have two questions: 2. Won't the following be the fix? For each option we compute the length of the longest matching token (in the case I described, long_operator would have a zero-length match while + would have a match of length 1) and then use the longest token.

You see something in the grammar I don't see. "long_operator" just looks like a single lexer token that is expressed in 13 characters. Is it the name of a function/procedure or a non-terminal I don't see?
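The proposal quoted above can be sketched in plain Python. This is only an illustration for string-literal terminals, not Lark code: `TERMINALS` and `longest_match_tokenize` are hypothetical names, and the terminal table is an assumption based on the names mentioned in this thread.

```python
# Hypothetical terminal table; each terminal is a list of string literals.
TERMINALS = {
    "BINOP": ["+", "long_operator"],
    "ASSIGNMENT_OP": ["=", "+="],
}

def longest_match_tokenize(text, terminals):
    """At each position, pick the terminal whose *actual* match is longest."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (terminal name, matched text)
        for name, alternatives in terminals.items():
            for alt in alternatives:
                # Only literals that really match at `pos` are compared.
                if text.startswith(alt, pos) and (best is None or len(alt) > len(best[1])):
                    best = (name, alt)
        if best is None:
            raise ValueError(f"no terminal matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

print(longest_match_tokenize("+=", TERMINALS))
```

Here `long_operator` contributes no match at all on the input `+=`, so the comparison is between `+` (length 1) and `+=` (length 2). The sketch tries every alternative at every position, which is simple for literals but says nothing about how this would work for regexps.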
+= is not separated. The funny thing is that if I remove About
I'm attaching the notebook I was talking about. It can be downloaded here
I should also mention that the solution to your problem is very simple: You can just split the terminal into two.
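To see why splitting helps, here is a small pure-Python model. It assumes terminals are tried in descending order of their longest possible match, as discussed in this thread; it is not Lark's implementation, and the names `PLUS`, `LONG_OPERATOR` and `tokenize` are made up for the example.

```python
import re

# Hypothetical split of the original terminal into two separate terminals.
SPLIT_TERMINALS = {
    "PLUS": ["+"],
    "LONG_OPERATOR": ["long_operator"],
    "ASSIGNMENT_OP": ["=", "+="],
}

def tokenize(text, terminals):
    """First-match-wins lexer; terminals ordered by longest possible literal."""
    ordered = sorted(terminals.items(), key=lambda kv: -max(len(a) for a in kv[1]))
    compiled = []
    for name, alts in ordered:
        # Inside one terminal, longer literals are listed first.
        body = "|".join(re.escape(a) for a in sorted(alts, key=len, reverse=True))
        compiled.append((name, re.compile(body)))
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in compiled:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no terminal matches at position {pos}")
    return tokens

print(tokenize("+=", SPLIT_TERMINALS))
```

With the split, `PLUS` has a maximum width of 1, so `ASSIGNMENT_OP` (maximum width 2) is tried before it and `+=` is kept as one token.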
Sounds to me like the docs are spot on. Can you explain why you think they are incorrect?
@erezsh, maybe because there are no regexps in my code? As you can see, all matches are string literals. The rule you quoted talks about the longest theoretical match only for regexps, not for string literals. Despite this, I understand your point. I agree this is not a bug but rather a not-so-obvious caveat. Maybe this should be stated more clearly in the docs. So this issue may be closed if you want to.
Incorrect. If you want to group different terminals together, that's what rules are for. If you think you can provide a better explanation, feel free to write one, and I might include it in the docs.
OK, now I see I was wrong. Thank you for your explanation.
When I use the `standard` lexer (the default for the `lalr` parser), it sometimes determines the longest match incorrectly.

Consider the following code:

It should print `[Token(ASSIGNMENT_OP, '+=')]` because the longest match is `+=`, and the documentation says that in this case the longest match should be returned. However, the code prints `[Token(BINOP, '+'), Token(ASSIGNMENT_OP, '=')]`, which is incorrect.

I suppose the lexer might be comparing not the lengths of the actually matching terminals (`+` and `+=` in this case) but the lengths of the longest possible options (`long_operator` and `+=`).

P.S. The lark version is 0.7.0, installed as the package `python-lark-parser` in Arch Linux.
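The suspected behaviour can be reproduced with a small pure-Python model. This is a sketch under assumptions, not Lark's implementation and not the grammar from the report: the terminal table and the `tokenize` helper are hypothetical, and the only rule modelled is "try terminals in descending order of their longest possible match, first match wins".

```python
import re

# Hypothetical terminals using the names from the report.
TERMINALS = {
    "BINOP": ["+", "long_operator"],
    "ASSIGNMENT_OP": ["=", "+="],
}

def max_width(alternatives):
    """Length of the longest literal a terminal could ever match."""
    return max(len(a) for a in alternatives)

def tokenize(text, terminals):
    # BINOP (max width 13) is ordered before ASSIGNMENT_OP (max width 2).
    ordered = sorted(terminals.items(), key=lambda kv: -max_width(kv[1]))
    compiled = []
    for name, alts in ordered:
        body = "|".join(re.escape(a) for a in sorted(alts, key=len, reverse=True))
        compiled.append((name, re.compile(body)))
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in compiled:
            m = pattern.match(text, pos)
            if m:  # first terminal that matches wins, even if another matches more
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no terminal matches at position {pos}")
    return tokens

print(tokenize("+=", TERMINALS))
```

In this model `BINOP` is tried first because `long_operator` is 13 characters long, it matches the single `+`, and `ASSIGNMENT_OP` only gets to see the remaining `=`.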