-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding implicit-str-concat-in-sequence check #1655
Conversation
Hello, Thank you for you PR! Can we have some test case along lines of: def my_func(arg, arg2, msg):
pass
my_func("s1", "s2", "really long message that I'd like to "
"split across multiple lines") Case above (IMHO) shouldn't emit a warning but it looks like it would be emitted. |
Well spotted, thanks |
Ok, so i'm facing a dead-end here :( First, this problem has a bit of history:
Now, I see 2 approaches in
The only idea I've got would be to retrieve from the source the code snippet corresponding to each Did any checker had to implement such thing already ? I.e retrieving source code from Or does anyone has a better idea ? |
I'm not sure if pylint gives you an option to but one way would be to cross reference the ast nodes with the tokenized output -- this is roughly how pyupgrade finds list / tuple nodes. Otherwise disregard me, just drive by browsing issues (while I wait for pylint to clone) and noticed a familiar name :) |
Hey, thanks for the advice @asottile ! I contemplated this idea, but it's awesome to see you already implemented it quite concisely: Before implementing something like that in
|
@Lucas-C You'll need a checker that inherits from
|
Thanks @rogalski ! I updated the PR, I think I nailed it ;) |
@@ -0,0 +1,19 @@ | |||
#pylint: disable=bad-continuation,invalid-name,missing-docstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your tests does not pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pylint/checkers/strings.py
Outdated
@@ -541,15 +562,21 @@ class StringConstantChecker(BaseTokenChecker): | |||
# Unicode strings. | |||
UNICODE_ESCAPE_CHARACTERS = 'uUN' | |||
|
|||
string_tokens = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this really should be instance attribute, no class attribute.
We also need to be sure that data gets thrown away before switching to new module for analysis (simple reinitialization of dict on the beginning of process tokens should work fine).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it manually, it is thrown away.
I'm going to try to make this an instance attribute, but it's bit annoying that you precisely recommended a class attribute in one of your earlier comment :(
Do you want to add a unit test ?
If so, could you please then point me to an existing test I could use as an example to perform such "reinitialization" ?
pylint/checkers/strings.py
Outdated
# This check relies on the asumption that StringConstantChecker is run beforehand | ||
for elt in iterable_node.elts: | ||
if isinstance(elt, astroid.node_classes.Const) and elt.pytype() == 'builtins.str': | ||
if StringConstantChecker.string_tokens[(elt.lineno, elt.col_offset)] != elt.value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, checker classes really should be independent, consider doing your process tokens inside StringFormatChecker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by introducing IAstroidAndTokensChecker
interface
Tests are failing with Python 2 interpreter. |
I fixed this hook for Python 2 |
I don't quite get why the |
pylint/checkers/strings.py
Outdated
@@ -545,11 +588,11 @@ def process_module(self, module): | |||
self._unicode_literals = 'unicode_literals' in module.future_imports | |||
|
|||
def process_tokens(self, tokens): | |||
for (tok_type, token, (start_row, _), _, _) in tokens: | |||
for (tok_type, token, start, _, _) in tokens: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change is redundant
if tok_type == tokenize.STRING: | ||
# 'token' is the whole un-parsed token; we can look at the start | ||
# of it to see whether it's a raw or unicode string etc. | ||
self.process_string_token(token, start_row) | ||
self.process_string_token(token, start[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change is redundant
pylint/checkers/strings.py
Outdated
@@ -527,7 +570,7 @@ class StringConstantChecker(BaseTokenChecker): | |||
'String constant might be missing an r or u prefix.', | |||
'anomalous-unicode-escape-in-string', | |||
'Used when an escape like \\u is encountered in a byte ' | |||
'string where it has no effect.'), | |||
'string where it has no effect.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change is redundant
pylint/interfaces.py
Outdated
@@ -79,6 +79,18 @@ class IAstroidChecker(IChecker): | |||
""" | |||
|
|||
|
|||
class IAstroidAndTokensChecker(IAstroidChecker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't BaseTokenChecker
exact same thing? From what I see BaseTokenChecker
is a checker, which means it can involve walking ASTs and visiting nodes... Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a side effect however:
making StringFormatChecker
inherit from BaseTokenChecker
and implement ITokenChecker
forces to change all the .add_message
call sites in this class, because it now requires a line
parameter:
https://github.com/PyCQA/pylint/blob/master/pylint/utils.py#L374
https://github.com/PyCQA/pylint/blob/master/pylint/utils.py#L153
pylint/lint.py
Outdated
@@ -844,10 +844,12 @@ def _do_check(self, files_or_modules): | |||
and c is not self] | |||
rawcheckers = [c for c in _checkers | |||
if interfaces.implements(c, interfaces.IRawChecker)] | |||
astcheckers = [c for c in _checkers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If previous comment is true, this won't be needed.
pylint/lint.py
Outdated
# notify global begin | ||
for checker in _checkers: | ||
checker.open() | ||
if interfaces.implements(checker, interfaces.IAstroidChecker): | ||
if checker in astcheckers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If previous comment is true, this won't be needed.
pylint/lint.py
Outdated
@@ -868,7 +870,7 @@ def _do_check(self, files_or_modules): | |||
# fix the current file (if the source file was not available or | |||
# if it's actually a c extension) | |||
self.current_file = ast_node.file # pylint: disable=maybe-no-member | |||
self.check_astroid_module(ast_node, walker, rawcheckers, tokencheckers) | |||
self.check_astroid_module(ast_node, walker, rawcheckers, tokencheckers, astcheckers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If previous comment is true, this won't be needed.
pylint/lint.py
Outdated
@@ -922,7 +924,7 @@ def get_ast(self, filepath, modname): | |||
self.add_message('astroid-error', args=(ex.__class__, ex)) | |||
|
|||
def check_astroid_module(self, ast_node, walker, | |||
rawcheckers, tokencheckers): | |||
rawcheckers, tokencheckers, astcheckers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If previous comment is true, this won't be needed.
pylint/lint.py
Outdated
@@ -946,6 +948,9 @@ def check_astroid_module(self, ast_node, walker, | |||
checker.process_module(ast_node) | |||
for checker in tokencheckers: | |||
checker.process_tokens(tokens) | |||
for checker in astcheckers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If previous comment is true, this won't be needed.
tox.ini
Outdated
@@ -7,7 +7,7 @@ deps = | |||
git+https://github.com/pycqa/astroid@master | |||
isort | |||
|
|||
commands = pylint -rn --rcfile={toxinidir}/pylintrc --load-plugins=pylint.extensions.docparams, pylint.extensions.mccabe {envsitepackagesdir}/pylint | |||
commands = pylint -rn --rcfile={toxinidir}/pylintrc --load-plugins=pylint.extensions.docparams,pylint.extensions.mccabe {envsitepackagesdir}/pylint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if lack of previous warnings was caused by this bug in tox.ini, please revert it for now. We can tackle this problem in another commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you want
3 failing tests remain:
All those failing tests are a consequence of the modified call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if anything requires further clarification.
pylint/utils.py
Outdated
@@ -359,7 +359,7 @@ def add_message(self, msg_descr, line=None, node=None, args=None, confidence=UND | |||
|
|||
If provided, the message string is expanded using args | |||
|
|||
AST checkers should must the node argument (but may optionally | |||
AST checkers should provide the node argument (but may optionally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that you modified existing add_message
calls due to this documentation entry. From your case it's clear that it may be misleading.
I'd suggest an entry:
All emitted messages shall provide at minimum either line in which violation occurred or AST node in which violation occurred. Passing both values is also accepted.
Feel free to rephrase if you have better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no, it doesn't work like this :(
If the checker implements IRawChecker
or ITokenChecker
, a line
parameter must be passed in .add_message
, else there must be a node
passed:
https://github.com/PyCQA/pylint/blob/master/pylint/utils.py#L153
https://github.com/PyCQA/pylint/blob/master/pylint/utils.py#L374
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I did not picked that up when I did review yesterday.
I'm not familiar with this part of the repository, so I cannot give a straightforward answer right now. I'd suggest extracting a new class for your check and wrap everything up so all tests pass (without modifying preexisting test suite). Then we can wait for @PCManticore input on how to tackle this hybrid AST/token checker in terms of emitting messages from them and documenting those checkers in our docs.
I think this is ready to be merged |
Some changes addressed, some changes needs further discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of things I'm not convinced off. I'd like to see the new message disabled in pylintrc in order to see what exactly we have in our codebase. I do hope that this won't include multiline strings such as those from our message definitions
doc/development_guide/contribute.rst
Outdated
@@ -166,6 +166,6 @@ A Typical Development Workflow | |||
It is also possible to give tox a `pytest specifier <https://docs.pytest.org/en/latest/usage.html#specifying-tests-selecting-tests>`_ | |||
to run only your test:: | |||
|
|||
$ tox pylint/test/test_functional.py::test_functional | |||
$ tox -e py36 -- -k 'test_functional[str_literals_concatenation_and_comma]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why changing this?
doc/development_guide/contribute.rst
Outdated
@@ -166,6 +166,6 @@ A Typical Development Workflow | |||
It is also possible to give tox a `pytest specifier <https://docs.pytest.org/en/latest/usage.html#specifying-tests-selecting-tests>`_ | |||
to run only your test:: | |||
|
|||
$ tox pylint/test/test_functional.py::test_functional | |||
$ tox -e py36 -- -k 'test_functional[implicit_str_concat_in_sequence]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you modify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help future contributors by showing more useful CLI flags:
- the ability to execute the tests in only one tox environment
- the ability to execute only one functional test
pylint/checkers/strings.py
Outdated
self.process_string_token(token, start_row) | ||
self.process_string_token(token, start[0]) | ||
try: | ||
self.string_tokens[start] = ast.literal_eval(token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of literal_evaling these tokens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get the string value of a token, which is itself a string variable.
There are some examples of token
values:
"'a'"
"u'b'"
"r'b'"
"'b \\\n c'"
pylintrc
Outdated
@@ -52,7 +52,8 @@ confidence= | |||
# no Warning level messages displayed, use"--disable=all --enable=classes | |||
# --disable=W" | |||
|
|||
disable=invalid-name,protected-access,fixme,too-many-branches, | |||
disable=implicit-str-concat-in-sequence, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where exactly do we have this occurring in our code base? It would be nice to see them before merging, since this usually can reveal false positives and whatnot, things we should account for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 220 violations :(
And yes, this is because of the way warning message are defined.
For example there is the line where the 1st warning is raised:
https://github.com/PyCQA/pylint/blob/master/pylint/extensions/mccabe.py#L131
Interestingly, it is not consistent, and sometimes the newline is escaped:
https://github.com/PyCQA/pylint/blob/master/pylint/checkers/exceptions.py#L62
What do you advise ?
Replace all string concatenations by escaped newlines ? :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was afraid this is going to happen with this PR, it will emit even for use cases that are valid, such as the ones from pylint's message definition. Given this, I'm finding the new check too nitpicky. If we were to include this in pylint, then we should restrict it to verify just the implicit string concatenation that occurs on the same line, not on multiple lines.
|
||
def check_for_concatenated_strings(self, iterable_node, iterable_type): | ||
for elt in iterable_node.elts: | ||
if isinstance(elt, Const) and elt.pytype() in _AST_NODE_STR_TYPES: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably check the type of elt.value
instead of using pytype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2nd member of this conditional serves to distinguish between e.g. string constants and integer constants.
Are you suggesting to do something like the following code ?
if isinstance(elt, Const) and isinstance(elt.value, six.string_types):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lucas-C left a comment. I think this will be too nitpicky since it picks on valid use cases, such as implicit joining on multiple lines, which will generate a ton of errors. This PR should restrict the check only for string concatenation on the same line.
tl;dr : I disagree but I made the required changes While I totally agree with the checker beeing nickpicky, I think ignoring string concatenations over newlines also restrict the use of this warning. For example, no warning will be raised for this code:
Whether or not it appears in |
I rebased this PR on the master branch. Are you OK to merge it ? |
@Lucas-C Yes, just didn't get to it just yet |
I like what @Lucas-C said here: #1655 (comment). It would be useful to also include a check for accidental multiline string concatenations in sequences. Perhaps this could be included but default disabled? |
I juste rebased this PR so it is not conflicting to merge it. |
1 similar comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Lucas-C Sorry for the extreme long wait! Thanks for rebasing this PR. The only change that I'd like is the use of ast.literal_eval
. Other than that, this PR looks fine. Don't forget the usual boilerplate, an entry in Contributors and a What's New entry in 2.0
rst.
This should be good |
I don't understand the last error in the Travis CI build:
|
I'd also be interested in getting this PR merged, since I have gotten bitten by this problem before. Thanks for the good work @Lucas-C . |
One year since the first version of this PR... :) |
@Lucas-C Thank you! Thank you for waiting over an year to get this merged, it's mostly my fault on not getting to it sooner. |
No need to blame yourself, there's no problem. |
I'm interested in getting this warning when implicitly concatenating across lines. If I'm reading the thread correctly, that was implemented but changed. Is there a simple snippet I can use to get that behavior, perhaps in a plugin? |
You should get a warning with this code and pylint 2.2.2:
Example:
|
Right, thanks. """Pylint 2.2 should warn implicit-str-concat-in-sequence.
https://github.com/PyCQA/pylint/pull/1655
pylint 2.2.2
astroid 2.1.0
Python 3.6.7
[GCC 8.2.0]
Current result:
names.py:13:0: W1403: Implicit string concatenation found in list (implicit-str-concat-in-sequence)
"""
X = ["a", "b" "c", "d"]
Y = ["a",
"b"
"c",
"d"
]
print(len(X))
print(len(Y)) I'm interested in getting a warning for the multi-line case, |
Yes, your deductions are totally right. |
FWIW, I would be in favour of |
Thanks for your input @PCManticore |
@Lucas-C There's no need for a new check, as long as this one can look for that flag when deciding for multiline vs same line behaviour, that should be enough. Here's some rough instructions:
Let me know if that makes sense, happy to discuss in a separate issue. |
Thank you for those clear guidelines. |
Fixes / new features
Message: