Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise use_a_generator for sum(), max(), min() #6595

Merged
merged 4 commits into from
May 13, 2022

Conversation

matusvalo
Copy link
Collaborator

Type of Changes

Type
🐛 Bug fix
✨ New feature
🔨 Refactoring
📜 Docs

Description

Based on https://peps.python.org/pep-0289/ also sum(), max(), min() should avoid using
lists. Using generator, the memory is not allocated for all items but the items are processed lazily.

@Pierre-Sassoulas Pierre-Sassoulas added the Enhancement ✨ Improvement to a component label May 12, 2022
@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented May 12, 2022

The benchmark result could be counterintuitive for small sum/max/min, see #3309 (comment). Depending on the results of a benchmark we'd need to use 'consider-using-generator' or 'use-a-generator'.

@coveralls
Copy link

coveralls commented May 12, 2022

Pull Request Test Coverage Report for Build 2316079191

  • 2 of 2 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 95.344%

Totals Coverage Status
Change from base Build 2315870463: 0.0%
Covered Lines: 16035
Relevant Lines: 16818

💛 - Coveralls

@matusvalo
Copy link
Collaborator Author

matusvalo commented May 12, 2022

The benchmark result could be counterintuitive for small sum/max/min, see #3309 (comment). Depending on the results of a benchmark we'd need to use 'consider-using-generator' or 'use-a-generator'.

I will do benchmarking. Just a comment about it - I think it is not 100% truth. In attached PEP using generator is recommended. Doing microbenchmarks can be misleading - e.g. in python 3.9 generator version can be slower but in 3.11 it can be opposite:

Python 3.11 is up to 10-60% faster than Python 3.10. [1]

I think we should follow best practises instead of tightly following microbenchmarks.

[1] https://docs.python.org/3.11/whatsnew/3.11.html#summary-release-highlights

@matusvalo
Copy link
Collaborator Author

python -m timeit "b = sum([x for x in range(10000)]);"
# 500 loops, best of 5: 503 usec per loop
python -m timeit "b = sum(x for x in range(10000));"
# 500 loops, best of 5: 466 usec per loop


python -m timeit "b = max([x for x in range(10000)]);"
# 500 loops, best of 5: 574 usec per loop
python -m timeit "b = max(x for x in range(10000));"
# 500 loops, best of 5: 546 usec per loop

python -m timeit "b = min([x for x in range(10000)]);"
# 500 loops, best of 5: 569 usec per loop
python -m timeit "b = min(x for x in range(10000));"
# 500 loops, best of 5: 535 usec per loop

Co-authored-by: Daniël van Noord <[email protected]>
@matusvalo matusvalo merged commit dc18f82 into pylint-dev:main May 13, 2022
@matusvalo matusvalo deleted the use-a-generator-other-functions branch May 13, 2022 05:46
@Pierre-Sassoulas
Copy link
Member

I think we should follow best practises instead of tightly following microbenchmarks.

I agree with the sentiment of the benchmark not being perfect. In fact, that's my bad for suggesting it, the decision for any/all vs list/set/dict was not taken based on the benchmark but based on the theory that a generator for all/any will be cutting the execution tree in most case so it will be faster, sometime by a lot.

I think max, min and sum will have to evaluate all elements whatever happens, so the benefit of a generator in term of performance will happens for very long containers only. We do have a message for best practises (consider-...), the wording is just stronger (use-...) if we have absolute certainty that the performance will be better/ i.e. all or any will be better whatever the interpreter or the size of the container. If a benchmark depending on the interpreter, the environment or the problem being solved is required the user need to consider the issue because we don't know for sure if the user favor best practices or raw performance.

@Pierre-Sassoulas
Copy link
Member

I think it should be consider-using-a-generator instead, for small list the performance are worst:

python -m timeit "b = sum([x for x in range(5)]);" # 329 nsec per loop
python -m timeit "b = sum(x for x in range(5));"   #369 nsec per loop

python -m timeit "b = max([x for x in range(5)]);" # 378 nsec per loop
python -m timeit "b = max(x for x in range(5));" # 425 nsec per loop

python -m timeit "b = min([x for x in range(5)]);" # 379 nsec per loop
python -m timeit "b = min(x for x in range(5));" # 417 nsec per loop

@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.14.0 milestone May 13, 2022
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this pull request May 13, 2022
DanielNoord added a commit that referenced this pull request May 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants