Raise use_a_generator for `sum()`, `max()`, `min()` #6595

matusvalo · 2022-05-12T20:37:55Z

Type of Changes

| | Type |
| --- | --- |
| | 🐛 Bug fix |
| ✓ | ✨ New feature |
| | 🔨 Refactoring |
| | 📜 Docs |

Description

Based on https://peps.python.org/pep-0289/, sum(), max() and min() should also avoid being passed lists.
With a generator, memory is not allocated for all items at once; the items are processed lazily.
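
As a rough illustration of the memory argument (a sketch that is not part of this PR; all variable names are illustrative), the list comprehension materializes every item before `sum()` sees it, while the generator object stays small regardless of the input size:

```python
import sys

n = 10_000

# The list comprehension builds all n items before sum() runs.
as_list = [x for x in range(n)]
# The generator expression yields items one at a time.
as_gen = (x for x in range(n))

# Shallow container sizes only: the list holds n references,
# while the generator holds a small, fixed-size state.
list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

# Both forms produce the same result.
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)
```

`sys.getsizeof` reports only the shallow size of each object, so this shows the difference in container overhead, not total memory use or speed.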

Pierre-Sassoulas · 2022-05-12T20:47:44Z

The benchmark result could be counterintuitive for small sum/max/min, see #3309 (comment). Depending on the results of a benchmark we'd need to use 'consider-using-generator' or 'use-a-generator'.

coveralls · 2022-05-12T20:49:18Z

Pull Request Test Coverage Report for Build 2316079191

2 of 2 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 95.344%

Totals
Change from base Build 2315870463:	0.0%
Covered Lines:	16035
Relevant Lines:	16818

💛 - Coveralls

matusvalo · 2022-05-12T20:53:16Z

> The benchmark result could be counterintuitive for small sum/max/min, see #3309 (comment). Depending on the results of a benchmark we'd need to use 'consider-using-generator' or 'use-a-generator'.

I will do the benchmarking. Just one comment on it: I don't think it tells the whole truth. The attached PEP recommends using a generator. Microbenchmarks can be misleading: e.g. in Python 3.9 the generator version may be slower, while in 3.11 it may be the opposite:

> Python 3.11 is up to 10-60% faster than Python 3.10. [1]

I think we should follow best practices instead of tightly following microbenchmarks.

[1] https://docs.python.org/3.11/whatsnew/3.11.html#summary-release-highlights

matusvalo · 2022-05-12T21:03:27Z

python -m timeit "b = sum([x for x in range(10000)]);"
# 500 loops, best of 5: 503 usec per loop
python -m timeit "b = sum(x for x in range(10000));"
# 500 loops, best of 5: 466 usec per loop


python -m timeit "b = max([x for x in range(10000)]);"
# 500 loops, best of 5: 574 usec per loop
python -m timeit "b = max(x for x in range(10000));"
# 500 loops, best of 5: 546 usec per loop

python -m timeit "b = min([x for x in range(10000)]);"
# 500 loops, best of 5: 569 usec per loop
python -m timeit "b = min(x for x in range(10000));"
# 500 loops, best of 5: 535 usec per loop

Pierre-Sassoulas · 2022-05-13T06:34:11Z

> I think we should follow best practices instead of tightly following microbenchmarks.

I agree that the benchmark is not perfect. In fact, it was my mistake to suggest it: the decision for any/all vs. list/set/dict was not based on a benchmark but on the theory that a generator lets all/any stop evaluating early (short-circuit) in most cases, so it will be faster, sometimes by a lot.

I think max, min and sum have to evaluate every element no matter what, so a generator only brings a performance benefit for very long containers. We do have a message for best practices (consider-...); the wording is only stronger (use-...) when we are absolutely certain that performance will be better, i.e. all or any will be faster whatever the interpreter or the size of the container. If deciding requires a benchmark that depends on the interpreter, the environment or the problem being solved, the user needs to consider the issue themselves, because we don't know for sure whether the user favors best practices or raw performance.
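
The distinction drawn above can be sketched as follows. This is an illustrative example, not code from pylint; `make_counter` and `is_big` are hypothetical helpers that only count how many elements get evaluated:

```python
def make_counter():
    """Return a predicate that counts its calls, plus the cell holding the count."""
    calls = [0]

    def is_big(x):
        calls[0] += 1
        return x > 2

    return is_big, calls


data = range(1000)

# any() over a generator stops at the first truthy item (x == 3).
pred, gen_calls = make_counter()
found_gen = any(pred(x) for x in data)

# any() over a list comprehension evaluates every item before any() runs.
pred, list_calls = make_counter()
found_list = any([pred(x) for x in data])

# sum() has to consume every item in either form, so nothing is skipped.
pred, sum_calls = make_counter()
count_big = sum(pred(x) for x in data)
```

Here the generator form of `any()` evaluates 4 elements instead of 1000, while `sum()` evaluates all 1000 either way, which is why the certainty behind use-a-generator does not carry over to sum/max/min.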

Pierre-Sassoulas · 2022-05-13T07:09:32Z

I think it should be consider-using-generator instead; for small lists the performance is worse:

python -m timeit "b = sum([x for x in range(5)]);" # 329 nsec per loop
python -m timeit "b = sum(x for x in range(5));" # 369 nsec per loop

python -m timeit "b = max([x for x in range(5)]);" # 378 nsec per loop
python -m timeit "b = max(x for x in range(5));" # 425 nsec per loop

python -m timeit "b = min([x for x in range(5)]);" # 379 nsec per loop
python -m timeit "b = min(x for x in range(5));" # 417 nsec per loop
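
To rerun both sets of measurements in one go, a small script along these lines could be used. It is a sketch only: the `compare` helper and the chosen loop counts are assumptions, and the absolute numbers will differ by interpreter and machine:

```python
import timeit


def compare(func_name, size, number):
    """Time func([...]) against func(...) over range(size); seconds per loop."""
    list_stmt = f"{func_name}([x for x in range({size})])"
    gen_stmt = f"{func_name}(x for x in range({size}))"
    return {
        # Take the best of 3 repeats, as the timeit CLI reports the best run.
        "list": min(timeit.repeat(list_stmt, repeat=3, number=number)) / number,
        "generator": min(timeit.repeat(gen_stmt, repeat=3, number=number)) / number,
    }


results = {}
for name in ("sum", "max", "min"):
    for size in (5, 10_000):
        # More loops for the tiny input, fewer for the large one.
        number = 20_000 if size < 100 else 100
        results[(name, size)] = compare(name, size, number)

for (name, size), timing in sorted(results.items()):
    print(f"{name} n={size}: list={timing['list']:.3e}s generator={timing['generator']:.3e}s")
```

Which form comes out ahead at each size is exactly what the thread disagrees about, so the script makes no claim about the outcome; it only makes the comparison repeatable on a given interpreter.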

matusvalo added 2 commits May 12, 2022 22:35

Raise use_a_generator for sum(), max(), min()

516c306

Updated changelog

460115c

matusvalo mentioned this pull request May 12, 2022

Added use-a-generator message example #6590

Merged

Pierre-Sassoulas added the Enhancement ✨ Improvement to a component label May 12, 2022

and -> or

337a6f0

DanielNoord approved these changes May 12, 2022

Update ChangeLog

f3cc0d8

Co-authored-by: Daniël van Noord <[email protected]>

matusvalo merged commit dc18f82 into pylint-dev:main May 13, 2022

matusvalo deleted the use-a-generator-other-functions branch May 13, 2022 05:46

Pierre-Sassoulas added this to the 2.14.0 milestone May 13, 2022

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this pull request May 13, 2022

Change wording of use a generator message for sum/max/min

5ee750f

Follow-up to pylint-dev#6595

Pierre-Sassoulas mentioned this pull request May 13, 2022

Change wording of use a generator message for sum/max/min #6600

Merged

DanielNoord added a commit that referenced this pull request May 13, 2022

Change wording of use a generator message for sum/max/min (#6600)

7d7b8e6

Follow-up to #6595 Co-authored-by: Daniël van Noord <[email protected]>

Pierre-Sassoulas mentioned this pull request Jul 4, 2022

use-a-generator hint in sum method #5166

Closed
