pylint -j0 always uses 1 core on real hardware in Linux #7338

Open
dsilakov opened this issue Aug 22, 2022 · 7 comments
Labels
Bug 🪲 · multiprocessing · Needs PR (this issue is accepted, sufficiently specified and now needs an implementation)

Comments

@dsilakov

Bug description

-j0 should automatically detect the number of available processors. However, since the changes that introduced the _cpu_count() function to account for resource limits in cloud environments, this detection works incorrectly on real hardware.

A potential problem lies here: https://github.com/PyCQA/pylint/blob/main/pylint/lint/run.py#L59
If we reach this code (i.e. cpu.cfs_quota_us is not set), we always divide /sys/fs/cgroup/cpu/cpu.shares by 1024. Maybe this is correct for AWS, but in Ubuntu 20.04, for example, cpu.shares is set to 1024 by default. So no matter how many CPU cores you have, pylint -j0 will always use only one of them.

The whole idea of using cpu.shares this way looks questionable, since it only declares a relative "share" of CPU time for the process, not a core count.
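
For reference, the logic behind the linked code boils down to roughly the following (a paraphrased sketch of pylint's cgroup v1 handling, not the exact source):

import os

def _query_cpu():
    # If a CFS quota is set (e.g. by Kubernetes or docker --cpus),
    # derive the core count from quota / period.
    if os.path.isfile("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"):
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read().rstrip())
        if quota != -1 and os.path.isfile("/sys/fs/cgroup/cpu/cpu.cfs_period_us"):
            with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
                period = int(f.read().rstrip())
            return int(quota / period)
    # Otherwise fall back to cpu.shares -- the problematic branch: on a
    # bare-metal Ubuntu host shares defaults to 1024, so this always
    # yields 1 no matter how many cores the machine has.
    if os.path.isfile("/sys/fs/cgroup/cpu/cpu.shares"):
        with open("/sys/fs/cgroup/cpu/cpu.shares") as f:
            shares = int(f.read().rstrip())
        return int(shares / 1024)
    return None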

Configuration

Ubuntu 20.04 x86_64, real hardware, 12 cores

Command used

$ time pylint -j0 pylint/*
$ time pylint -j1 pylint/*
$ time pylint -j12 pylint/*

Pylint output

$ time pylint -j0 pylint/*
30 sec

$ time pylint -j1 pylint/*
30 sec

$ time pylint -j12 pylint/*
8 sec

Expected behavior

$ time pylint -j0 pylint/*
8 sec

$ time pylint -j1 pylint/*
30 sec

$ time pylint -j12 pylint/*
8 sec

Pylint version

pylint 2.14.5
astroid 2.11.7
Python 3.8.10

OS / Environment

Ubuntu 20.04 x86_64

Additional dependencies

No response

@dsilakov dsilakov added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Aug 22, 2022
@Pierre-Sassoulas Pierre-Sassoulas added Bug 🪲 multiprocessing and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Aug 22, 2022
@DanielNoord
Collaborator

@dsilakov I must say I'm no expert in containers and CPUs, so I'll likely need some help fixing this.

Based on the discussion in https://bugs.python.org/issue36054 and the implementation of CPU counting in the JVM in https://github.com/openjdk/jdk/blob/d5686b87f31d6c57ec6b3e5e9c85a04209dbac7a/src/hotspot/os/linux/os_linux.cpp#L5304-L5336
https://github.com/openjdk/jdk/blob/2d5137e403e16b694800b2ffe18c3640396b757e/src/hotspot/os/linux/osContainer_linux.cpp#L517-L591

Am I correct in thinking that we incorrectly determine whether cgroups are active? The third step in the JVM process is what we mimic with _cpu_count, but I guess in your case we shouldn't even end up there?

@Pierre-Sassoulas
Member

Related to #6098 and #6903

@dsilakov
Author

To be more precise: in the JVM, the magic you are referring to happens inside the OSContainer class, and the JVM seems to have trickier logic for detecting whether it is running "containerized" (under cgroup control) or not.

Having /sys/fs/cgroup/cpu/cpu.shares and other cpu.shares values for particular processes on the host system is completely legitimate; for example, systemd uses them to limit CPU resources. However, I don't see a generic way to translate these "shares" into real cores, since that depends on how many other processes you have. Maybe division by 1024 indeed works inside AWS, but it still looks like a heuristic.

A good explanation is provided, for example, in https://www.batey.info/cgroup-cpu-shares-for-docker.html:

The cpu_share has no meaning in isolation. For example, setting a cpu_share of 512 gives you no information about how much CPU time the container will get. Setting one container’s cpu_share to 512 and another container’s to 1024 means that the second container will get double the amount of CPU time as the first. That said, it still gives no information about the actual amount of CPU time each container will get, only the relative amount between them.

@dsilakov
Author

dsilakov commented Aug 23, 2022

Heh, after a closer look, the JVM also contains a set of hacks for handling cpu.shares equal to 1024:

https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L278

https://github.com/openjdk/jdk/blob/master/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1Subsystem.java#L247

Honestly, I don't know what the proper fix for pylint should be here. For our scenarios, I've just commented out the elif os.path.isfile('/sys/fs/cgroup/cpu/cpu.shares') branch. In our K8s cluster and Docker containers we do see /sys/fs/cgroup/cpu/cpu.cfs_quota_us, so the code still works fine inside containers. But I can't check AWS, for which the problematic elif was intended.
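
In other words, the workaround amounts to something like this (a sketch of the patched detection, trusting only the CFS quota; not the exact diff we applied):

import os

def _query_cpu():
    quota_path = "/sys/fs/cgroup/cpu/cpu.cfs_quota_us"
    period_path = "/sys/fs/cgroup/cpu/cpu.cfs_period_us"
    # Only the CFS quota, which docker --cpus and K8s limits actually
    # set, is trusted; the cpu.shares heuristic is dropped entirely.
    if os.path.isfile(quota_path) and os.path.isfile(period_path):
        with open(quota_path) as f:
            quota = int(f.read().rstrip())
        with open(period_path) as f:
            period = int(f.read().rstrip())
        if quota != -1 and period > 0:
            return int(quota / period)
    # No quota set: return None so the caller falls back to the host
    # CPU count instead of guessing from cpu.shares.
    return None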

@DanielNoord
Collaborator

Yeah, I'm not an AWS user myself, but I'm wondering whether there is some other way to detect that we are in an AWS environment instead of what we currently use in the elif.

@Pierre-Sassoulas
Member

We had a duplicate opened in #9537, and the OP said pytest does things correctly. Here's how pytest-xdist does it:

https://github.com/pytest-dev/pytest-xdist/blob/470bc83a253d6a05d2f282e981c61e438dbd0302/src/xdist/plugin.py#L14-L52
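
For context, that hook boils down to roughly this (paraphrased from pytest-xdist's plugin.py, not a verbatim copy):

import os

def auto_detect_cpus():
    # Prefer psutil when available; it can count physical cores.
    try:
        import psutil
    except ImportError:
        pass
    else:
        count = psutil.cpu_count(logical=False) or psutil.cpu_count()
        if count:
            return count
    # Fall back to the scheduler affinity mask (respects taskset/cpuset
    # restrictions), then to the plain CPU count.
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1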

@Pierre-Sassoulas Pierre-Sassoulas added the Needs PR This issue is accepted, sufficiently specified and now needs an implementation label Apr 4, 2024
@dsilakov
Author

dsilakov commented Apr 5, 2024

We had a duplicate opened in #9537, and the OP said pytest does things correctly. Here's how pytest-xdist does it:

https://github.com/pytest-dev/pytest-xdist/blob/470bc83a253d6a05d2f282e981c61e438dbd0302/src/xdist/plugin.py#L14-L52

That solution is still not perfect: if you run it inside a Docker container, it won't take container limitations into account.

E.g., inside a container launched via docker --cpus 2 ..., all functions used in that patch (psutil.cpu_count(), len(os.sched_getaffinity(0)), and multiprocessing.cpu_count()) will return the actual number of CPU cores on the host, not 2. So this brings us back to the original issue: the cgroup-based code was an attempt to fix -j0 being broken by container limitations.
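
A more robust approach would presumably have to combine both signals, e.g. cap the affinity-based count by the cgroup quota when one is set. A hypothetical sketch (not an agreed-upon fix, and covering cgroup v1 paths only):

import os

def effective_cpu_count():
    # CPUs visible to this process, respecting the affinity mask.
    try:
        count = len(os.sched_getaffinity(0))
    except AttributeError:
        count = os.cpu_count() or 1
    # Cap by the cgroup CFS quota, as set by e.g. docker --cpus 2.
    quota_path = "/sys/fs/cgroup/cpu/cpu.cfs_quota_us"
    period_path = "/sys/fs/cgroup/cpu/cpu.cfs_period_us"
    if os.path.isfile(quota_path) and os.path.isfile(period_path):
        with open(quota_path) as f:
            quota = int(f.read().rstrip())
        with open(period_path) as f:
            period = int(f.read().rstrip())
        if quota != -1 and period > 0:
            count = min(count, max(1, quota // period))
    return count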
