HQ worker keeps running without jobs #784

Open
svatosFZU opened this issue Nov 13, 2024 · 3 comments

Comments

@svatosFZU

Hi,
I have observed a situation where HQ workers and their batch jobs are running, but all HQ jobs are waiting. My understanding is that HQ shuts down workers that have nothing to run, so I assume something is blocking that. I logged into the worker node (WN) and listed its processes. I would welcome any insight into what could be keeping the worker running.
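Most of a full `ps` listing on a compute node consists of kernel threads (names in [brackets]), which makes it hard to spot the HQ worker and anything it may have spawned. A minimal sketch for trimming `ps aux` output down to user-space processes follows. It assumes the heuristic that kernel threads report a VSZ (column 5) of 0 while user-space processes do not; the helper name `user_procs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: filter `ps aux` output read from stdin.
# Keeps the header line (NR == 1) and every row whose VSZ column ($5)
# is non-zero. Assumption: kernel threads show VSZ = 0, so they are dropped.
user_procs() {
  awk 'NR == 1 || $5 != 0'
}

# Usage on the worker node, e.g.:
#   ps aux | user_procs
```

Running `ps aux | user_procs` on the node should leave only a short list of daemons and user processes, where an `hq` worker process and its children would be easier to see.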

Listing of running processes:

[[email protected] ~]$ ssh cn51
Last login: Tue Nov 12 17:21:12 2024 from 10.32.2.1
[[email protected] ~]$ ps uax
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 249676  7024 ?        Ss   Sep13   2:36 /usr/lib/systemd/systemd --switched-root --system --deserialize 16
root           2  0.0  0.0      0     0 ?        S    Sep13   0:02 [kthreadd]
root           3  0.0  0.0      0     0 ?        I<   Sep13   0:00 [rcu_gp]
root           4  0.0  0.0      0     0 ?        I<   Sep13   0:00 [rcu_par_gp]
root           6  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/0:0H-events_highpri]
root          10  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mm_percpu_wq]
root          11  0.0  0.0      0     0 ?        S    Sep13   0:23 [ksoftirqd/0]
root          12  0.0  0.0      0     0 ?        I    Sep13   9:34 [rcu_sched]
root          13  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/0]
root          14  0.0  0.0      0     0 ?        S    Sep13   0:00 [watchdog/0]
root          15  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/0]
root          16  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/1]
root          17  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/1]
root          18  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/1]
root          19  0.0  0.0      0     0 ?        S    Sep13   0:18 [ksoftirqd/1]
root          21  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/1:0H-events_highpri]
root          22  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/2]
root          23  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/2]
root          24  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/2]
root          25  0.0  0.0      0     0 ?        S    Sep13   0:16 [ksoftirqd/2]
root          27  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/2:0H-events_highpri]
root          28  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/3]
root          29  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/3]
root          30  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/3]
root          31  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/3]
root          33  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/3:0H-events_highpri]
root          34  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/4]
root          35  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/4]
root          36  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/4]
root          37  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/4]
root          39  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/4:0H-events_highpri]
root          40  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/5]
root          41  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/5]
root          42  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/5]
root          43  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/5]
root          45  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/5:0H-events_highpri]
root          46  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/6]
root          47  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/6]
root          48  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/6]
root          49  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/6]
root          51  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/6:0H-events_highpri]
root          52  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/7]
root          53  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/7]
root          54  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/7]
root          55  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/7]
root          57  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/7:0H-events_highpri]
root          58  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/8]
root          59  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/8]
root          60  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/8]
root          61  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/8]
root          63  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/8:0H-events_highpri]
root          64  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/9]
root          65  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/9]
root          66  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/9]
root          67  0.0  0.0      0     0 ?        S    Sep13   0:14 [ksoftirqd/9]
root          69  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/9:0H-events_highpri]
root          70  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/10]
root          71  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/10]
root          72  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/10]
root          73  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/10]
root          75  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/10:0H-events_highpri]
root          76  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/11]
root          77  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/11]
root          78  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/11]
root          79  0.0  0.0      0     0 ?        S    Sep13   0:14 [ksoftirqd/11]
root          81  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/11:0H-events_highpri]
root          82  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/12]
root          83  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/12]
root          84  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/12]
root          85  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/12]
root          87  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/12:0H-events_highpri]
root          88  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/13]
root          89  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/13]
root          90  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/13]
root          91  0.0  0.0      0     0 ?        S    Sep13   0:14 [ksoftirqd/13]
root          93  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/13:0H-events_highpri]
root          94  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/14]
root          95  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/14]
root          96  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/14]
root          97  0.0  0.0      0     0 ?        S    Sep13   0:13 [ksoftirqd/14]
root          99  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/14:0H-events_highpri]
root         100  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/15]
root         101  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/15]
root         102  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/15]
root         103  0.0  0.0      0     0 ?        S    Sep13   0:14 [ksoftirqd/15]
root         105  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/15:0H-events_highpri]
root         106  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/16]
root         107  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/16]
root         108  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/16]
root         109  0.0  0.0      0     0 ?        S    Sep13   0:14 [ksoftirqd/16]
root         111  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/16:0H-events_highpri]
root         112  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/17]
root         113  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/17]
root         114  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/17]
root         115  0.0  0.0      0     0 ?        S    Sep13   0:15 [ksoftirqd/17]
root         117  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/17:0H-events_highpri]
root         118  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/18]
root         119  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/18]
root         120  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/18]
root         121  0.0  0.0      0     0 ?        S    Sep13   0:14 [ksoftirqd/18]
root         123  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/18:0H-events_highpri]
root         125  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/19]
root         126  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/19]
root         127  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/19]
root         128  0.0  0.0      0     0 ?        S    Sep13   0:12 [ksoftirqd/19]
root         130  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/19:0H-events_highpri]
root         131  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/20]
root         132  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/20]
root         133  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/20]
root         134  0.0  0.0      0     0 ?        S    Sep13   0:13 [ksoftirqd/20]
root         136  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/20:0H-events_highpri]
root         137  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/21]
root         138  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/21]
root         139  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/21]
root         140  0.0  0.0      0     0 ?        S    Sep13   0:12 [ksoftirqd/21]
root         142  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/21:0H-events_highpri]
root         143  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/22]
root         144  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/22]
root         145  0.0  0.0      0     0 ?        S    Sep13   0:01 [migration/22]
root         146  0.0  0.0      0     0 ?        S    Sep13   0:12 [ksoftirqd/22]
root         148  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/22:0H-events_highpri]
root         149  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/23]
root         150  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/23]
root         151  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/23]
root         152  0.0  0.0      0     0 ?        S    Sep13   0:12 [ksoftirqd/23]
root         154  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/23:0H-events_highpri]
root         155  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/24]
root         156  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/24]
root         157  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/24]
root         158  0.0  0.0      0     0 ?        S    Sep13   0:12 [ksoftirqd/24]
root         160  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/24:0H-events_highpri]
root         161  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/25]
root         162  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/25]
root         163  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/25]
root         164  0.0  0.0      0     0 ?        S    Sep13   0:11 [ksoftirqd/25]
root         166  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/25:0H-events_highpri]
root         167  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/26]
root         168  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/26]
root         169  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/26]
root         170  0.0  0.0      0     0 ?        S    Sep13   0:11 [ksoftirqd/26]
root         172  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/26:0H-events_highpri]
root         173  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/27]
root         174  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/27]
root         175  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/27]
root         176  0.0  0.0      0     0 ?        S    Sep13   0:11 [ksoftirqd/27]
root         178  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/27:0H-events_highpri]
root         179  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/28]
root         180  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/28]
root         181  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/28]
root         182  0.0  0.0      0     0 ?        S    Sep13   0:10 [ksoftirqd/28]
root         184  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/28:0H-events_highpri]
root         185  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/29]
root         186  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/29]
root         187  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/29]
root         188  0.0  0.0      0     0 ?        S    Sep13   0:11 [ksoftirqd/29]
root         190  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/29:0H-events_highpri]
root         191  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/30]
root         192  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/30]
root         193  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/30]
root         194  0.0  0.0      0     0 ?        S    Sep13   0:11 [ksoftirqd/30]
root         196  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/30:0H-events_highpri]
root         197  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/31]
root         198  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/31]
root         199  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/31]
root         200  0.0  0.0      0     0 ?        S    Sep13   0:12 [ksoftirqd/31]
root         202  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/31:0H-events_highpri]
root         203  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/32]
root         204  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/32]
root         205  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/32]
root         206  0.0  0.0      0     0 ?        S    Sep13   0:11 [ksoftirqd/32]
root         208  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/32:0H-events_highpri]
root         209  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/33]
root         210  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/33]
root         211  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/33]
root         212  0.0  0.0      0     0 ?        S    Sep13   0:11 [ksoftirqd/33]
root         214  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/33:0H-events_highpri]
root         215  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/34]
root         216  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/34]
root         217  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/34]
root         218  0.0  0.0      0     0 ?        S    Sep13   0:13 [ksoftirqd/34]
root         220  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/34:0H-events_highpri]
root         221  0.0  0.0      0     0 ?        S    Sep13   0:00 [cpuhp/35]
root         222  0.0  0.0      0     0 ?        S    Sep13   0:01 [watchdog/35]
root         223  0.0  0.0      0     0 ?        S    Sep13   0:00 [migration/35]
root         224  0.0  0.0      0     0 ?        S    Sep13   0:13 [ksoftirqd/35]
root         226  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/35:0H-events_highpri]
root         263  0.0  0.0      0     0 ?        S    Sep13   0:00 [kdevtmpfs]
root         264  0.0  0.0      0     0 ?        I<   Sep13   0:00 [netns]
root         265  0.0  0.0      0     0 ?        S    Sep13   0:03 [kauditd]
root         270  0.0  0.0      0     0 ?        S    Sep13   0:07 [khungtaskd]
root         271  0.0  0.0      0     0 ?        S    Sep13   0:04 [oom_reaper]
root         272  0.0  0.0      0     0 ?        I<   Sep13   0:00 [writeback]
root         273  0.0  0.0      0     0 ?        S    Sep13   0:00 [kcompactd0]
root         274  0.0  0.0      0     0 ?        S    Sep13   0:00 [kcompactd1]
root         275  0.0  0.0      0     0 ?        SN   Sep13   0:00 [ksmd]
root         276  0.0  0.0      0     0 ?        I<   Sep13   0:00 [crypto]
root         277  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kintegrityd]
root         278  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kblockd]
root         279  0.0  0.0      0     0 ?        I<   Sep13   0:00 [blkcg_punt_bio]
root         298  0.0  0.0      0     0 ?        I<   Sep13   0:00 [tpm_dev_wq]
root         299  0.0  0.0      0     0 ?        I<   Sep13   0:00 [md]
root         300  0.0  0.0      0     0 ?        I<   Sep13   0:00 [edac-poller]
root         302  0.0  0.0      0     0 ?        S    Sep13   0:00 [watchdogd]
root         304  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/17:1H-kblockd]
root         340  0.0  0.0      0     0 ?        S    Sep13   0:03 [kswapd0]
root         341  0.0  0.0      0     0 ?        S    Sep13   0:00 [kswapd1]
root         434  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kthrotld]
root         436  0.0  0.0      0     0 ?        I<   Sep13   0:00 [acpi_thermal_pm]
root         437  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kmpath_rdacd]
root         438  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kaluad]
root         439  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kstrp]
root         518  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/3:1H-kblockd]
root         525  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/5:1H-kblockd]
root         527  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/6:1H-kblockd]
root         528  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/7:1H-xfs-log/dm-0]
root         583  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/4:1H-kblockd]
root         584  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/19:1H-kblockd]
root         587  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/8:1H-kblockd]
root         588  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/27:1H-kblockd]
root         591  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/28:1H-kblockd]
root         593  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/10:1H-kblockd]
root         597  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/9:1H-kblockd]
root         598  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/29:1H-kblockd]
root         600  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/30:1H-kblockd]
root         602  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/11:1H-kblockd]
root         605  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/12:1H-kblockd]
root         606  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/13:1H-xfs-log/dm-0]
root         613  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/15:1H-kblockd]
root         619  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/16:1H-kblockd]
root         626  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/2:1H-kblockd]
root         679  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/33:1H-kblockd]
root         683  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/32:1H-kblockd]
root         684  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/34:1H-kblockd]
root         686  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/35:1H-kblockd]
root         687  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/18:1H-kblockd]
root         699  0.0  0.0      0     0 ?        I<   Sep13   0:00 [rpciod]
root         700  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/u75:0]
root         703  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xprtiod]
root         711  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/24:1H-kblockd]
root         717  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/22:1H-kblockd]
root         719  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/25:1H-kblockd]
root         722  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/23:1H-kblockd]
root         761  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/31:1H-kblockd]
root         762  0.0  0.0      0     0 ?        I<   Sep13   0:02 [kworker/1:1H-kblockd]
root         764  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/14:1H-kblockd]
root         855  0.0  0.0      0     0 ?        I<   Sep13   0:00 [i40e]
root         885  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/26:1H-xfs-log/dm-0]
root         886  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/0:1H-kblockd]
root         887  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/21:1H-kblockd]
root         888  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kworker/20:1H-kblockd]
root         997  0.0  0.0      0     0 ?        S<   Sep13   0:24 [loop0]
root        1002  0.0  0.0      0     0 ?        S<   Sep13   6:19 [loop1]
root        1009  0.0  0.0      0     0 ?        S<   Sep13   0:22 [loop2]
root        1013  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kdmflush]
root        1016  0.0  0.0      0     0 ?        I<   Sep13   0:00 [dm_bufio_cache]
root        1018  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ksnaphd]
root        1019  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kcopyd]
root        1026  0.0  0.0      0     0 ?        I<   Sep13   0:00 [kdmflush]
root        1059  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfsalloc]
root        1060  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfs_mru_cache]
root        1061  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfs-buf/dm-0]
root        1062  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfs-conv/dm-0]
root        1063  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfs-cil/dm-0]
root        1064  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfs-reclaim/dm-]
root        1065  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfs-eofblocks/d]
root        1066  0.0  0.0      0     0 ?        I<   Sep13   0:00 [xfs-log/dm-0]
root        1067  0.0  0.0      0     0 ?        S    Sep13   4:29 [xfsaild/dm-0]
root        1193  0.0  0.0 268356 141888 ?       Ss   Sep13   0:57 /usr/lib/systemd/systemd-journald
root        1275  0.0  0.0 107256  4156 ?        Ss   Sep13   0:05 /usr/lib/systemd/systemd-udevd
rpc         1277  0.0  0.0  67176  1420 ?        Ss   Sep13   0:02 /usr/bin/rpcbind -w -f
root        1283  0.0  0.0  76996   692 ?        S<sl Sep13   0:19 /sbin/auditd
root        1408  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ata_sff]
root        1409  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_0]
root        1410  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_0]
root        1412  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_1]
root        1413  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_1]
root        1414  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_2]
root        1415  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_2]
root        1416  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_3]
root        1417  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_3]
root        1431  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_4]
root        1432  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_4]
root        1433  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_5]
root        1434  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_5]
root        1435  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_6]
root        1436  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_6]
root        1437  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_7]
root        1438  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_7]
root        1439  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_8]
root        1440  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_8]
root        1441  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_9]
root        1442  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_9]
root        1443  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_10]
root        1444  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_10]
root        1445  0.0  0.0      0     0 ?        S    Sep13   0:00 [scsi_eh_11]
root        1446  0.0  0.0      0     0 ?        I<   Sep13   0:00 [scsi_tmf_11]
polkitd     1452  0.0  0.0 2013428 8412 ?        Ssl  Sep13   0:00 /usr/lib/polkit-1/polkitd --no-debug
dbus        1458  0.0  0.0  73816  2520 ?        Ss   Sep13   0:02 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root        1538  0.0  0.0 125512  2136 ?        Ssl  Sep13  25:38 /usr/sbin/irqbalance --foreground
root        1557  0.0  0.0      0     0 ?        I<   Sep13   0:00 [nfit]
root        1594  0.0  0.0 103980  2004 ?        Ss   Sep13   0:05 /usr/lib/systemd/systemd-logind
root        1665  0.0  0.0 377948  6532 ?        Ssl  Sep13   1:30 /usr/sbin/NetworkManager --no-daemon
root        2034  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ib-comp-wq]
root        2035  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ib-comp-unb-wq]
root        2036  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ib_mcast]
root        2037  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ib_nl_sa_wq]
root        2045  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_health0000]
root        2046  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_page_alloc]
root        2048  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_cmd_0000:5]
root        2049  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_events]
root        2050  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_fw_reset_e]
root        2052  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_fw_tracer]
root        2053  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_hv_vhca]
root        2054  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_fc]
root        2056  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_health0000]
root        2057  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_page_alloc]
root        2058  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_cmd_0000:8]
root        2059  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_events]
root        2060  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_fw_reset_e]
root        2062  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_fw_tracer]
root        2063  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_hv_vhca]
root        2064  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_fc]
root        2068  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5_ib_sigerr_]
root        2069  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ib_mad1]
root        2070  0.0  0.0      0     0 ?        I<   Sep13   0:00 [to_fifo]
root        2071  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mkey_cache]
root        2073  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ib_mad1]
root        2074  0.0  0.0      0     0 ?        I<   Sep13   0:00 [to_fifo]
root        2075  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mkey_cache]
root        2125  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5e]
root        2127  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ipoib_wq]
root        2130  0.0  0.0      0     0 ?        I<   Sep13   0:00 [mlx5e]
root        2230  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ipoib_wq]
root        2284  0.0  0.0      0     0 ?        I<   Sep13   0:00 [rdma_cm]
root        2399  0.0  0.0  76532  2424 ?        Ss   Sep13   0:42 /usr/sbin/sshd -D [email protected],[email protected],aes256-ctr,aes256-cbc,[email protected],aes128-ctr,aes128-cb
root        2401  0.0  0.0 615592 14536 ?        Ssl  Sep13   5:22 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
root        2411  0.0  0.0  38860  1300 ?        Ss   Sep13   0:00 /usr/bin/rhsmcertd
root        2414  0.0  0.0  98860   932 ?        Ssl  Sep13   0:00 /usr/sbin/gssproxy -D
icm         2415  0.0  0.0  98832  1816 ?        Ssl  Sep13   4:21 /usr/sbin/ibms_mad_agent --daemon
root        2425  0.0  0.0  26092   596 ?        Ss   Sep13   0:00 /usr/sbin/atd -f
root        2427  0.0  0.0  22896  1752 ?        Ss   Sep13   0:03 /usr/sbin/crond -n
root        2434  0.0  0.0 219604   116 tty1     Ss+  Sep13   0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root        2435  0.0  0.0   6552   428 ttyS0    Ss+  Sep13   0:00 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS0 vt220
munge       2447  0.0  0.0 233384  1032 ?        Sl   Sep13   0:26 /usr/sbin/munged
root        2779  0.0  0.0 105196  1832 ?        Ss   Sep13   0:11 /usr/libexec/postfix/master -w
postfix     2781  0.0  0.0 120696  2364 ?        S    Sep13   0:02 qmgr -l -t unix -u
rpcuser     2876  0.0  0.0  69660 25420 ?        Ss   Sep13   0:00 /usr/sbin/rpc.statd
root        2909  0.0  0.0 523552 17368 ?        Ssl  Sep13   0:29 /usr/sbin/automount --systemd-service --dont-check-daemon
root        2959  0.0  0.0      0     0 ?        I<   Sep13   0:00 [nfsiod]
root        3185  0.0  0.0      0     0 ?        S    Sep13   0:00 [NFSv4 callback]
root       10699  0.0  0.0 206588  3628 ?        Ss   Sep13   0:04 /usr/sbin/sssd -i --logger=files
root       10700  0.0  0.0 212884  6884 ?        S    Sep13   1:08 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
root       10701  0.0  0.0 250160 10540 ?        S    Sep13   0:46 /usr/libexec/sssd/sssd_be --domain barbora.it4i.cz --uid 0 --gid 0 --logger=files
root       10702  0.0  0.0 224948 31504 ?        S    Sep13   0:42 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root       10703  0.0  0.0 196788  4644 ?        S    Sep13   0:18 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
root       10704  0.0  0.0 190308  2396 ?        S    Sep13   0:20 /usr/libexec/sssd/sssd_ssh --uid 0 --gid 0 --logger=files
root       11160  0.0  0.0      0     0 ?        S    Sep13   0:00 [cfs_rh_00]
root       11161  0.0  0.0      0     0 ?        S    Sep13   0:00 [cfs_rh_01]
root       11162  0.0  0.0      0     0 ?        S    Sep13   0:00 [cfs_rh_02]
root       11163  0.0  0.0      0     0 ?        S    Sep13   0:00 [cfs_rh_03]
root       11200  0.0  0.0      0     0 ?        S    Sep13   0:58 [kiblnd_connd]
root       11201  0.0  0.0      0     0 ?        S    Sep13  56:52 [kiblnd_sd_00_00]
root       11202  0.0  0.0      0     0 ?        S    Sep13  56:45 [kiblnd_sd_00_01]
root       11203  0.0  0.0      0     0 ?        S    Sep13  56:58 [kiblnd_sd_00_02]
root       11204  0.0  0.0      0     0 ?        S    Sep13  56:45 [kiblnd_sd_00_03]
root       11205  0.0  0.0      0     0 ?        S    Sep13   0:00 [kiblnd_sd_01_00]
root       11206  0.0  0.0      0     0 ?        S    Sep13   0:00 [kiblnd_sd_01_01]
root       11207  0.0  0.0      0     0 ?        S    Sep13   0:00 [kiblnd_sd_01_02]
root       11208  0.0  0.0      0     0 ?        S    Sep13   0:00 [kiblnd_sd_01_03]
root       11209  0.0  0.0      0     0 ?        S    Sep13   0:27 [monitor_thread]
root       11210  0.0  0.0      0     0 ?        S    Sep13   0:22 [lnet_discovery]
root       11233  0.0  0.0      0     0 ?        I<   Sep13   0:00 [obd_zombid]
root       11236  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_000]
root       11237  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_001]
root       11238  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_002]
root       11239  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_003]
root       11240  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_004]
root       11241  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_005]
root       11242  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_006]
root       11243  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_007]
root       11244  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_008]
root       11245  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_009]
root       11246  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_010]
root       11247  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_011]
root       11248  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_012]
root       11249  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_013]
root       11250  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_014]
root       11251  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_015]
root       11252  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_016]
root       11254  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr00_017]
root       11255  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_000]
root       11256  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_001]
root       11257  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_002]
root       11258  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_003]
root       11259  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_004]
root       11260  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_005]
root       11261  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_006]
root       11262  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_007]
root       11263  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_008]
root       11264  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_009]
root       11265  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_010]
root       11266  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_011]
root       11267  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_012]
root       11268  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_013]
root       11269  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_014]
root       11270  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_015]
root       11271  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_016]
root       11272  0.0  0.0      0     0 ?        S    Sep13   0:00 [ptlrpc_hr01_017]
root       11273  0.0  0.0      0     0 ?        S    Sep13   0:01 [ptlrpcd_rcv]
root       11274  0.0  0.0      0     0 ?        S    Sep13   7:22 [ptlrpcd_00_00]
root       11275  0.0  0.0      0     0 ?        S    Sep13   4:59 [ptlrpcd_00_01]
root       11276  0.0  0.0      0     0 ?        S    Sep13   7:22 [ptlrpcd_00_02]
root       11277  0.0  0.0      0     0 ?        S    Sep13   4:59 [ptlrpcd_00_03]
root       11278  0.0  0.0      0     0 ?        S    Sep13   7:22 [ptlrpcd_00_04]
root       11279  0.0  0.0      0     0 ?        S    Sep13   4:58 [ptlrpcd_00_05]
root       11280  0.0  0.0      0     0 ?        S    Sep13   7:21 [ptlrpcd_00_06]
root       11281  0.0  0.0      0     0 ?        S    Sep13   4:59 [ptlrpcd_00_07]
root       11282  0.0  0.0      0     0 ?        S    Sep13   7:22 [ptlrpcd_00_08]
root       11283  0.0  0.0      0     0 ?        S    Sep13   4:58 [ptlrpcd_00_09]
root       11284  0.0  0.0      0     0 ?        S    Sep13   7:23 [ptlrpcd_00_10]
root       11285  0.0  0.0      0     0 ?        S    Sep13   4:58 [ptlrpcd_00_11]
root       11286  0.0  0.0      0     0 ?        S    Sep13   7:23 [ptlrpcd_00_12]
root       11287  0.0  0.0      0     0 ?        S    Sep13   4:59 [ptlrpcd_00_13]
root       11288  0.0  0.0      0     0 ?        S    Sep13   7:21 [ptlrpcd_00_14]
root       11289  0.0  0.0      0     0 ?        S    Sep13   4:59 [ptlrpcd_00_15]
root       11290  0.0  0.0      0     0 ?        S    Sep13   7:22 [ptlrpcd_00_16]
root       11291  0.0  0.0      0     0 ?        S    Sep13   4:59 [ptlrpcd_00_17]
root       11292  0.0  0.0      0     0 ?        S    Sep13   8:04 [ptlrpcd_01_00]
root       11293  0.0  0.0      0     0 ?        S    Sep13   5:25 [ptlrpcd_01_01]
root       11294  0.0  0.0      0     0 ?        S    Sep13   8:05 [ptlrpcd_01_02]
root       11295  0.0  0.0      0     0 ?        S    Sep13   5:23 [ptlrpcd_01_03]
root       11296  0.0  0.0      0     0 ?        S    Sep13   8:04 [ptlrpcd_01_04]
root       11297  0.0  0.0      0     0 ?        S    Sep13   5:25 [ptlrpcd_01_05]
root       11298  0.0  0.0      0     0 ?        S    Sep13   8:03 [ptlrpcd_01_06]
root       11299  0.0  0.0      0     0 ?        S    Sep13   5:26 [ptlrpcd_01_07]
root       11300  0.0  0.0      0     0 ?        S    Sep13   8:04 [ptlrpcd_01_08]
root       11301  0.0  0.0      0     0 ?        S    Sep13   5:25 [ptlrpcd_01_09]
root       11302  0.0  0.0      0     0 ?        S    Sep13   8:06 [ptlrpcd_01_10]
root       11303  0.0  0.0      0     0 ?        S    Sep13   5:23 [ptlrpcd_01_11]
root       11304  0.0  0.0      0     0 ?        S    Sep13   8:04 [ptlrpcd_01_12]
root       11305  0.0  0.0      0     0 ?        S    Sep13   5:26 [ptlrpcd_01_13]
root       11306  0.0  0.0      0     0 ?        S    Sep13   8:04 [ptlrpcd_01_14]
root       11307  0.0  0.0      0     0 ?        S    Sep13   5:26 [ptlrpcd_01_15]
root       11308  0.0  0.0      0     0 ?        S    Sep13   8:04 [ptlrpcd_01_16]
root       11309  0.0  0.0      0     0 ?        S    Sep13   5:26 [ptlrpcd_01_17]
root       11310  0.0  0.0      0     0 ?        I<   Sep13   0:00 [ptlrpc_pinger]
root       11321  0.0  0.0      0     0 ?        S    Sep13   0:01 [ldlm_cb00_000]
root       11322  0.0  0.0      0     0 ?        S    Sep13   0:00 [lc_watchdogd]
root       11323  0.0  0.0      0     0 ?        S    Sep13   0:01 [ldlm_cb00_001]
root       11324  0.0  0.0      0     0 ?        S    Sep13   0:00 [ldlm_cb01_000]
root       11325  0.0  0.0      0     0 ?        S    Sep13   0:00 [ldlm_cb01_001]
root       11326  0.0  0.0      0     0 ?        S    Sep13   1:36 [ldlm_bl_01]
root       11327  0.0  0.0      0     0 ?        S    Sep13   1:41 [ldlm_bl_02]
root       11328  0.0  0.0      0     0 ?        S    Sep13   0:00 [ll_cfg_requeue]
root       12498  0.0  0.0  17868   552 ?        Ss   Sep13   0:00 /usr/sbin/rasdaemon -f -r
nrpe       13196  0.0  0.0  20736  1156 ?        Ss   Sep13   3:10 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
telegraf   13224  0.1  0.0 5657132 63024 ?       SLsl Sep13 112:37 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
root       13257  0.0  0.0 330772  1700 ?        S    Sep13   0:00 sudo /usr/local/bin/telegraf_cpu_msr.py
telegraf   13258  0.0  0.0 259320  8580 ?        S    Sep13   3:16 /usr/bin/python3 /usr/local/bin/telegraf_ipmi.py
root       13259  0.0  0.0 330772  1736 ?        S    Sep13   0:00 sudo /usr/local/bin/lustre_telegraf.py
root       13261  0.0  0.0 100880  4608 ?        Ss   Sep13   0:01 /usr/lib/systemd/systemd --user
root       13262  0.0  0.0 293688  3076 ?        S    Sep13   0:00 (sd-pam)
root       13273  0.0  0.0 262304  9352 ?        S    Sep13   3:36 /usr/bin/python3 /usr/local/bin/lustre_telegraf.py
root       13274  0.0  0.0 252288  5328 ?        S    Sep13   2:08 python3 /usr/local/bin/telegraf_cpu_msr.py
root      148367  0.0  0.0      0     0 ?        S    Sep14   1:27 [ldlm_bl_03]
root      148368  0.0  0.0      0     0 ?        S    Sep14   1:37 [ldlm_bl_04]
root      148369  0.0  0.0      0     0 ?        S    Sep14   0:01 [ldlm_cb00_002]
root      148370  0.0  0.0      0     0 ?        S    Sep14   0:01 [ldlm_cb00_003]
root      148371  0.0  0.0      0     0 ?        S    Sep14   1:29 [ldlm_bl_05]
root      148416  0.0  0.0      0     0 ?        S    Sep14   1:29 [ldlm_bl_06]
root      214961  0.0  0.0      0     0 ?        S    Sep14   1:54 [ldlm_bl_07]
root      242187  0.0  0.0      0     0 ?        S    Sep15   1:28 [ldlm_bl_08]
root      242188  0.0  0.0      0     0 ?        S    Sep15   1:37 [ldlm_bl_09]
root      242189  0.0  0.0      0     0 ?        S    Sep15   1:38 [ldlm_bl_10]
root      242190  0.0  0.0      0     0 ?        S    Sep15   1:34 [ldlm_bl_11]
root      242191  0.0  0.0      0     0 ?        S    Sep15   1:36 [ldlm_bl_12]
root      242192  0.0  0.0      0     0 ?        S    Sep15   1:40 [ldlm_bl_13]
root      242193  0.0  0.0      0     0 ?        S    Sep15   1:35 [ldlm_bl_14]
root      242194  0.0  0.0      0     0 ?        S    Sep15   1:39 [ldlm_bl_15]
root      242195  0.0  0.0      0     0 ?        S    Sep15   1:37 [ldlm_bl_16]
root      242196  0.0  0.0      0     0 ?        S    Sep15   1:28 [ldlm_bl_17]
root      242197  0.0  0.0      0     0 ?        S    Sep15   1:37 [ldlm_bl_18]
root      242198  0.0  0.0      0     0 ?        S    Sep15   1:29 [ldlm_bl_19]
root      242199  0.0  0.0      0     0 ?        S    Sep15   1:39 [ldlm_bl_20]
root      242200  0.0  0.0      0     0 ?        S    Sep15   1:32 [ldlm_bl_21]
root      242201  0.0  0.0      0     0 ?        S    Sep15   1:26 [ldlm_bl_22]
root      242202  0.0  0.0      0     0 ?        S    Sep15   1:37 [ldlm_bl_23]
root      242203  0.0  0.0      0     0 ?        S    Sep15   1:25 [ldlm_bl_24]
root      242204  0.0  0.0      0     0 ?        S    Sep15   1:35 [ldlm_bl_25]
root      242205  0.0  0.0      0     0 ?        S    Sep15   1:29 [ldlm_bl_26]
root      242206  0.0  0.0      0     0 ?        S    Sep15   1:27 [ldlm_bl_27]
root      242207  0.0  0.0      0     0 ?        S    Sep15   1:34 [ldlm_bl_28]
root      561117  0.0  0.0      0     0 ?        S    Sep17   0:01 [ldlm_cb00_004]
root      588035  0.0  0.0      0     0 ?        S    Sep17   0:01 [ldlm_cb00_005]
root      612608  0.0  0.0      0     0 ?        S    Sep17   0:00 [ldlm_cb01_002]
root      631053  0.0  0.0      0     0 ?        S    Sep17   0:01 [ldlm_cb00_006]
root      633667  0.0  0.0      0     0 ?        S    Sep17   0:00 [ldlm_cb01_003]
root      710605  0.0  0.0      0     0 ?        S    Sep17   0:00 [ldlm_cb01_004]
root     1630344  0.0  0.0      0     0 ?        S    Sep19   0:01 [ldlm_cb00_007]
root     1928608  0.0  0.0      0     0 ?        S    Sep19   0:01 [ldlm_cb00_008]
root     1957214  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_37]
root     1957215  0.0  0.0      0     0 ?        S    Nov01   0:09 [ldlm_bl_38]
root     1957216  0.0  0.0      0     0 ?        S    Nov01   0:05 [ldlm_bl_39]
root     1957217  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_40]
root     1957218  0.0  0.0      0     0 ?        S    Nov01   0:03 [ldlm_bl_41]
root     1957219  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_42]
root     1957220  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_43]
root     1957221  0.0  0.0      0     0 ?        S    Nov01   0:03 [ldlm_bl_44]
root     1957222  0.0  0.0      0     0 ?        S    Nov01   0:03 [ldlm_bl_45]
root     1957223  0.0  0.0      0     0 ?        S    Nov01   0:08 [ldlm_bl_46]
root     1957227  0.0  0.0      0     0 ?        S    Nov01   0:07 [ldlm_bl_47]
root     1957228  0.0  0.0      0     0 ?        S    Nov01   0:06 [ldlm_bl_48]
root     1957229  0.0  0.0      0     0 ?        S    Nov01   0:03 [ldlm_bl_49]
root     1957230  0.0  0.0      0     0 ?        S    Nov01   0:03 [ldlm_bl_50]
root     1957231  0.0  0.0      0     0 ?        S    Nov01   0:02 [ldlm_bl_51]
root     1957232  0.0  0.0      0     0 ?        S    Nov01   0:08 [ldlm_bl_52]
root     1957233  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_53]
root     1957234  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_54]
root     1957235  0.0  0.0      0     0 ?        S    Nov01   0:08 [ldlm_bl_55]
root     1957236  0.0  0.0      0     0 ?        S    Nov01   0:05 [ldlm_bl_56]
root     1983855  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_57]
root     1983856  0.0  0.0      0     0 ?        S    Nov01   0:02 [ldlm_bl_58]
root     1983857  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_59]
root     1983858  0.0  0.0      0     0 ?        S    Nov01   0:04 [ldlm_bl_60]
root     2291750  0.0  0.0 369476 13924 ?        Ssl  Sep20   2:10 /usr/sbin/rsyslogd -n
chrony   2292630  0.0  0.0  19044   168 ?        S    Sep20   0:14 /usr/sbin/chronyd
root     2467071  0.0  0.0 1230284 9080 ?        S    Nov04   0:03 /opt/slurm/sbin/slurmd --systemd
root     2586470  0.0  0.0      0     0 ?        I<   Nov05   0:00 [kworker/u77:1-xprtiod]
root     2793376  0.0  0.0      0     0 ?        I    Nov06   0:00 [kworker/30:0-cgroup_destroy]
root     2910606  0.0  0.0      0     0 ?        I    Nov07   0:00 [kworker/23:1-cgroup_pidlist_destroy]
root     2926987  0.0  0.0      0     0 ?        I    Nov07   0:00 [kworker/34:3-events]
root     2927166  0.0  0.0      0     0 ?        I    Nov07   0:09 [kworker/32:1-rcu_par_gp]
root     2963685  0.0  0.0      0     0 ?        S    Sep21   1:24 [ldlm_bl_29]
root     2963686  0.0  0.0      0     0 ?        S    Sep21   1:26 [ldlm_bl_30]
root     2963687  0.0  0.0      0     0 ?        S    Sep21   1:31 [ldlm_bl_31]
root     3266385  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/29:1-cgroup_destroy]
root     3267561  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/21:0-cgroup_destroy]
root     3272313  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/30:1-mm_percpu_wq]
root     3272894  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/23:2-cgroup_pidlist_destroy]
root     3277239  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/21:2-mm_percpu_wq]
root     3282003  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/20:1-mm_percpu_wq]
root     3284194  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/20:2-events]
root     3297527  0.0  0.0      0     0 ?        S    Sep21   0:00 [ldlm_cb01_005]
root     3297914  0.0  0.0      0     0 ?        S    Sep21   1:34 [ldlm_bl_32]
root     3297915  0.0  0.0      0     0 ?        S    Sep21   1:40 [ldlm_bl_33]
root     3297916  0.0  0.0      0     0 ?        S    Sep21   1:23 [ldlm_bl_34]
root     3297917  0.0  0.0      0     0 ?        S    Sep21   1:40 [ldlm_bl_35]
root     3297918  0.0  0.0      0     0 ?        S    Sep21   1:32 [ldlm_bl_36]
root     3304629  0.1  0.0      0     0 ?        I    Nov09   5:03 [kworker/0:1-events]
root     3307693  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/18:1-cgroup_destroy]
root     3310033  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/24:1-cgroup_destroy]
root     3312493  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/22:2-cgroup_destroy]
root     3313719  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/27:1-rcu_par_gp]
root     3313832  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/28:2-mm_percpu_wq]
root     3317100  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/33:1-mm_percpu_wq]
root     3322835  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/22:1-events]
root     3329564  0.0  0.0      0     0 ?        I    Nov09   0:00 [kworker/35:2-mm_percpu_wq]
root     3347314  0.0  0.0      0     0 ?        I    Nov10   0:00 [kworker/7:0-mm_percpu_wq]
root     3404273  0.0  0.0      0     0 ?        I    Nov10   0:00 [kworker/33:3-cgroup_destroy]
root     3479505  0.0  0.0      0     0 ?        I    Nov10   0:00 [kworker/26:0-rcu_par_gp]
root     3479909  0.0  0.0      0     0 ?        I<   Nov10   0:00 [kworker/u77:0-xprtiod]
root     3483182  0.0  0.0      0     0 ?        I    Nov10   0:00 [kworker/0:0]
root     3483183  0.0  0.0      0     0 ?        I    Nov10   0:00 [kworker/31:1-rcu_gp]
root     3483193  0.0  0.0      0     0 ?        I    Nov10   0:00 [kworker/19:1-events]
root     3488366  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/34:0-mm_percpu_wq]
root     3546419  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/15:1-mm_percpu_wq]
root     3594788  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/2:2-mm_percpu_wq]
root     3613014  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/14:0-events]
root     3616683  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/15:0-mm_percpu_wq]
root     3629346  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/3:1-events]
root     3629647  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/11:2-mm_percpu_wq]
root     3630785  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/u74:1-events_unbound]
root     3632031  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/13:0-rcu_gp]
root     3635493  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/u74:0-events_unbound]
root     3637275  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/25:2-cgroup_pidlist_destroy]
root     3637276  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/18:2-mm_percpu_wq]
root     3637447  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/32:2-mm_percpu_wq]
root     3637771  0.0  0.0 259724  5544 ?        Sl   Nov11   0:01 slurmstepd: [813442.extern]
root     3637775  0.0  0.0   4388   864 ?        S    Nov11   0:00 sleep 100000000
root     3637778  0.0  0.0 260016  5936 ?        Sl   Nov11   0:03 slurmstepd: [813442.batch]
svatosm  3637782  0.0  0.0 222516  3260 ?        S    Nov11   0:00 /bin/bash /var/spool/slurmd/job813442/slurm_script
root     3638239  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/14:1-ptlrpc_pinger]
svatosm  3638553  0.0  0.0 246632  9132 ?        Sl   Nov11   0:07 /home/svatosm/hq-v0.19.0-linux-x64/hq worker start --idle-timeout 5m --manager slurm --server-dir /home/svatosm/.hq-server/002 --on-server-lost fi
root     3689827  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/31:2-mm_percpu_wq]
root     3692659  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/3:2-mm_percpu_wq]
root     3706911  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/9:1-events]
root     3714115  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/26:2-rcu_gp]
root     3722793  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/27:2-mm_percpu_wq]
root     3722795  0.0  0.0      0     0 ?        I    Nov11   0:00 [kworker/2:3-cgroup_destroy]
root     3731957  0.0  0.0      0     0 ?        I    00:00   0:00 [kworker/35:0-events]
root     3731958  0.0  0.0      0     0 ?        I    00:00   0:00 [kworker/4:1-mm_percpu_wq]
root     3742806  0.0  0.0      0     0 ?        I    01:43   0:00 [kworker/5:2-events]
root     3745725  0.0  0.0      0     0 ?        I    02:11   0:00 [kworker/8:0-cgroup_destroy]
root     3747884  0.0  0.0      0     0 ?        I    02:31   0:00 [kworker/29:0-mm_percpu_wq]
root     3747885  0.0  0.0      0     0 ?        I    02:31   0:00 [kworker/9:0-events_power_efficient]
root     3749434  0.0  0.0      0     0 ?        I    02:46   0:00 [kworker/6:0-mm_percpu_wq]
root     3753192  0.0  0.0      0     0 ?        I    03:22   0:00 [kworker/10:0-mm_percpu_wq]
root     3760918  0.0  0.0      0     0 ?        I    04:35   0:00 [kworker/19:0]
root     3772496  0.0  0.0      0     0 ?        I    06:25   0:00 [kworker/6:3-events]
root     3786505  0.0  0.0      0     0 ?        I    08:39   0:00 [kworker/7:2-events]
root     3786978  0.0  0.0      0     0 ?        I    08:43   0:00 [kworker/24:2-mm_percpu_wq]
root     3792795  0.0  0.0      0     0 ?        I    09:39   0:00 [kworker/17:0-events]
root     3795976  0.0  0.0      0     0 ?        I    10:09   0:00 [kworker/12:0-events]
root     3797973  0.0  0.0      0     0 ?        I    10:28   0:00 [kworker/28:1]
root     3797975  0.0  0.0      0     0 ?        I    10:28   0:00 [kworker/8:3-cgroup_pidlist_destroy]
root     3799844  0.0  0.0      0     0 ?        I    10:46   0:00 [kworker/12:1-mm_percpu_wq]
root     3806735  0.0  0.0      0     0 ?        I    11:52   0:00 [kworker/11:3-cgroup_destroy]
root     3807223  0.0  0.0      0     0 ?        I    11:56   0:00 [kworker/u73:1-rpciod]
root     3811930  0.0  0.0      0     0 ?        I    12:41   0:00 [kworker/4:0-events]
root     3813434  0.0  0.0      0     0 ?        I    12:55   0:00 [kworker/13:3-rcu_par_gp]
root     3823941  0.0  0.0      0     0 ?        I    14:35   0:00 [kworker/16:0-cgroup_destroy]
root     3823942  0.0  0.0      0     0 ?        I    14:35   0:00 [kworker/17:2]
root     3825135  0.0  0.0      0     0 ?        I    14:46   0:00 [kworker/25:1-cgroup_pidlist_destroy]
root     3825137  0.0  0.0      0     0 ?        I    14:46   0:00 [kworker/16:2-cgroup_pidlist_destroy]
root     3827510  0.0  0.0      0     0 ?        I    15:08   0:00 [kworker/10:2-rcu_par_gp]
postfix  3831432  0.0  0.0 120644  8208 ?        S    15:46   0:00 pickup -l -t unix -u
root     3832774  0.0  0.0      0     0 ?        I    15:59   0:00 [kworker/5:3-mm_percpu_wq]
root     3835015  0.0  0.0      0     0 ?        I    16:20   0:00 [kworker/u73:2-events_unbound]
root     3837211  0.0  0.0      0     0 ?        I    16:41   0:00 [kworker/u72:1-ipoib_wq]
root     3837452  0.0  0.0      0     0 ?        I    16:43   0:00 [kworker/1:2-mm_percpu_wq]
root     3839575  0.0  0.0      0     0 ?        I    17:03   0:00 [kworker/1:0-mm_percpu_wq]
root     3839731  0.0  0.0      0     0 ?        I    17:05   0:00 [kworker/u72:0-ipoib_wq]
root     3840320  0.0  0.0      0     0 ?        I<   17:11   0:00 [kworker/u76:2-ib-comp-unb-wq]
root     3840933  0.0  0.0      0     0 ?        I<   17:16   0:00 [kworker/u76:0-ib-comp-unb-wq]
root     3841155  0.0  0.0      0     0 ?        I    17:19   0:00 [kworker/u72:2-ipoib_wq]
root     3841396  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/4:2]
root     3841401  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/16:1-mm_percpu_wq]
root     3841403  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/7:1-rcu_par_gp]
root     3841565  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/10:1]
root     3841580  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/13:1-mm_percpu_wq]
root     3841609  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/26:1-events]
root     3841624  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/32:0-events]
root     3841639  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/27:0-events]
root     3841654  0.0  0.0      0     0 ?        I    17:21   0:00 [kworker/25:0-cgroup_destroy]
root     3841664  0.0  0.0      0     0 ?        I    17:22   0:00 [kworker/3:0]
root     3841668  0.0  0.0      0     0 ?        I    17:22   0:00 [kworker/6:1]
root     3841672  0.0  0.0      0     0 ?        I    17:22   0:00 [kworker/u74:2-events_unbound]
root     3841673  0.0  0.0      0     0 ?        I    17:22   0:00 [kworker/8:1-mm_percpu_wq]
root     3841674  0.0  0.0      0     0 ?        I    17:22   0:00 [kworker/23:0-events]
root     3841676  0.0  0.0      0     0 ?        I    17:22   0:00 [kworker/9:2]
root     3841719  0.0  0.0      0     0 ?        I<   17:22   0:00 [kworker/u76:1-xprtiod]
root     3841780  0.0  0.0      0     0 ?        I    17:22   0:00 [kworker/1:1-mm_percpu_wq]
root     3841875  0.0  0.0 148460  9848 ?        SNs  17:23   0:00 sshd: svatosm [priv]
svatosm  3841879  0.2  0.0 100680 10032 ?        SNs  17:23   0:00 /usr/lib/systemd/systemd --user
svatosm  3841880  0.0  0.0 316264  3208 ?        SN   17:23   0:00 (sd-pam)
svatosm  3841887  0.0  0.0 148460  5404 ?        SN   17:23   0:00 sshd: svatosm@pts/0
root     3841888  0.0  0.0      0     0 ?        I    17:23   0:00 [kworker/25:3-events]
svatosm  3841889  0.2  0.0 237024  5044 pts/0    SNs  17:23   0:00 -bash
svatosm  3841930  0.0  0.0 268520  3916 pts/0    RN+  17:23   0:00 ps uax

Kobzol (Collaborator) commented Dec 9, 2024

Hi, sorry for the late reply. If a worker does not receive any task to compute within 5 minutes (the `--idle-timeout 5m` on its command line), it should turn itself off.
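As a rough illustration of the idle-timeout rule, here is a minimal Python sketch. It is not HyperQueue code. It assumes that idle time only accumulates while the worker has no running tasks, and that the timer restarts when the last running task finishes. `IdleTimeout` and its methods are hypothetical names.

```python
import time
from typing import Callable


class IdleTimeout:
    """Hypothetical model of a worker idle timeout such as `--idle-timeout 5m`.

    Assumption: idle time only accumulates while no task is running.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.running = 0
        self.idle_since = clock()  # a freshly started worker is idle

    def task_started(self) -> None:
        self.running += 1

    def task_finished(self) -> None:
        self.running -= 1
        if self.running == 0:
            self.idle_since = self.clock()  # a new idle period starts now

    def should_stop(self) -> bool:
        # Stop only when nothing is running and the idle period is long enough.
        return self.running == 0 and self.clock() - self.idle_since >= self.timeout_s
```

Under this model, a worker that never gets a task reports `should_stop()` as true 300 seconds after start, while a worker with a running task never does.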

However, HQ can end up repeatedly spawning allocations and workers while tasks remain waiting. HQ currently cannot predict whether a given allocation will be able to run a given task (for example, because of the task's resource requirements). It has a simple heuristic that avoids spawning allocations that run for, e.g., 10 minutes when the tasks have a 30-minute time request, but it does little beyond that.

So the following could happen:

  1. There is a task waiting to be computed
  2. Because of 1), HQ spawns an allocation that runs a worker
  3. The spawned worker is unable to execute the task because it does not satisfy the task's resource requirements
  4. The worker hits its idle timeout and the allocation ends
  5. GOTO 2)
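The loop above can be sketched as a small Python simulation. Everything in it is hypothetical: `Task`, `QueueConfig`, `time_heuristic_allows`, and `worker_can_run` are illustrative names, not HyperQueue's API. The sketch assumes the only check made before spawning is the time heuristic described above, and that the CPU check happens only once the worker has connected.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cpus: int              # CPUs the task requests
    time_request_min: int  # the task's time request, in minutes


@dataclass
class QueueConfig:
    worker_cpus: int       # CPUs a worker in a new allocation provides
    walltime_min: int      # walltime of each new allocation, in minutes


def time_heuristic_allows(task: Task, queue: QueueConfig) -> bool:
    # Simplified stand-in for the spawn-time heuristic: don't spawn an
    # allocation that is shorter than the task's time request.
    return queue.walltime_min >= task.time_request_min


def worker_can_run(task: Task, queue: QueueConfig) -> bool:
    # The full check, which only happens after the worker has connected.
    return (queue.worker_cpus >= task.cpus
            and queue.walltime_min >= task.time_request_min)


def simulate(task: Task, queue: QueueConfig, max_rounds: int) -> tuple[int, bool]:
    """Return (allocations spawned, whether the task ever started)."""
    spawned = 0
    for _ in range(max_rounds):
        if not time_heuristic_allows(task, queue):
            break                    # nothing is spawned; the task just waits
        spawned += 1                 # step 2: spawn an allocation with a worker
        if worker_can_run(task, queue):
            return spawned, True     # the task is scheduled on this worker
        # steps 3-5: the worker idles out, the allocation ends, and we loop
    return spawned, False


if __name__ == "__main__":
    task = Task(cpus=64, time_request_min=30)
    print(simulate(task, QueueConfig(worker_cpus=36, walltime_min=60), 5))   # (5, False)
    print(simulate(task, QueueConfig(worker_cpus=36, walltime_min=10), 5))   # (0, False)
    print(simulate(task, QueueConfig(worker_cpus=128, walltime_min=60), 5))  # (1, True)
```

In the first case the time heuristic never blocks spawning, so every round produces an allocation whose worker cannot take the task and idles out. The second case shows the heuristic suppressing allocations that are too short.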

svatosFZU (Author) commented

Thanks for the info. All my jobs have the same definition: a quarter of a node's CPUs and a time request a couple of hours below the batch queue limit. So HQ should have no problem executing them, but who knows what happened. Out of curiosity, is there anything in the HQ debug output that would show that HQ is unable to execute a task, and why?

Kobzol (Collaborator) commented Dec 9, 2024

Currently, no (@spirali, correct me if I'm mistaken). It's quite tricky to determine that and log it by default, although we have been thinking about explicit query support, e.g. asking "why is task X not executed by worker Y?"
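To make the idea of such a query concrete, here is a minimal Python sketch of what a "why can't worker Y run task X" check might return. It is purely hypothetical: HyperQueue does not provide this function. `TaskRequest`, `WorkerState`, and `explain_unschedulable` are illustrative names, and only CPUs and the time request are modeled.

```python
from dataclasses import dataclass


@dataclass
class TaskRequest:
    cpus: int            # CPUs the task requests
    time_request_s: int  # the task's time request, in seconds


@dataclass
class WorkerState:
    free_cpus: int             # CPUs currently free on the worker
    remaining_lifetime_s: int  # time left in the worker's allocation, in seconds


def explain_unschedulable(task: TaskRequest, worker: WorkerState) -> list[str]:
    """Return reasons why `worker` cannot run `task` (an empty list if it can)."""
    reasons = []
    if worker.free_cpus < task.cpus:
        reasons.append(f"needs {task.cpus} CPUs, worker has {worker.free_cpus} free")
    if worker.remaining_lifetime_s < task.time_request_s:
        reasons.append(
            f"time request {task.time_request_s}s exceeds remaining lifetime "
            f"{worker.remaining_lifetime_s}s"
        )
    return reasons
```

For example, a 9-CPU task with a 7200 s time request on a worker with 4 free CPUs and 3600 s left would yield both reasons, whereas a worker with 36 free CPUs and a day of walltime would yield an empty list.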
