openblas runs single thread with OMP_PROC_BIND=TRUE or GOMP_CPU_AFFINITY #2238
Comments
First command line does not compile OpenBLAS.
Yes, the first command shows how I am "running" a script via mpirun. Edit1: Edit2:
Caches are shared between multiple cores, so there is no technical merit in your effort; OS schedulers have done better than 1:1 pinning ever since the advent of hyperthreading.
Hi, it seems the "OS scheduling policy" works as expected in the case of MKL and BLIS. I will keep an eye on upcoming releases of openmpi in case they solve this issue.
You still did not share the OpenBLAS build options. For the pthread version you need to set the number of threads to be used in BLAS calls; the OMP version collapses to single-threaded inside an OMP parallel section.
Here are the command lines which I used for compiling openblas-0.3.7 (I have tried linking these versions with HPL) -
Please let me know if there are issues with the compilation method.
No issues, except that you probably do not want NO_AFFINITY defined in this context. This looks to be a duplicate of #2052, as you already noted. Just curious why you are supplying your own CFLAGS?
Actually it is quite dangerous to tinker with FFLAGS: when overridden, they make the LAPACK part not thread-safe.
There is a similar issue, OpenBLAS running single-threaded with the OpenMP affinity setting OMP_PLACES=threads. Looking at the code in the develop branch, the reason seems to be the following: During initialization, Later on, OpenBLAS in principle tries to use the number of threads specified in OMP_NUM_THREADS by calling
However, due to The issue could be resolved by not using the affinity mask during gotoblas_init, or by obeying OMP_NUM_THREADS in
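The mechanism described above can be sketched in C. This is an illustration only, not OpenBLAS code: the helper names `count_allowed_cpus`, `pin_to_first_allowed_cpu` and `capped_threads` are hypothetical, and it assumes (Linux only) that the CPU count is derived from the calling thread's affinity mask via `sched_getaffinity`. Once the OpenMP runtime has bound the master thread to a single hardware thread, such a count is 1, and any thread request capped by it is 1 as well.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: number of CPUs the calling thread may run on.
 * Returns -1 if the affinity mask cannot be read. */
static int count_allowed_cpus(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Hypothetical helper: restrict the calling thread to the first CPU it is
 * currently allowed on, imitating what thread binding does to the master
 * thread. Returns 0 on success, -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t mask, one;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one);
        }
    }
    return -1;
}

/* Assumed capping logic: the requested thread count (for example the value
 * of OMP_NUM_THREADS) is limited by the CPU count seen at initialization.
 * An unknown count (< 1) leaves the request unchanged. */
static int capped_threads(int requested, int allowed)
{
    assert(requested >= 1);
    if (allowed < 1)
        return requested;
    return requested < allowed ? requested : allowed;
}
```

If `pin_to_first_allowed_cpu()` succeeds, a following `count_allowed_cpus()` returns 1, so `capped_threads(n, 1)` is 1 for any requested `n`. That matches the single-threaded behaviour reported in this issue, under the assumptions stated above.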
@jussienko Thanks, this is certainly worth investigating. From #2380 (comment) it is also possible that the code hits a design flaw in libgomp.
When running OpenBLAS under a debugger, one sees (on my system, which supports hyperthreading) that with I just made a pull request (#2706) which contains one possible fix.
Hm. If the actual problem is "only" with OMP_PLACES=threads, I'd rather test for that case (i.e. add another getenv)? (I already prepared that last night, just did not have a chance to test it.)
I guess OMP_PLACES=threads is the most common, but in principle any setting that limits the master thread to a single hyperthread reproduces the issue. For example, on my four-core workstation (0-3 are the first hyperthreads within the physical cores) all of the following settings make OpenBLAS run with only a single thread:
You are right, my half-baked attempt is not going to work then.
Also, on systems without hyperthreading, just OMP_PLACES=cores produces the issue (I just made a short test for that).
Thanks for doing these tests. As far as I can tell, the relevant code in common_thread.h is essentially unchanged from the libGoto2 of ten years ago, so it has seen OpenMP 3.1 (which introduced OMP_PROC_BIND, IIRC) at best.
Actually, the master branch works OK (I tested that first before realizing that it is pretty old and develop is the correct one), but the reason is that in there
Err, well, I certainly fumbled with that after 0.2.20. So perhaps sched_getaffinity should not be used in an OpenMP context, at least not when OMP_PLACES or any other "modern" form of OpenMP-based affinity handling is in effect.
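A minimal sketch of the kind of environment check discussed here, in C. It is not the fix that went into OpenBLAS: the function names `env_is_set` and `openmp_binding_requested` are hypothetical, and the rule is a simplification that assumes any non-empty OMP_PLACES or GOMP_CPU_AFFINITY means binding, unless OMP_PROC_BIND is explicitly set, in which case only its value decides. An initialization routine could consult such a check before trusting the affinity mask of the calling thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

/* Returns 1 if the named environment variable is set to a non-empty value. */
static int env_is_set(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0';
}

/* Hypothetical check: has the user asked the OpenMP runtime to bind threads?
 * Simplified rule (an assumption, see the text above):
 *   - OMP_PROC_BIND set: binding unless its value is "false";
 *   - otherwise: binding if OMP_PLACES or GOMP_CPU_AFFINITY is set. */
static int openmp_binding_requested(void)
{
    const char *bind = getenv("OMP_PROC_BIND");
    if (bind != NULL && bind[0] != '\0')
        return strcasecmp(bind, "false") != 0;
    return env_is_set("OMP_PLACES") || env_is_set("GOMP_CPU_AFFINITY");
}
```

When `openmp_binding_requested()` returns 1, the caller could skip the affinity-mask query and take the thread count from OMP_NUM_THREADS or the OpenMP runtime instead. Whether that is the right policy for every runtime is left open here.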
The bug OpenMathLib/OpenBLAS#2238 is fixed in the development version
* Better options for aocc, clarify aocc is in clang module
* Due to bug in OpenBLAS remove thread binding from defaults. The bug OpenMathLib/OpenBLAS#2238 is fixed in the development version
* Remove fixme
* Fix table formatting
* Improve AMD naming
* Typo fix
* Add example with thread binding (with OpenBLAS caveat)
* Specify full module name
* Typo
Hi,
I compiled hpl-2.3 with openblas 0.3.7 (gcc 9.2 + openmpi 4.1.0) on CentOS 7. Here is how I was running HPL -
where appfile.sh has -
I noticed that HPL was running on a single thread.
To narrow this issue down, I wrote a simpler program -
With the aforementioned program, I get the following with mpirun:
but when I eliminate GOMP_CPU_AFFINITY and OMP_PROC_BIND from my scripts -
Is there a bug in the latest release of openblas?
Edit:
Tested with 0.3.6 and got the same issue. I hope this issue is fixed in 0.3.8.
Note that this issue shows up both with NO_AFFINITY=1 and without it.