-
-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Implement SIMD versions of isnan,isinf, isfinite and signbit #22165
ENH: Implement SIMD versions of isnan,isinf, isfinite and signbit #22165
Conversation
Taking a look at the smoke_test failure! |
🤦♂️ |
s390x and ppc64le builds on travis are failing, linux32 is crashing |
Yeah, have an updated patch coming shortly |
32-bit linux tests are segfaulting. |
Yup! Thought we got it all, working on resolving. Once these are all sorted we will go back through the remaining open PRs and update them as well. |
Just to note, that the 32bit debug run is still failing (presumably some assert?). |
Yeah, it's not reproducing locally, so we are trying to figure out why it's failing in CI. Open to suggestions =) |
Not rocket science, but since I don't have a better idea: I pushed a commit to try and see if we can get the gdb backtrace in the CI run (before we need to track down what is special about the run). Lets see if it works... |
f78280a
to
a05d960
Compare
Well, had to reproduce it locally to be smart enough to use Here is what I get locally, please do not hesitate to simply force-push the CI addition away, I suspect it will give the same traceback now so will keep it in case it is useful to you.
It seems to me the test is failing in the memory-check only, which is weird. And I am actually not sure which assert is failing here at all! |
OK, this seems more useful (adding a print before probably helped report it clearer):
printing out:
gives:
Not sure if it is helpful, but maybe... |
Alright CI, this is it. You can do it. Green Check Time. |
Woo! @seberg thank you for the clues, was exactly what was missing =) Much appreciated! |
Could you confirm that the benchmarks run before the fixes are still valid after them? |
Most certainly @mattip, will update with latest. |
Still looks good on our end!
|
Indeed, this is a really nice speedup for a commonly used operation |
fb48797
to
41638be
Compare
Any thoughts about the missing coverage for output strides? @seiko2plus thoughts? |
Coverage should be caught up now. |
486e226
to
ef1c63b
Compare
@seiko2plus could you look over this? |
NumPy has SIMD versions of float / double `isnan`, `isinf`, `isfinite`, and `signbit` for SSE2 and AVX-512. The changes here replace the SSE2 version with one that uses their universal intrinsics. This allows other architectures to have SIMD versions of the functions too.
Use reinterpret to support casting across many compiler generations Resolve deprecation warnings
Special case SSE Fix PPC64 build Only use vqtbl4q_u8 on A64 Stop trying to use optimizations on s390x
We don't see these failures but CI is hitting them, attempting to resolve
On linux 32 an assert fires where stride (12) passed from ufunc_object (try_trivial_single_output_loop) to DOUBLE_isnan and DOUBLE_isfinite doesn't match the type size (8), we can relax this assert and instead fall back to the UNARY_LOOP path instead
…lets see...)" This reverts commit a05d960.
…litting this new code into a new file
… it was reading random data and not reliable also didn't give additional coverage
bee8071
to
d7b19a4
Compare
663d61c
to
f277da4
Compare
@mattip @seiko2plus Changes should all be integrated thanks! |
Any further issues to investigate on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done, thank you.
NumPy has SIMD versions of float / double
isnan
,isinf
,isfinite
, andsignbit
for SSE2 and AVX-512. The changes here replace the SSE2 version with one that uses universal intrinsics. This allows other architectures to have SIMD versions of the functions too.Apple M1: up to 3.4x faster
Apple M1 Rosetta: up to 2.1x faster
iMacPro (AVX512): Similar. A handful of benchmarks are 15% faster and another are 15% faster. Which ones show up where changes depending on the run. Averaging all gains / losses we're at ~4% faster.