Commit
fix(simd.h): AVX-512 round function (AcademySoftwareFoundation#4119)
This PR fixes the vfloat16 round function. The `_mm512_roundscale_ps` intrinsic was used incorrectly, which caused a test failure on a Zen4 CPU:

```
/var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0/src/libutil/simd_test.cpp:1579:
FAILED: round(F) == mkvec<VEC>(std::round(F[0]), std::round(F[1]), std::round(F[2]), std::round(F[3]))
	values were '-1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4' and '-2 0 2 4 -2 0 2 4 -2 0 2 4 -2 0 2 4'
```

In the old code, `_mm512_roundscale_ps (a, (1<<4) | 3)` meant the following (immediate bits, most significant first):

```
[0001] - Number of fraction bits to preserve
[0]    - Use MXCSR exception mask
[0]    - Select mode from imm
[11]   - Truncate mode
```

This effectively truncated each value to a multiple of 0.5 instead of rounding it to an integer.

References:

* https://www.felixcloutier.com/x86/vrndscalepd#fig-5-29
* https://stackoverflow.com/questions/50854991/instrinsic-mm512-round-ps-is-missing-for-avx512

Signed-off-by: Sv. Lockal <[email protected]>
Signed-off-by: Peter Kovář <[email protected]>
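
To make the immediate decoding above concrete, here is a minimal scalar sketch of what one lane of `_mm512_roundscale_ps` computes. It is not code from OpenImageIO or from this PR: `roundscale_model` is a hypothetical helper, it assumes finite inputs and the default floating-point rounding mode, and it only models the case where bit 2 of the immediate is 0 (rounding mode taken from the immediate).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar model of one lane of VRNDSCALEPS, for illustration only.
// Computes 2^-M * round(x * 2^M), where M and the rounding mode come from imm8.
inline float roundscale_model(float x, int imm8)
{
    assert(imm8 >= 0 && imm8 <= 0xFF);
    assert((imm8 & 0x04) == 0);       // MXCSR-selected rounding is not modelled

    const int m    = (imm8 >> 4) & 0xF;  // imm8[7:4]: fraction bits to preserve
    const int mode = imm8 & 0x3;         // imm8[1:0]: rounding mode

    const float scaled = std::ldexp(x, m);  // x * 2^m
    float r;
    switch (mode) {
    case 0:  r = std::nearbyint(scaled); break;  // nearest even (default FP env)
    case 1:  r = std::floor(scaled);     break;  // toward -infinity
    case 2:  r = std::ceil(scaled);      break;  // toward +infinity
    default: r = std::trunc(scaled);     break;  // toward zero
    }
    return std::ldexp(r, -m);                    // scale back by 2^-m
}
```

With the old immediate `(1<<4) | 3`, the model leaves -1.5 and 1.5 unchanged, matching the failing test output, because one fraction bit is preserved. With the upper four bits set to 0, the same model returns whole numbers, for example `roundscale_model(1.7f, 3)` truncates to 1.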