-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve sqrt/sqrtf if stable intrinsics allow #216
Conversation
I don't know what the process of picking a reviewer is, other than that I'm pretty sure I shouldn't be reviewing my own PRs of course. GitHub has a suggested reviewer, but also gives their status as "busy". I'm not sure who else to ping for this, but I guess everyone subscribed to the repo gets an email so whoever will see this.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The master branch barely tests anything (I don't think we even test the result of sqrt(-1.)
), so I don't know if doing large changes to the implementation without exhaustive testing makes sense.
Also, I would prefer for cfg_if to be used here and to split the different implementations into different functions, with this PR we end up with 3 implementations (wasm, x86-sse2, and others) and if we keep adding more it is going to be hard to track what's going on.
let m = _mm_set_sd(x); | ||
let m_sqrt = _mm_sqrt_pd(m); | ||
_mm_cvtsd_f64(m_sqrt) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't using the same code here as for wasm32 produce the exact same effect for targets with SSE2? If not, why not ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I don't understand the question. sse2
is an x86 and x86_64 feature. It will never be available during a wasm32 build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that instead of using the core::arch
intrinsics here, you can just write unsafe { ::core::intrinsics::sqrtf64(x) }
just like wasm32 does, and that should generate this exact same code if SSE2 is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, see: https://rust.godbolt.org/z/eEF9ef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, but that's not stable. this code is. that's why i did not also add a neon version, I'm only out to help out stable rust today.
you're right that we should probably also reconsider calling the core intrinsic in more cases if Nightly is in use.
we can add cfg-if and use that, but it's not a current dependency and i didn't want to add one carelessly. we can certainly add some more tests. i think that refactoring the whole wasm32 path vs every other path thing in general, which happens in almost every single function, is a little out of scope for this PR. We should definitely have a bigger discussion about that separately. I'd hope that's not a blocker to this particular PR. |
FWIW I wasn't suggesting that, only suggesting that for the functions for which more than the 2 current paths are exposed, adding more
@alexcrichton should decide on that, we can always c&p the macro here somewhere if that's better than adding a dependency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The master branch barely tests anything
This is not true, each intrinsic has hundreds of randomized tests generated to compare against musl upstream. It probably doesn't test literally -1
but that's what handwritten unit tests are for.
It's possible to use cfg_if!
but I think it'd be best to just avoid the need altogether. I think think an ideal macro for this use case would be something like:
fn the_intrinsic() {
the_macro! {
if #[cfg(sse)] {
// .. SSE implementation
}
}
// fallback implementation
}
I would prefer to not have to deal with unused imports and scoping fallback code wherever possible, generating code that looks something like:
fn the_intrinsic() {
#[cfg(sse)] {
if true {
// .. SSE implementation
}
}
// .. fallback implementation
}
It's just too much of a pain to work with to always have to deal with #[cfg]
everywhere, including imports. It's also a bummer to increase indentation so much. I'm hoping a macro can largely solve this, and I'm pretty sure that macro is not literally cfg_if
Whether this is true or not depends on what those 500 random tests are doing. When I checked, some of the intrinsics had less than 30% code coverage no matter how often I regenerated those tests. So sure, the library does have some tests, but I don't think these tests are, by far, sufficient to test a new implementation of any math function. Hence why I said "barely" tested. |
AFAIK This libm impl. is the only one out there that do random tests, all other use the same test vector but test for precision and spec. I would argue that random test is not really something you want to rely on, they are great as early indicator. |
SO, here's the deal: @gnzlbg told me last night on zulip that apparently I've been using libm wrong this whole time.
BASED ON THIS NEW INFO: I'm going to just close up this particular PR for now, no changes needed. When I get time I'll petition T-libs to just make these functions stable in core since "they're already there" anyway. |
@Lokathor You can use The best way to use these is to use Using the symbols is a possibility, but you need to make sure that the symbols exist, and you probably won't get any LLVM optimizations, like the one you have implemented in this PR. The only "advantage" of using the symbols over libm is that if your target has a C math library, those functions will be called instead, but that's pretty much it. If you want to get LLVM optimizations like libstd does, but in a If you don't care about using the math functions from a C library or about LLVM optimizations, or want to use the exact same math functions for multiple targets independently of what the C library does, or are constrained to stable Rust, or many other use cases, using libm directly might be the best choice. Relying on the symbols existing is super-brittle, and will probably break if we ever allow any kind of user customization for these symbol names like GCC and Clang allow. Currently, |
For those use cases, this PR does improve things, so it might be worth it to keep it open or merge it. It is however unclear whether the use cases that would be satisfied by this aren't better satisfied by exposing the math functions from libcore. |
Ah, hmm, if it's still an improvement then I'll open it back up. |
Alright,
Ready for re-review and possible merge if it passes review. |
Yes this won't actually ever be used on x86/x86_64 targets that have sse with libstd because |
well, me for one. More broadly, anyone else who depends on libm as a Stable-rust, core-only source of math operations will need this update. Particularly, some gamedev-wg folks and I have been trying to improve the state of linear algebra libraries for graphics, and in several benchmarks nalgebra had 10x or worse time taken compared to other common libraries because the I agree with gnzlbg that the correct long term path is to work on getting more core::intrinsics stabilized, but the PR for that RFC hasn't seen any activity in 2019 at all (reminder link), and there's a rustc tracking issue for it that's had some success but also seems to not be moving along very quickly. Given this, I think it's appropriate to make this improvement here. |
Ok that sounds reasonable to me! Before we fill this out for other intrinsics I'd prefer to get the macro story and style sorted out, but moving forward with at least this one sounds reasonable to me. |
Per the question in #214, this uses the intel intrinsics if compile time settings allow.