Improve sqrt/sqrtf if stable intrinsics allow #216

Lokathor · 2019-08-07T20:08:05Z

Per the question in #214, this uses the intel intrinsics if compile time settings allow.

Lokathor · 2019-08-07T20:28:36Z

I don't know what the process of picking a reviewer is, other than that I'm pretty sure I shouldn't be reviewing my own PRs of course. GitHub has a suggested reviewer, but also gives their status as "busy". I'm not sure who else to ping for this, but I guess everyone subscribed to the repo gets an email so whoever will see this.

The PR is ready to be reviewed and merged.
This is a "patch" level change, the public API is unchanged.

gnzlbg

The master branch barely tests anything (I don't think we even test the result of sqrt(-1.)), so I don't know if doing large changes to the implementation without exhaustive testing makes sense.

Also, I would prefer for cfg_if to be used here and to split the different implementations into different functions, with this PR we end up with 3 implementations (wasm, x86-sse2, and others) and if we keep adding more it is going to be hard to track what's going on.

gnzlbg · 2019-08-08T08:31:55Z

src/math/sqrt.rs

+            let m = _mm_set_sd(x);
+            let m_sqrt = _mm_sqrt_pd(m);
+            _mm_cvtsd_f64(m_sqrt)
+        }


Doesn't using the same code here as for wasm32 produce the exact same effect for targets with SSE2? If not, why not ?

I think I don't understand the question. sse2 is an x86 and x86_64 feature. It will never be available during a wasm32 build.

I meant that instead of using the core::arch intrinsics here, you can just write unsafe { ::core::intrinsics::sqrtf64(x) } just like wasm32 does, and that should generate this exact same code if SSE2 is enabled.

For example, see: https://rust.godbolt.org/z/eEF9ef

ah, but that's not stable. this code is. that's why i did not also add a neon version, I'm only out to help out stable rust today.

you're right that we should probably also reconsider calling the core intrinsic in more cases if Nightly is in use.

Lokathor · 2019-08-08T09:16:14Z

we can add cfg-if and use that, but it's not a current dependency and i didn't want to add one carelessly.

we can certainly add some more tests.

i think that refactoring the whole wasm32 path vs every other path thing in general, which happens in almost every single function, is a little out of scope for this PR. We should definitely have a bigger discussion about that separately. I'd hope that's not a blocker to this particular PR.

gnzlbg · 2019-08-08T09:18:32Z

i think that refactoring the whole wasm32 path vs every other path thing in general, which happens in almost every single function, is a little out of scope for this PR.

FWIW I wasn't suggesting that, only suggesting that for the functions for which more than the 2 current paths are exposed, adding more #[cfg()] + #[cfg(not(...))] paths starts getting out-of-hand.

we can add cfg-if and use that, but it's not a current dependency and i didn't want to add one carelessly.

@alexcrichton should decide on that, we can always c&p the macro here somewhere if that's better than adding a dependency.

alexcrichton

The master branch barely tests anything

This is not true, each intrinsic has hundreds of randomized tests generated to compare against musl upstream. It probably doesn't test literally -1 but that's what handwritten unit tests are for.

It's possible to use cfg_if! but I think it'd be best to just avoid the need altogether. I think think an ideal macro for this use case would be something like:

fn the_intrinsic() {
    the_macro! {
        if #[cfg(sse)] {
            // .. SSE implementation
        }
    }

    // fallback implementation
}

I would prefer to not have to deal with unused imports and scoping fallback code wherever possible, generating code that looks something like:

fn the_intrinsic() {
    #[cfg(sse)] {
        if true {
            // .. SSE implementation
        }
    }
    // .. fallback implementation
}

It's just too much of a pain to work with to always have to deal with #[cfg] everywhere, including imports. It's also a bummer to increase indentation so much. I'm hoping a macro can largely solve this, and I'm pretty sure that macro is not literally cfg_if

src/math/sqrtf.rs

gnzlbg · 2019-08-08T16:14:18Z

This is not true, each intrinsic has hundreds of randomized tests generated to compare against musl upstream

Whether this is true or not depends on what those 500 random tests are doing. When I checked, some of the intrinsics had less than 30% code coverage no matter how often I regenerated those tests. So sure, the library does have some tests, but I don't think these tests are, by far, sufficient to test a new implementation of any math function. Hence why I said "barely" tested.

Schultzer · 2019-08-08T16:25:50Z

AFAIK This libm impl. is the only one out there that do random tests, all other use the same test vector but test for precision and spec. I would argue that random test is not really something you want to rely on, they are great as early indicator.

Lokathor · 2019-08-08T16:31:28Z

SO, here's the deal: @gnzlbg told me last night on zulip that apparently I've been using libm wrong this whole time.

This PR is based around the idea that you'd have to declare a libm dependency, call the function, and then rustc builds the crate and all that.
It turns out that because libm is special you can just declare an extern "C" block with all the functions you want and the functions will be there (in a "non-guaranteed", "not our fault if it breaks" sort of way) because either the platform's C libm will have them or compiler-builtins will use Rust libm to provide them.
If you do depend on and then build libm yourself you get worse code in many situations because llvm knows about libm and will often enough optimize your code for you (eg: sqrt(0.0) gets const folded to just 0.0).

BASED ON THIS NEW INFO: I'm going to just close up this particular PR for now, no changes needed. When I get time I'll petition T-libs to just make these functions stable in core since "they're already there" anyway.

gnzlbg · 2019-08-08T17:06:27Z

@Lokathor You can use libm by just adding it as a dependency. There are different ways of using libm that might be better for your use case, depending on what you are doing.

The best way to use these is to use libstd, but since you are kind of building your own, you probably can't do that.

Using the symbols is a possibility, but you need to make sure that the symbols exist, and you probably won't get any LLVM optimizations, like the one you have implemented in this PR. The only "advantage" of using the symbols over libm is that if your target has a C math library, those functions will be called instead, but that's pretty much it.

If you want to get LLVM optimizations like libstd does, but in a #![no_std] library, you either need to use core::intrinsics or link_llvm_intrinsics to call the right LLVM intrinsics, and make sure that the symbols are available. If after optimizations the symbol is called, and that resolves to a libm function, you kind of want those to be optimized properly. In this particular case, for these particular SSE2 targets, LLVM never calls the symbol, but this isn't true in general, so it makes sense to optimize the functions here (it is just unclear whether it is worth it to provide stable Rust code that does that instead of a 1-line of LLVM code when we know that LLVM does the right thing, e.g., like for wasm32).

If you don't care about using the math functions from a C library or about LLVM optimizations, or want to use the exact same math functions for multiple targets independently of what the C library does, or are constrained to stable Rust, or many other use cases, using libm directly might be the best choice.

Relying on the symbols existing is super-brittle, and will probably break if we ever allow any kind of user customization for these symbol names like GCC and Clang allow. Currently, libcore has some tests that rely on some of the symbols existing. (e.g. see here), so if you are considering relying on that, you might want to, first run libcore tests for the targets you care about.

gnzlbg · 2019-08-08T17:09:48Z

If you don't care about using the math functions from a C library or about LLVM optimizations, or want to use the exact same math functions for multiple targets independently of what the C library does, or are constrained to stable Rust, or many other use cases, using libm directly might be the best choice.

For those use cases, this PR does improve things, so it might be worth it to keep it open or merge it. It is however unclear whether the use cases that would be satisfied by this aren't better satisfied by exposing the math functions from libcore.

Lokathor · 2019-08-08T17:16:18Z

Ah, hmm, if it's still an improvement then I'll open it back up.

Lokathor · 2019-08-09T00:32:43Z

Alright,

fixed the comments that were mentioned in review.
Opened a new issue for discussion of code path selection improvement Unify different code paths via macro #217

Ready for re-review and possible merge if it passes review.

alexcrichton · 2019-08-09T16:18:50Z

Yes this won't actually ever be used on x86/x86_64 targets that have sse with libstd because 1.0.sqrt() will cause LLVM to do the equivalent of what this is already doing, but @Lokathor you're thinking that you'd still like this to land for any users either explicitly using symbols and/or this library itself? Do you know of users doing that on x86/x86_64?

Lokathor · 2019-08-09T16:40:07Z

well, me for one.

More broadly, anyone else who depends on libm as a Stable-rust, core-only source of math operations will need this update.

Particularly, some gamedev-wg folks and I have been trying to improve the state of linear algebra libraries for graphics, and in several benchmarks nalgebra had 10x or worse time taken compared to other common libraries because the std feature flag had been left off the build which caused nalgebra to switch to a pure-software sqrt even on x86/x86_64 where they could have stayed with a stable hardware sqrt given enough conditional compilation. We can do that tricky work on behalf of others and then tell them "just use libm, it does the right thing".

I agree with gnzlbg that the correct long term path is to work on getting more core::intrinsics stabilized, but the PR for that RFC hasn't seen any activity in 2019 at all (reminder link), and there's a rustc tracking issue for it that's had some success but also seems to not be moving along very quickly. Given this, I think it's appropriate to make this improvement here.

alexcrichton · 2019-08-12T14:57:51Z

Ok that sounds reasonable to me!

Before we fill this out for other intrinsics I'd prefer to get the macro story and style sorted out, but moving forward with at least this one sounds reasonable to me.

Improve sqrt/sqrtf if stable intrinsics allow

b351893

Lokathor self-assigned this Aug 7, 2019

Lokathor added 2 commits August 7, 2019 14:10

apply rustfmt

519c5d6

move use/const statements to a limited scope

a9fa14d

gnzlbg reviewed Aug 8, 2019

View reviewed changes

alexcrichton reviewed Aug 8, 2019

View reviewed changes

src/math/sqrtf.rs Outdated Show resolved Hide resolved

Lokathor closed this Aug 8, 2019

Lokathor reopened this Aug 8, 2019

Lokathor added 2 commits August 8, 2019 18:21

update comments

a3d55e6

update comments

b0f666e

Lokathor mentioned this pull request Aug 9, 2019

Unify different code paths via macro #217

Closed

alexcrichton merged commit 32a6a99 into rust-lang:master Aug 12, 2019

Lokathor removed their assignment Aug 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve sqrt/sqrtf if stable intrinsics allow #216

Improve sqrt/sqrtf if stable intrinsics allow #216

Lokathor commented Aug 7, 2019

Lokathor commented Aug 7, 2019

gnzlbg left a comment •

edited

Loading

gnzlbg Aug 8, 2019

Lokathor Aug 8, 2019

gnzlbg Aug 8, 2019

gnzlbg Aug 8, 2019

Lokathor Aug 8, 2019

Lokathor commented Aug 8, 2019

gnzlbg commented Aug 8, 2019

alexcrichton left a comment

gnzlbg commented Aug 8, 2019 •

edited

Loading

Schultzer commented Aug 8, 2019

Lokathor commented Aug 8, 2019

gnzlbg commented Aug 8, 2019 •

edited

Loading

gnzlbg commented Aug 8, 2019

Lokathor commented Aug 8, 2019

Lokathor commented Aug 9, 2019

alexcrichton commented Aug 9, 2019

Lokathor commented Aug 9, 2019

alexcrichton commented Aug 12, 2019

Improve sqrt/sqrtf if stable intrinsics allow #216

Improve sqrt/sqrtf if stable intrinsics allow #216

Conversation

Lokathor commented Aug 7, 2019

Lokathor commented Aug 7, 2019

gnzlbg left a comment • edited Loading

Choose a reason for hiding this comment

gnzlbg Aug 8, 2019

Choose a reason for hiding this comment

Lokathor Aug 8, 2019

Choose a reason for hiding this comment

gnzlbg Aug 8, 2019

Choose a reason for hiding this comment

gnzlbg Aug 8, 2019

Choose a reason for hiding this comment

Lokathor Aug 8, 2019

Choose a reason for hiding this comment

Lokathor commented Aug 8, 2019

gnzlbg commented Aug 8, 2019

alexcrichton left a comment

Choose a reason for hiding this comment

gnzlbg commented Aug 8, 2019 • edited Loading

Schultzer commented Aug 8, 2019

Lokathor commented Aug 8, 2019

gnzlbg commented Aug 8, 2019 • edited Loading

gnzlbg commented Aug 8, 2019

Lokathor commented Aug 8, 2019

Lokathor commented Aug 9, 2019

alexcrichton commented Aug 9, 2019

Lokathor commented Aug 9, 2019

alexcrichton commented Aug 12, 2019

gnzlbg left a comment •

edited

Loading

gnzlbg commented Aug 8, 2019 •

edited

Loading

gnzlbg commented Aug 8, 2019 •

edited

Loading