
Improve sqrt/sqrtf if stable intrinsics allow #216

Merged
5 commits merged into rust-lang:master on Aug 12, 2019

Conversation

Lokathor
Contributor

@Lokathor Lokathor commented Aug 7, 2019

Per the question in #214, this uses the Intel intrinsics if compile-time settings allow.

@Lokathor Lokathor self-assigned this Aug 7, 2019
@Lokathor
Contributor Author

Lokathor commented Aug 7, 2019

I don't know what the process for picking a reviewer is, other than that I'm pretty sure I shouldn't be reviewing my own PRs, of course. GitHub has a suggested reviewer, but also gives their status as "busy". I'm not sure who else to ping for this, but I guess everyone subscribed to the repo gets an email, so whoever sees this can take a look.

  • The PR is ready to be reviewed and merged.
  • This is a "patch" level change, the public API is unchanged.

Contributor

@gnzlbg gnzlbg left a comment


The master branch barely tests anything (I don't think we even test the result of sqrt(-1.)), so I don't know if doing large changes to the implementation without exhaustive testing makes sense.

Also, I would prefer for cfg_if to be used here and to split the different implementations into different functions, with this PR we end up with 3 implementations (wasm, x86-sse2, and others) and if we keep adding more it is going to be hard to track what's going on.
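The split-into-separate-functions structure suggested here might look something like the following sketch (my own illustration, not code from the PR; the software fallback is a placeholder, and the cfg names are the standard `target_arch`/`target_feature` predicates):

```rust
// Sketch of splitting each implementation into its own cfg-gated
// function (the same shape cfg_if produces). Illustration only; the
// fallback delegates to std's sqrt as a stand-in for the crate's
// pure-software implementation.

#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
fn sqrt_impl(x: f64) -> f64 {
    // SAFETY: the cfg above guarantees SSE2 is available.
    unsafe {
        use core::arch::x86_64::*;
        _mm_cvtsd_f64(_mm_sqrt_pd(_mm_set_sd(x)))
    }
}

#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
fn sqrt_impl(x: f64) -> f64 {
    // Placeholder for the crate's software implementation.
    x.sqrt()
}

pub fn sqrt(x: f64) -> f64 {
    sqrt_impl(x)
}

fn main() {
    println!("sqrt(2.0) = {}", sqrt(2.0));
}
```

With this layout, adding a new target means adding one more cfg-gated `sqrt_impl` rather than another branch inside one large function.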

let m = _mm_set_sd(x);
let m_sqrt = _mm_sqrt_pd(m);
_mm_cvtsd_f64(m_sqrt)
}
Contributor


Doesn't using the same code here as for wasm32 produce the exact same effect for targets with SSE2? If not, why not?

Contributor Author


I think I don't understand the question. sse2 is an x86 and x86_64 feature. It will never be available during a wasm32 build.

Contributor


I meant that instead of using the core::arch intrinsics here, you can just write unsafe { ::core::intrinsics::sqrtf64(x) } just like wasm32 does, and that should generate this exact same code if SSE2 is enabled.

Contributor Author


Ah, but that's not stable; this code is. That's why I didn't also add a NEON version: I'm only out to help out stable Rust today.

You're right that we should probably also reconsider calling the core intrinsic in more cases if nightly is in use.

@Lokathor
Contributor Author

Lokathor commented Aug 8, 2019

We can add cfg-if and use that, but it's not a current dependency and I didn't want to add one carelessly.

We can certainly add some more tests.

I think that refactoring the whole wasm32-path-versus-every-other-path pattern in general, which appears in almost every single function, is a little out of scope for this PR. We should definitely have a bigger discussion about that separately. I'd hope that's not a blocker to this particular PR.

@gnzlbg
Contributor

gnzlbg commented Aug 8, 2019

i think that refactoring the whole wasm32 path vs every other path thing in general, which happens in almost every single function, is a little out of scope for this PR.

FWIW I wasn't suggesting that; only that for the functions exposing more than the two current paths, adding more #[cfg()] + #[cfg(not(...))] paths starts getting out of hand.

we can add cfg-if and use that, but it's not a current dependency and i didn't want to add one carelessly.

@alexcrichton should decide on that; we can always copy-paste the macro in here somewhere if that's better than adding a dependency.

Member

@alexcrichton alexcrichton left a comment


The master branch barely tests anything

This is not true: each intrinsic has hundreds of randomized tests generated to compare against musl upstream. It probably doesn't test literally -1, but that's what handwritten unit tests are for.

It's possible to use cfg_if!, but I think it'd be best to avoid the need altogether. I think an ideal macro for this use case would be something like:

fn the_intrinsic() {
    the_macro! {
        if #[cfg(sse)] {
            // .. SSE implementation
        }
    }

    // fallback implementation
}
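As a sketch of how such a macro could work (my own guess at an implementation, not libm's actual macro), an early-return expansion produces exactly this flat, low-indentation shape: the specialized body runs and returns when the cfg matches, and otherwise the invocation compiles to nothing and control falls through to the fallback.

```rust
// Hypothetical macro in the shape sketched above. The cfg predicate
// and fallback here are illustrative; the fallback delegates to std's
// sqrt as a placeholder for a real software implementation.
macro_rules! maybe_specialize {
    (if #[cfg($meta:meta)] { $($body:tt)* }) => {
        #[cfg($meta)]
        {
            return { $($body)* };
        }
    };
}

#[allow(unreachable_code)] // fallback is dead code when specialized
fn sqrt(x: f64) -> f64 {
    maybe_specialize! {
        if #[cfg(all(target_arch = "x86_64", target_feature = "sse2"))] {
            // SAFETY: guarded by the cfg in the invocation above.
            unsafe {
                use core::arch::x86_64::*;
                _mm_cvtsd_f64(_mm_sqrt_pd(_mm_set_sd(x)))
            }
        }
    }
    // Fallback implementation (placeholder).
    x.sqrt()
}

fn main() {
    println!("{}", sqrt(2.0));
}
```

Because the imports live inside the cfg-gated body, targets that don't take the specialized path never see them, which avoids the unused-import problem mentioned above.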

I would prefer to not have to deal with unused imports and scoped fallback code wherever possible; the macro would generate code that looks something like:

fn the_intrinsic() {
    #[cfg(sse)] {
        if true {
            // .. SSE implementation
        }
    }
    // .. fallback implementation
}

It's just too much of a pain to always have to deal with #[cfg] everywhere, including imports. It's also a bummer to increase indentation so much. I'm hoping a macro can largely solve this, and I'm pretty sure that macro is not literally cfg_if.

@gnzlbg
Contributor

gnzlbg commented Aug 8, 2019

This is not true, each intrinsic has hundreds of randomized tests generated to compare against musl upstream

Whether this is true depends on what those 500 random tests are doing. When I checked, some of the intrinsics had less than 30% code coverage no matter how often I regenerated those tests. So sure, the library does have some tests, but I don't think these tests are, by far, sufficient to test a new implementation of any math function. That's why I said "barely" tested.

@Schultzer
Contributor

AFAIK this libm implementation is the only one out there that does random tests; all the others use the same test vectors but test for precision and spec conformance. I would argue that random tests are not really something you want to rely on; they are great as an early indicator.

@Lokathor
Contributor Author

Lokathor commented Aug 8, 2019

SO, here's the deal: @gnzlbg told me last night on zulip that apparently I've been using libm wrong this whole time.

  • This PR is based around the idea that you'd have to declare a libm dependency, call the function, and then rustc builds the crate and all that.
  • It turns out that because libm is special, you can just declare an extern "C" block with all the functions you want, and the functions will be there (in a "non-guaranteed", "not our fault if it breaks" sort of way), because either the platform's C libm will have them or compiler-builtins will use Rust libm to provide them.
  • If you do depend on and build libm yourself, you get worse code in many situations, because LLVM knows about libm and will often optimize your code for you (e.g., sqrt(0.0) gets const-folded to just 0.0).
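The "symbols are just there" pattern from the second bullet looks roughly like this (a sketch under the caveats above: whether it links depends on the target providing the symbols, and nothing checks the hand-written signatures):

```rust
// Sketch of declaring the C math symbols directly instead of depending
// on the libm crate. On a typical hosted target these resolve against
// the platform's C libm; elsewhere compiler-builtins may provide them,
// but as the thread notes, none of this is guaranteed.
extern "C" {
    fn sqrt(x: f64) -> f64;
    fn cbrt(x: f64) -> f64;
}

fn main() {
    // Foreign calls are unsafe; the declared signatures are unchecked.
    let s = unsafe { sqrt(2.0) };
    let c = unsafe { cbrt(27.0) };
    println!("sqrt(2) = {}, cbrt(27) = {}", s, c);
}
```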

BASED ON THIS NEW INFO: I'm going to just close up this particular PR for now, no changes needed. When I get time I'll petition T-libs to just make these functions stable in core since "they're already there" anyway.

@Lokathor Lokathor closed this Aug 8, 2019
@gnzlbg
Contributor

gnzlbg commented Aug 8, 2019

@Lokathor You can use libm by just adding it as a dependency. There are different ways of using libm that might be better for your use case, depending on what you are doing.

The best way to use these is to use libstd, but since you are kind of building your own, you probably can't do that.

Using the symbols is a possibility, but you need to make sure that the symbols exist, and you probably won't get any LLVM optimizations, like the one you have implemented in this PR. The only "advantage" of using the symbols over libm is that if your target has a C math library, those functions will be called instead, but that's pretty much it.

If you want to get LLVM optimizations like libstd does, but in a #![no_std] library, you either need to use core::intrinsics or link_llvm_intrinsics to call the right LLVM intrinsics, and make sure that the symbols are available. If after optimizations the symbol is called, and that resolves to a libm function, you kind of want those to be optimized properly. In this particular case, for these particular SSE2 targets, LLVM never calls the symbol, but this isn't true in general, so it makes sense to optimize the functions here (it is just unclear whether it is worth it to provide stable Rust code that does that instead of a 1-line of LLVM code when we know that LLVM does the right thing, e.g., like for wasm32).

If you don't care about using the math functions from a C library or about LLVM optimizations, or want to use the exact same math functions for multiple targets independently of what the C library does, or are constrained to stable Rust, or many other use cases, using libm directly might be the best choice.


Relying on the symbols existing is super-brittle, and will probably break if we ever allow any kind of user customization for these symbol names like GCC and Clang allow. Currently, libcore has some tests that rely on some of the symbols existing (e.g. see here), so if you are considering relying on that, you might want to first run the libcore tests for the targets you care about.

@gnzlbg
Contributor

gnzlbg commented Aug 8, 2019

If you don't care about using the math functions from a C library or about LLVM optimizations, or want to use the exact same math functions for multiple targets independently of what the C library does, or are constrained to stable Rust, or many other use cases, using libm directly might be the best choice.

For those use cases, this PR does improve things, so it might be worth it to keep it open or merge it. It is however unclear whether the use cases that would be satisfied by this aren't better satisfied by exposing the math functions from libcore.

@Lokathor
Contributor Author

Lokathor commented Aug 8, 2019

Ah, hmm, if it's still an improvement then I'll open it back up.

@Lokathor Lokathor reopened this Aug 8, 2019
@Lokathor
Contributor Author

Lokathor commented Aug 9, 2019

Alright,

Ready for re-review and possible merge if it passes review.

@alexcrichton
Member

Yes, this won't actually ever be used on x86/x86_64 targets that have SSE with libstd, because 1.0.sqrt() will cause LLVM to do the equivalent of what this is already doing. But @Lokathor, you're thinking that you'd still like this to land for any users either explicitly using the symbols and/or this library itself? Do you know of users doing that on x86/x86_64?

@Lokathor
Contributor Author

Lokathor commented Aug 9, 2019

well, me for one.

More broadly, anyone else who depends on libm as a stable-Rust, core-only source of math operations will need this update.

Particularly, some gamedev-wg folks and I have been trying to improve the state of linear algebra libraries for graphics. In several benchmarks, nalgebra took 10x or worse the time of other common libraries because the std feature flag had been left off the build, which caused nalgebra to switch to a pure-software sqrt even on x86/x86_64, where it could have stayed with a stable hardware sqrt given enough conditional compilation. We can do that tricky work on behalf of others and then tell them "just use libm, it does the right thing".
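For context on what a "pure-software sqrt" costs, here is a deliberately naive Newton-Raphson version (my illustration, not nalgebra's or libm's actual code): loops, divides, and branches where the hardware path is a single instruction.

```rust
// Naive Newton-Raphson square root: repeatedly average the guess with
// x / guess. Real software implementations seed the guess from the
// exponent bits; starting from max(x, 1.0) is crude but converges for
// moderately sized inputs.
fn software_sqrt(x: f64) -> f64 {
    if x.is_nan() || x < 0.0 {
        return f64::NAN;
    }
    if x == 0.0 || x.is_infinite() {
        return x;
    }
    let mut guess = x.max(1.0);
    for _ in 0..64 {
        let next = 0.5 * (guess + x / guess);
        if next == guess {
            break; // converged to within one ulp
        }
        guess = next;
    }
    guess
}

fn main() {
    println!("{}", software_sqrt(2.0)); // close to 1.4142135623730951
}
```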

I agree with gnzlbg that the correct long-term path is to work on getting more core::intrinsics stabilized, but the PR for that RFC hasn't seen any activity in 2019 at all (reminder link), and the rustc tracking issue for it has had some success but also doesn't seem to be moving along very quickly. Given this, I think it's appropriate to make this improvement here.

@alexcrichton
Member

Ok that sounds reasonable to me!

Before we fill this out for other intrinsics I'd prefer to get the macro story and style sorted out, but moving forward with at least this one sounds reasonable to me.

@alexcrichton alexcrichton merged commit 32a6a99 into rust-lang:master Aug 12, 2019
@Lokathor Lokathor removed their assignment Aug 26, 2024