-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to compile directly to llvm intrinsics #97
Comments
We can use unstable core intrinsics (I'd prefer a |
Well yes, that is true, but by implementing it via intrinsics would make this compatible with all boards, not just this one. And inline assembly isn't optimizable via LLVM, so things like compile time evaluation wouldn't happen. This is my current solution in a project: pub trait F32Ext {
/// Sinus
fn sin(self) -> Self;
/// Cosinus
fn cos(self) -> Self;
}
#[no_mangle]
extern "C" fn sinf(val: f32) -> f32 {
micromath::F32(val).sin().0
}
#[no_mangle]
extern "C" fn cosf(val: f32) -> f32 {
micromath::F32(val).cos().0
}
impl F32Ext for f32 {
fn sin(self) -> f32 {
unsafe { core::intrinsics::sinf32(self) }
}
fn cos(self) -> f32 {
unsafe { core::intrinsics::cosf32(self) }
}
} I thought surely I'm not the only person who came across this issue, so I thought maybe it's worth including something like this in a library :) But if you think it's too much effort to maintain, I'll just keep it in my local project. |
Shouldn't the optimizations work on all boards with an
While that's a minor drawback, it's one I think is outweighed by |
I might have misworded it, I mean it's compatible with all targets then, not just
And if we add a description to the flag the we give zero stability guarantees? The rest of the project (if the feature isn't enabled) should of course stay compatible with |
Of course if you say it's too much of a hastle, I might release it as a separate package. |
I have This crate is low maintenance enough, however, that I guess I'd be OK with it, so long as nightly were pinned and we don't get frequent requests to bump the nightly version and cut releases due to breakages like I experience with other projects. |
I started with doing some benchmarks, and it seems like only (on Teensy 4.0)
There would still be the advantages of being compile time optimizable and being more accurate. But it seems |
@tarcieri Should I create a PR with those benchmarks? It only includes the 1-parameter functions, so far; and I probably won't add more since I will no longer use intrinsics for myself now that I know how little the gain is. Either way, here they are in case you are curious: https://github.com/Finomnis/micromath/tree/teensy40_benchmark/examples/benchmark_teensy40 |
Yes sure, that'd be great |
@tarcieri Btw, it seems that some of the |
Will close this for now as I won't continue working on it. |
Yeah, that's an interesting data point, although this library optimizes for code size over performance. I can investigate if there are any implementations which are both faster and relatively similar in terms of code size and potentially adopt those. |
Many targets, like
thumbv7em-none-eabihf
have a hardware floating point unit which supports operations likef32::abs()
orf32::sqrt()
. Using those would improve performance and reduce code footprint.Sadly, in
no-std
, those are currently only obtainable through the unstablecore::intrinsics
API. A discussion about adding those tostable
exists, bit is stale for quite a while now. Still, for maximum performance, it would be worth it to add anunstable
feature flag (orexperimental_intrinsics
or similar) which compiles directly to intrinsics.This would work the following way. LLVM intrinsics can be compiled in three different ways by LLVM, depending on the circumstances. For example
sqrtf32()
:thumbv7em-none-eabihf
)sqrtf32
as a fallback if no hardware support exists. This function has to be provided by us, viaextern "C"
and#[no_mangle]
.Advantages:
Disadvantages:
If this sparks interest from the crate developer, I might start an implementation and open a PR. If not, I'm fine with having this issue closed.
The text was updated successfully, but these errors were encountered: