Add option to compile directly to llvm intrinsics #97

Closed

Finomnis opened this issue May 31, 2023 · 13 comments


@Finomnis

Finomnis commented May 31, 2023

Many targets, like thumbv7em-none-eabihf, have a hardware floating-point unit that supports operations like f32::abs() or f32::sqrt(). Using those would improve performance and reduce code footprint.

Sadly, in no-std, those are currently only reachable through the unstable core::intrinsics API. A discussion about stabilizing them exists, but has been stale for quite a while now. Still, for maximum performance, it would be worth adding a feature flag (named unstable, experimental_intrinsics, or similar) that compiles directly to intrinsics.

This would work as follows. LLVM can compile an intrinsic in three different ways, depending on the circumstances. For example, sqrtf32() compiles:

  • to a constant, if it is evaluable at compile time
  • to a hardware instruction, if the target supports one (like thumbv7em-none-eabihf)
  • to a call to a libm-style fallback function (sqrtf for f32) if no hardware support exists. This function has to be provided by us, via extern "C" and #[no_mangle]; see the sketch below.
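As a rough sketch of the mechanism (nightly-only; the function name fast_sqrt is illustrative, and the exact fallback symbol LLVM emits is target-dependent):

#![feature(core_intrinsics)] // nightly-only

// Illustrative fallback: `sqrtf` is the C libm symbol LLVM typically
// calls for f32 square roots when it cannot lower the intrinsic to a
// constant or a hardware instruction. Whether defining it here
// collides with an existing libm symbol depends on the target.
#[no_mangle]
extern "C" fn sqrtf(x: f32) -> f32 {
    micromath::F32(x).sqrt().0 // pure-software fallback
}

pub fn fast_sqrt(x: f32) -> f32 {
    // LLVM lowers this to a constant, a hardware instruction, or a
    // call to `sqrtf`, depending on the target and the call site.
    unsafe { core::intrinsics::sqrtf32(x) }
}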

Advantages:

  • Out-of-the-box compile-time evaluation where possible (built into LLVM)
  • Much smaller code footprint
  • Much better performance

Disadvantages:

  • Depending on the target, the provided fallback functions might collide with symbols from the standard library; this is still to be determined. That's why I would definitely keep this behind a non-default feature gate.
  • Requires a nightly compiler. This is another reason why I would keep it behind a feature gate.

If this sparks interest from the crate developer, I might start an implementation and open a PR. If not, I'm fine with having this issue closed.

@tarcieri
Owner

We can use unstable core intrinsics (I'd prefer a nightly feature for that, though keeping up with nightly changes is always a bit painful), but if the goal is primarily optimizing thumbv7em-none-eabihf, it seems like inline assembly would be a stable alternative?
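Roughly something like this, as a sketch (hw_sqrt is a hypothetical name; vsqrt.f32 is the single-precision square-root instruction on Cortex-M4F/M7 parts):

// Hypothetical helper: only builds for ARM targets, and assumes an
// FPU-enabled target such as thumbv7em-none-eabihf.
#[cfg(target_arch = "arm")]
pub fn hw_sqrt(x: f32) -> f32 {
    let out: f32;
    // SAFETY: vsqrt.f32 only reads `x` and writes the result register;
    // it has no memory accesses or other side effects.
    unsafe {
        core::arch::asm!(
            "vsqrt.f32 {o}, {x}",
            o = out(sreg) out,
            x = in(sreg) x,
            options(pure, nomem, nostack),
        );
    }
    out
}

With options(pure, nomem), LLVM can still deduplicate or eliminate unused calls, even though it can't see through the asm block itself.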

@Finomnis
Author

Finomnis commented May 31, 2023

if the goal is primarily optimizing thumbv7em-none-eabihf, it seems like inline assembly would be a stable alternative?

Well yes, that is true, but implementing it via intrinsics would make this compatible with all boards, not just this one. And inline assembly is opaque to LLVM's optimizer, so things like compile-time evaluation wouldn't happen.

This is my current solution in a project:

// Requires a nightly compiler and #![feature(core_intrinsics)] at the crate root.

pub trait F32Ext {
    /// Sine
    fn sin(self) -> Self;
    /// Cosine
    fn cos(self) -> Self;
}

// Fallbacks for targets without hardware support; LLVM calls these
// libm-style symbols when it cannot lower the intrinsics otherwise.
#[no_mangle]
extern "C" fn sinf(val: f32) -> f32 {
    micromath::F32(val).sin().0
}

#[no_mangle]
extern "C" fn cosf(val: f32) -> f32 {
    micromath::F32(val).cos().0
}

impl F32Ext for f32 {
    fn sin(self) -> f32 {
        // Lowered to a constant, a hardware instruction, or a call
        // to `sinf`, depending on the target and the call site.
        unsafe { core::intrinsics::sinf32(self) }
    }

    fn cos(self) -> f32 {
        unsafe { core::intrinsics::cosf32(self) }
    }
}
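With the trait in scope, call sites then stay ordinary method calls, e.g. (hypothetical example):

fn wave(t: f32) -> f32 {
    t.sin() + t.cos()
}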

Surely I'm not the only person who has come across this issue, so I thought maybe it's worth including something like this in the library :)

But if you think it's too much effort to maintain, I'll just keep it in my local project.

@tarcieri
Owner

Well yes, that is true, but implementing it via intrinsics would make this compatible with all boards, not just this one

Shouldn't the optimizations work on all boards with a thumbv7em-none-eabihf MCU? I don't see why the assembly would vary by board.

And inline assembly isn't optimizable via LLVM, so things like compile time evaluation wouldn't happen.

While that's a minor drawback, I think it's outweighed by the nightly-vs-stable question. Nightly is a big maintenance burden because it's a moving target, which means things may break from one nightly release to the next.

@Finomnis
Author

Shouldn't the optimizations work on all boards with a thumbv7em-none-eabihf MCU? I don't see why the assembly would vary by board.

I might have misworded it; I mean it would be compatible with all targets, not just thumbv7em-none-eabihf. Every other target would then also automatically choose either a hardware implementation or micromath as a fallback, instead of always using the micromath implementation. Even micromath functions that call each other would benefit if some of those calls were hardware-accelerated.

While that's a minor drawback, I think it's outweighed by the nightly-vs-stable question. Nightly is a big maintenance burden because it's a moving target, which means things may break from one nightly release to the next.

And what if we add a note to the flag's documentation stating that we give zero stability guarantees? The rest of the project (if the feature isn't enabled) would of course stay compatible with stable.

@Finomnis
Author

Of course, if you say it's too much of a hassle, I might release it as a separate package instead.

@tarcieri
Owner

I have nightly features on other projects and they can be quite annoying. Even if the nightly is pinned, there can be sporadic breakages (e.g. an ICE, internal compiler error).

This crate is low-maintenance enough, however, that I'd be OK with it, so long as the nightly is pinned and we don't get frequent requests to bump the nightly version and cut releases due to breakages, like I experience on other projects.

@Finomnis
Author

Finomnis commented May 31, 2023

I started by doing some benchmarks, and it seems that only abs and sqrt are available as hardware instructions on thumbv7em-none-eabihf. And sadly, it barely makes a difference.

(on Teensy 4.0)

===== Micromath Benchmark =====
Git Version: 9a7492b-modified

All values in ns/iter.

            micromath       libm intrinsics
abs              16.6       16.6       16.7
acos            178.0      120.3
asin            110.1      139.6
atan             85.0       78.5
atan_norm        76.8
ceil             48.4       58.1
cos             105.0     1668.0
exp             233.5      116.8
floor            38.5       61.4
fract            21.6
inv              16.6
invsqrt          16.6
ln              161.8      157.4
log2            166.9      177.6
log10           166.8      174.2
round            36.6
sin             110.0     1707.4
sqrt             53.4      491.8       38.4
tan             160.1     2560.6
trunc            26.3       45.0

There would still be the advantages of compile-time evaluation and better accuracy. But it seems micromath is already really well optimized. Kudos.

@Finomnis
Author

Finomnis commented May 31, 2023

@tarcieri Should I create a PR with those benchmarks? It only includes the one-parameter functions so far, and I probably won't add more, since I will no longer use intrinsics myself now that I know how small the gain is.

Either way, here they are in case you are curious: https://github.com/Finomnis/micromath/tree/teensy40_benchmark/examples/benchmark_teensy40

@tarcieri
Owner

Yes, sure, that'd be great.

@Finomnis
Author

Finomnis commented Jun 1, 2023

@tarcieri #98

@Finomnis
Author

Finomnis commented Jun 1, 2023

@tarcieri Btw, it seems that some of the libm implementations are faster than ours, at least on the Teensy 4.0. Might be worth investigating. The biggest difference seems to be acos, although I'm only running it repeatedly on a single value, so I'm not sure how input-dependent the timing is.

@Finomnis
Author

Finomnis commented Jun 1, 2023

Will close this for now as I won't continue working on it.

Finomnis closed this as completed Jun 1, 2023
@tarcieri
Owner

tarcieri commented Jun 1, 2023

it seems that some of the libm implementations are faster than ours

Yeah, that's an interesting data point, although this library optimizes for code size over performance.

I can investigate whether there are implementations that are both faster and similar in code size, and potentially adopt those.
