Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix non-inlined fn calls for some arm vfma intrinsics #1214

Closed
wants to merge 2 commits into from

Conversation

hkratz
Copy link
Contributor

@hkratz hkratz commented Sep 9, 2021

Some VFMA functions have target_feature(enable = "fp-armv8,v8") while the called functions vdup_n_f32 and vdupq_n_f32 are target_feature(enable = "v7"). LLVM does not inline the functions due to the different feature flags. Using private _v8 variants of those functions causes them to be inlined.

@rust-highfive
Copy link

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

#[cfg_attr(target_arch = "arm", target_feature(enable = "fp-armv8,v8"))]
#[cfg_attr(all(test, target_arch = "arm"), assert_instr("vdup.32"))]
#[cfg_attr(all(test, target_arch = "aarch64"), assert_instr(dup))]
unsafe fn vdupq_n_f32_v8(value: f32) -> float32x4_t {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining why this is needed? Also this should probably be under #[cfg(target_arch = "arm")].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Amanieu Will do, but first I will fix arm fused multiply-add to only require vfp4 instead of v8 (PR forthcoming) and then the fn name will change to vdupq_n_f32_vfp4.

@hkratz
Copy link
Contributor Author

hkratz commented Sep 19, 2021

Superseded by #1219.

@hkratz hkratz closed this Sep 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants