[RyuJIT] Provide access to the CPU prefetch instruction #5869

Closed
redknightlois opened this issue May 17, 2016 · 6 comments
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
enhancement: Product code improvement that does NOT require public API changes/additions

@redknightlois

Many (if not all) current CPUs support prefetching in one way or another. Conveniently, the prefetch call can be omitted entirely if the target architecture does not have such an opcode (or the JIT does not support it), because the method is then simply empty.

The reason to suggest such support is that in many places (like binary search on B+Trees) the search is actually memory bound, so you eventually hit a ceiling. It doesn't matter how good your JIT code is; the lack of prefetching makes further improvements impossible. BTW, we reached that point a few months ago.

With the introduction of System.Runtime.CompilerServices.Unsafe it actually makes a lot of sense to include a method like:

public enum PrefetchHint
{
   All, // or Temporal or L0
   L1,
   L2,
   NonTemporal // or NTA
}

public static class Unsafe
{
    [JitIntrinsic] 
    public unsafe static void Prefetch(void* p, size length, PrefetchHint hint = PrefetchHint.All);
}

@CarolEidt @mellinoe

@redknightlois
Author

This is another example where allowing the JIT to issue a hardware prefetch instruction would be very useful. About 30% of my method's throughput goes down the drain because, given the size of the 'left out' code, the hardware prefetcher cannot pick up the access pattern properly.

image

The same thing happens at the L2 level. Luckily I work with 4 KB pages, so there are only a few outstanding L3 misses.

The code that generates this is a very tight loop (though with some control flow embedded in it):

for (long i = 0; i < len; i += 4, originalPtr += 4, modifiedPtr += 4)
{
    long m0 = modifiedPtr[0];
    long o0 = originalPtr[0];

    long m1 = modifiedPtr[1];
    long o1 = originalPtr[1];

    long m2 = modifiedPtr[2];
    long o2 = originalPtr[2];

    long m3 = modifiedPtr[3];
    long o3 = originalPtr[3];

    ................ // Rest of the code goes here. 
}

@tannergooding
Member

This is being implemented as part of the Hardware Intrinsics work.

Sse.Prefetch0, Sse.Prefetch1, Sse.Prefetch2, and Sse.PrefetchNonTemporal will all be available when the feature ships.
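For illustration, here is a minimal sketch of how `Sse.Prefetch0` might be applied to a linear scan over two buffers, similar in spirit to the loop posted above. The class name `PrefetchLoopSketch`, the method `CountDifferences`, the 64-byte cache line, and the prefetch distance are all assumptions made for this example, not part of the issue; the right distance can only be found by measurement. It needs `AllowUnsafeBlocks` enabled.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class PrefetchLoopSketch
{
    // Assumption: 64-byte cache lines, i.e. 8 longs per line.
    private const int LongsPerCacheLine = 8;

    // Hypothetical prefetch distance in elements; tune by profiling.
    private const int PrefetchDistance = 64;

    // Counts the positions at which the two buffers differ.
    public static unsafe long CountDifferences(long[] original, long[] modified)
    {
        if (original.Length != modified.Length)
            throw new ArgumentException("Buffers must have the same length.");

        long len = original.Length;
        long differences = 0;

        fixed (long* originalPtr = original, modifiedPtr = modified)
        {
            for (long i = 0; i < len; i++)
            {
                // Issue one prefetch per cache line, and only for addresses
                // that are still inside the buffers.
                if (Sse.IsSupported
                    && i % LongsPerCacheLine == 0
                    && i + PrefetchDistance < len)
                {
                    Sse.Prefetch0(originalPtr + i + PrefetchDistance);
                    Sse.Prefetch0(modifiedPtr + i + PrefetchDistance);
                }

                if (originalPtr[i] != modifiedPtr[i])
                    differences++;
            }
        }

        return differences;
    }
}
```

The `Sse.IsSupported` check keeps the method correct on hardware without SSE, where the loop simply runs with no prefetch hints. A prefetch is only a hint, so the result is the same whether or not the CPU acts on it.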

@tannergooding
Member

@sdmaclea might be able to comment as to whether or not ARM will be exposing similar APIs.

There might still be a remaining work item to expose a general-purpose API that wraps the various platform-specific hardware intrinsics.

If such an API were exposed, it would need, at a minimum, to specify that the call may do nothing (which is already what the x86 prefetch instructions specify) and that the prefetch size may be smaller or larger than what is requested.
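As a rough sketch of what such a wrapper could look like, the following maps the `PrefetchHint` enum proposed at the top of this issue onto the x86 `Sse` intrinsics and silently does nothing elsewhere. The `PrefetchHelper` class, its `IsAccelerated` property, and the hint-to-instruction mapping are assumptions for illustration only; no such general-purpose API is confirmed in this thread, and other platforms would need their own branches.

```csharp
using System.Runtime.Intrinsics.X86;

// Enum repeated from the original proposal so the sketch is self-contained.
public enum PrefetchHint
{
    All,         // or Temporal or L0
    L1,
    L2,
    NonTemporal  // or NTA
}

// Hypothetical wrapper; not an existing .NET API.
public static class PrefetchHelper
{
    // True when a prefetch instruction will actually be emitted.
    public static bool IsAccelerated => Sse.IsSupported;

    // Hints that the cache line containing 'address' will be needed soon.
    // The call may do nothing, and the amount of data fetched is up to the CPU.
    public static unsafe void Prefetch(void* address, PrefetchHint hint = PrefetchHint.All)
    {
        if (!Sse.IsSupported)
            return; // No-op on platforms without the x86 SSE intrinsics.

        // Assumed mapping: the enum's All/L1/L2 follow the T0/T1/T2 numbering.
        switch (hint)
        {
            case PrefetchHint.All:
                Sse.Prefetch0(address);
                break;
            case PrefetchHint.L1:
                Sse.Prefetch1(address);
                break;
            case PrefetchHint.L2:
                Sse.Prefetch2(address);
                break;
            case PrefetchHint.NonTemporal:
                Sse.PrefetchNonTemporal(address);
                break;
        }
    }
}
```

The sketch drops the `length` parameter from the original proposal because each x86 prefetch instruction covers a single cache line; a caller wanting a range would loop over it in cache-line steps.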

@redknightlois
Author

This has been released in preview. Should we close?

@tannergooding
Member

@redknightlois, feel free to close if you feel that the HWIntrinsic work will sufficiently cover this. Otherwise, I would suggest you update the original post to specify that you believe a "general-purpose" API is needed/desirable.

@redknightlois
Author

The Hardware Intrinsics API covers exactly this. I have used it with great success.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2020
3 participants