[RyuJIT] Provide access to the CPU prefetch instruction #5869
Comments
This is another example where allowing the JIT to issue a hardware prefetch instruction would be very useful. 30% of my method's throughput goes down the drain because, given the size of the 'left out' code, the hardware prefetcher is not able to pick up the access pattern properly. The same thing happens at the L2 level. Luckily I work with 4 KB pages, so there are only a few outstanding L3 misses. The code in question is a very tight loop (but with some control flow embedded internally):
for (long i = 0; i < len; i += 4, originalPtr += 4, modifiedPtr += 4)
{
    long m0 = modifiedPtr[0];
    long o0 = originalPtr[0];
    long m1 = modifiedPtr[1];
    long o1 = originalPtr[1];
    long m2 = modifiedPtr[2];
    long o2 = originalPtr[2];
    long m3 = modifiedPtr[3];
    long o3 = originalPtr[3];
    ................ // Rest of the code goes here.
}
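For illustration, here is a minimal sketch of how a loop of this shape could issue explicit prefetches with the x86 intrinsics that later shipped in System.Runtime.Intrinsics.X86. The class name, the PrefetchDistance value, and the "count differing words" body are assumptions made for this example; the elided body of the original loop is not known. It needs unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static unsafe class PrefetchedCompare // hypothetical name
{
    // Assumed tuning value: how many longs ahead of the current position to hint.
    private const int PrefetchDistance = 64; // 64 longs = 512 bytes

    // Counts the 8-byte words that differ. Assumes len is a multiple of 4.
    public static long CountDifferentWords(long* originalPtr, long* modifiedPtr, long len)
    {
        long different = 0;
        for (long i = 0; i < len; i += 4, originalPtr += 4, modifiedPtr += 4)
        {
            if (Sse.IsSupported)
            {
                // A prefetch is only a hint: it does not fault, even when the
                // address lies past the end of the buffer.
                Sse.Prefetch0(originalPtr + PrefetchDistance);
                Sse.Prefetch0(modifiedPtr + PrefetchDistance);
            }

            if (modifiedPtr[0] != originalPtr[0]) different++;
            if (modifiedPtr[1] != originalPtr[1]) different++;
            if (modifiedPtr[2] != originalPtr[2]) different++;
            if (modifiedPtr[3] != originalPtr[3]) different++;
        }
        return different;
    }

    // Convenience overload that pins managed arrays for the duration of the call.
    public static long CountDifferentWords(long[] original, long[] modified)
    {
        if (original.Length != modified.Length || original.Length % 4 != 0)
            throw new ArgumentException("Buffers must have equal length, a multiple of 4.");

        fixed (long* o = original, m = modified)
            return CountDifferentWords(o, m, original.Length);
    }
}
```

Each iteration covers 32 bytes, so this version hints every 64-byte cache line twice. That is harmless but redundant; a tuned version would likely hint once per line and pick the distance by measurement.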
This is being implemented as part of the Hardware Intrinsics work.
@sdmaclea might be able to comment as to whether or not ARM will be exposing similar APIs. There might still be a remaining work item to expose a general-purpose API that wraps the various platform-specific hardware intrinsics. If such an API were to be exposed, it would, at a minimum, need to specify that the call may do nothing (which is already what the x86 Prefetch instructions specify) and that the prefetch size may be smaller or larger than what is requested.
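A rough sketch of what such a general-purpose wrapper might look like follows. The MemoryHint class and its members are hypothetical names invented for this example, not an existing or proposed .NET API; only Sse.IsSupported and Sse.Prefetch0 are real. On hardware without SSE the call simply does nothing, which matches the "may do nothing" contract described above.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics.X86;

public static unsafe class MemoryHint // hypothetical name, not a real .NET type
{
    // True when a prefetch hint will actually be emitted on this machine.
    public static bool IsHardwareAccelerated => Sse.IsSupported;

    // Hints that the cache line containing 'address' will be read soon.
    // The call may do nothing, and the amount of memory fetched is up to the CPU.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void Prefetch(void* address)
    {
        if (Sse.IsSupported)
        {
            Sse.Prefetch0(address);
        }
        // Other platforms: intentionally empty, so the JIT can drop the call.
    }
}
```

Because Sse.IsSupported is treated as a constant by the JIT, the branch is expected to disappear in the generated code, leaving either a single prefetch instruction or nothing at all.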
Released on preview. Should we close?
@redknightlois, feel free to close if you feel that the HWIntrinsic work will sufficiently cover this. Otherwise, I would suggest you update the original post to specify that you believe a "general-purpose" API is needed/desirable.
The Hardware Intrinsics API covers exactly this. I have used it with great success.
Many (if not all) current CPUs support prefetching in one way or another. Another good thing is that the prefetch instruction can be omitted completely if the target architecture does not have such an opcode (or the JIT does not support it), as it is then an empty method.
The reason to suggest such support is that in many places (like binary search on B+Trees) the search is actually memory bound, so you eventually hit a ceiling. It doesn't matter how good your JIT code is; the lack of prefetching makes further improvements impossible. BTW, we reached that point a few months ago.
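To make the memory-bound search case concrete, here is a small sketch of a binary search over a sorted key array that hints both possible next probe locations before reading the current one. It uses the x86 intrinsics that later shipped in System.Runtime.Intrinsics.X86; the class and method names are assumptions for this example, and it is not the B+Tree code referred to above. It needs unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System.Runtime.Intrinsics.X86;

public static unsafe class PrefetchingSearch // hypothetical name
{
    // Returns the index of 'value', or the bitwise complement of the
    // insertion point when it is absent (same convention as Array.BinarySearch).
    public static int Find(long* keys, int count, long value)
    {
        int lo = 0, hi = count - 1;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) >> 1);

            if (Sse.IsSupported)
            {
                // Hint the midpoints of the left and right halves. One of them
                // is the next probe; the other hint is wasted bandwidth.
                // Prefetch hints never fault, so an index just outside the
                // array (possible when a half is empty) is harmless.
                Sse.Prefetch0(keys + (lo + ((mid - 1 - lo) >> 1)));
                Sse.Prefetch0(keys + (mid + 1 + ((hi - mid - 1) >> 1)));
            }

            long k = keys[mid];
            if (k == value) return mid;
            if (k < value) lo = mid + 1;
            else hi = mid - 1;
        }
        return ~lo;
    }

    // Convenience overload that pins a managed array for the duration of the call.
    public static int Find(long[] keys, long value)
    {
        fixed (long* p = keys)
            return Find(p, keys.Length, value);
    }
}
```

Whether this helps depends on the data size: for arrays that fit in cache the extra hints are pure overhead, so any gain should be confirmed by measurement.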
With the introduction of System.Runtime.CompilerServices.Unsafe it actually makes a lot of sense to include a method like:
@CarolEidt @mellinoe