
Consider mitigating the JCC erratum in the JIT (Nov. 2019) #13795

Closed
damageboy opened this issue Nov 14, 2019 · 15 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)

@damageboy (Contributor)

What

In continuation of #13794, it might be worthwhile to consider applying the fixes suggested by Intel in their erratum to the code generated by the JIT:
https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

It is hard for me, personally, to evaluate the impact on the performance of code generated by the JIT, but my own small, anecdotal experience shows anywhere from a 1-2% degradation for typical code to a potential 20% degradation in very specific instances.

This is in accordance with Intel's own guidance (Section 2.2):

Intel has observed performance effects associated with the workaround ranging from 0-4% on many industry-standard benchmarks. In subcomponents of these benchmarks, Intel has observed outliers higher than the 0-4% range.

Affected opcodes

The update affects all forms of jumps, direct and indirect, when the instruction either ends on or crosses a 32-byte boundary (Section 2.1):

The MCU prevents jump instructions from being cached in the Decoded ICache when the jump instructions cross a 32-byte boundary or when they end on a 32-byte boundary. In this context, Jump Instructions include all jump types: conditional jump (Jcc), macro-fused op-Jcc (where op is one of cmp, test, add, sub, and, inc, or dec), direct unconditional jump, indirect jump, direct/indirect call, and return.
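Expressed as code, the boundary condition quoted above could be checked roughly like this (a minimal sketch of my own; the function name and interface are hypothetical, not anything from CoreCLR):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an instruction occupying [start, start + len) is affected by
 * the erratum: it either crosses a 32-byte boundary, or its last byte is
 * the byte immediately before one (i.e. it "ends on" the boundary). */
static bool hits_jcc_erratum(uint64_t start, unsigned len)
{
    uint64_t end = start + len;              /* first byte after the instruction */
    return (start / 32) != ((end - 1) / 32)  /* crosses a boundary */
        || (end % 32) == 0;                  /* ends on a boundary */
}
```

For example, a 2-byte Jcc starting at offset 31 crosses the boundary at 32, while one starting at offset 30 ends exactly on it; both placements are penalized, while the same instruction at offset 0 is fine.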


Proposed mitigation for code-generation

The fix proposed by Intel is to pad the affected opcodes:

  • jcc
  • fused
  • jmp
  • call
  • ret

with the benign 0x2e prefix (end of Section 2.4):

The advice to software developers is to align the jae instruction so that it does not
cross a 32-byte boundary. In the example, this is done by adding the benign prefix
0x2e four times before the first push %rbp instruction so that the cmp instruction,
which started at offset 1c, will instead start at offset 20. Hence the macro-fused cmp+jae instruction will not cross a 32-byte boundary.
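The quoted advice amounts to computing, for a given jump (or macro-fused pair), the smallest forward shift that places it entirely inside one 32-byte window, then realizing that shift with benign 0x2e (CS segment override) prefixes on an earlier instruction. A sketch of that computation, with a hypothetical name and interface of my own:

```c
/* Smallest number of bytes to shift a jump at [off, off + len) forward so
 * that it neither crosses nor ends on a 32-byte boundary. In Intel's
 * example the shift is realized by prepending benign 0x2e prefixes to an
 * earlier instruction rather than by inserting standalone padding. */
static unsigned jcc_padding_bytes(unsigned off, unsigned len)
{
    for (unsigned pad = 0; pad < 32; pad++) {
        unsigned s = off + pad, e = s + len;
        if ((s / 32) == ((e - 1) / 32) && (e % 32) != 0)
            return pad;   /* whole jump now inside one 32-byte window */
    }
    return 0;             /* unreachable for any real encoding (len <= 31) */
}
```

For instance, a hypothetical 6-byte macro-fused pair at offset 28 would cross the boundary at 32 and needs a 4-byte shift to start at 32, mirroring the cmp-at-0x1c-to-0x20 move in Intel's example.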

Affected processors

Section 4.0 specifically lists no fewer than 40 Intel CPUs, covering the breadth of their product line, that are affected by this update:

| Family | Stepping | Series |
|--------|----------|--------|
| 06_8EH | 9 | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Amber Lake Y |
| 06_8EH | C | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Amber Lake Y |
| 06_55 | 7 | 2nd Generation Intel® Xeon® Scalable Processors based on microarchitecture code name Cascade Lake (server) |
| 06_9EH | A | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Coffee Lake H |
| 06_9EH | A | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Coffee Lake S |
| 06_8EH | A | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Coffee Lake U43e |
| 06_9EH | B | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Coffee Lake S (47) |
| 06_9EH | B | Intel® Celeron® Processor G Series based on microarchitecture code name Coffee Lake S (48) |
| 06_9EH | A | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Coffee Lake S (69) x/KBP |
| 06_9EH | A | Intel® Xeon® Processor E Family based on microarchitecture code name Coffee Lake S (610) |
| 06_9EH | A | Intel® Xeon® Processor E Family based on microarchitecture code name Coffee Lake S (611) |
| 06_9EH | A | Intel® Xeon® Processor E Family based on microarchitecture code name Coffee Lake S (612) |
| 06_9EH | A | Intel® Xeon® Processor E Family based on microarchitecture code name Coffee Lake S (413) |
| 06_9EH | A | Intel® Xeon® Processor E Family based on microarchitecture code name Coffee Lake S (414) |
| 06_9EH | A | Intel® Xeon® Processor E Family based on microarchitecture code name Coffee Lake S (415) |
| 06_9EH | D | 9th Generation Intel® Core™ Processor Family based on microarchitecture code name Coffee Lake H (816) |
| 06_9EH | D | 9th Generation Intel® Core™ Processor Family based on microarchitecture code name Coffee Lake S (817) |
| 06_8EH | C | 10th Generation Intel® Core™ Processor Family based on microarchitecture code name Comet Lake U42 |
| 06_A6H | 0 | 10th Generation Intel® Core™ Processor Family based on microarchitecture code name Comet Lake U62 |
| 06_9EH | 9 | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Kaby Lake G |
| 06_9EH | 9 | 7th Generation Intel® Core™ Processor Family based on microarchitecture code name Kaby Lake H |
| 06_AEH | A | 8th Generation Intel® Core™ Processor Family based on microarchitecture code name Kaby Lake Refresh U (422) |
| 06_9EH | 9 | 7th Generation Intel® Core™ Processor Family based on microarchitecture code name Kaby Lake S |
| 06_8EH | 9 | 7th Generation Intel® Core™ Processor Family based on microarchitecture code name Kaby Lake U |
| 06_8EH | 9 | 7th Generation Intel® Core™ Processor Family based on microarchitecture code name Kaby Lake U23e |
| 06_9EH | 9 | Intel® Core™ X-series Processors based on microarchitecture code name Kaby Lake X |
| 06_9EH | 9 | Intel® Xeon® Processor E3 v6 Family Kaby Lake Xeon E3 |
| 06_8EH | 9 | 7th Generation Intel® Core™ Processor Family based on microarchitecture code name Kaby Lake Y |
| 06_55H | 4 | Intel® Xeon® Processor D Family based on microarchitecture code name Skylake D Bakerville |
| 06_5E | 3 | 6th Generation Intel® Core™ Processor Family based on microarchitecture code name Skylake H |
| 06_5E | 3 | 6th Generation Intel® Core™ Processor Family based on microarchitecture code name Skylake S |
| 06_55H | 4 | Intel® Xeon® Scalable Processors based on microarchitecture code name Skylake Server |
| 06_4E | 3 | 6th Generation Intel® Core™ Processors based on microarchitecture code name Skylake U |
| 06_4E | 3 | 6th Generation Intel® Core™ Processor Family based on microarchitecture code name Skylake U23e |
| 06_55H | 4 | Intel® Xeon® Processor W Family based on microarchitecture code name Skylake W |
| 06_55H | 4 | Intel® Core™ X-series Processors based on microarchitecture code name Skylake X |
| 06_55H | 4 | Intel® Xeon® Processor E3 v5 Family based on microarchitecture code name Skylake Xeon E3 |
| 06_4E | 3 | 6th Generation Intel® Core™ Processors based on microarchitecture code name Skylake Y |
| 06_8EH | B | 8th Generation Intel® Core™ Processors based on microarchitecture code name Whiskey Lake U |
| 06_8EH | C | 8th Generation Intel® Core™ Processors based on microarchitecture code name Whiskey Lake U |

Proposed fix to the JIT

I personally hold the position that this isn't a small issue (even if we take Intel's claim of 0-4% at face value) and that it is not going away, either in terms of performance impact or in terms of the breadth of the product line affected.
I think the desired behavior, from the standpoint of the JIT, would be to detect whether this microcode update is applied, or whether the CPU model is from an affected family, and generate code that does not trigger the degradation.
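As a rough sketch of the detection half of that idea: the affected parts in the table above all report CPUID display-family 6 with one of a handful of display-model values (0x4E, 0x55, 0x5E, 0x8E, 0x9E, 0xA6). Decoding CPUID leaf 1 EAX and matching against that list might look like this (my own sketch, not CoreCLR code; real detection would also need to check the stepping and whether the microcode update is actually loaded):

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode CPUID leaf 1 EAX into display family/model and match against the
 * 06_xx families listed in the erratum. Sketch only: stepping and the
 * presence of the microcode update are deliberately ignored here. */
static bool jcc_affected_model(uint32_t cpuid_eax)
{
    uint32_t family = (cpuid_eax >> 8) & 0xF;
    /* For family 6, display model = (extended_model << 4) + model. */
    uint32_t model = (((cpuid_eax >> 16) & 0xF) << 4) | ((cpuid_eax >> 4) & 0xF);

    if (family != 6)
        return false;
    switch (model) {
    case 0x4E: case 0x55: case 0x5E:   /* Skylake client/server */
    case 0x8E: case 0x9E:              /* Kaby/Coffee/Whiskey/Amber Lake */
    case 0xA6:                         /* Comet Lake */
        return true;
    default:
        return false;
    }
}
```

For example, a Skylake-S part reporting EAX = 0x000506E3 (family 6, model 0x5E) matches, while a Haswell part reporting 0x000306C3 (model 0x3C) does not.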

On the face of it, while AMD processors are completely unaffected by this erratum, they also seem not to suffer adverse effects when running code generated by patched-up assemblers:

https://www.phoronix.com/scan.php?page=news_item&px=AMD-With-Intel-JCC-Assembler

category:cq
theme:alignment
skill-level:intermediate
cost:medium

@BruceForstall (Member)

cc @AndyAyersMS

@AndyAyersMS (Member)

Would be helpful to collect more data on the observed perf impact, so we can decide how to prioritize this.

Perhaps extra motivation to 32-byte align the code buffer (see #9912 and #8108) for x86/x64; without doing that, we can't possibly fix this. Could possibly do it just for Tier1 methods to try and minimize the fragmentation.

Implementing a fix for this in the jit may be non-trivial. Adding nops to fix one branch alignment may alter span-dependent instruction lengths and so push other branches into bad alignments. Handling macro fused ops adds a bit of complexity too. Probably needs to be handled as part of branch tensioning.
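The knock-on effect described above can be illustrated with a toy example (offsets are made up; the predicate below just restates the erratum condition): padding one branch shifts every later offset, which can push a previously fine branch into a bad position, so the fix has to iterate to a fixed point.

```c
#include <stdbool.h>

/* The erratum condition: the instruction at [start, start + len) crosses
 * or ends on a 32-byte boundary. */
static bool bad_placement(unsigned start, unsigned len)
{
    unsigned end = start + len;
    return (start / 32) != ((end - 1) / 32) || (end % 32) == 0;
}
```

Take two 2-byte branches at offsets 30 and 60: the first ends on the boundary at 32 (bad), the second is fine. Shifting everything after the first branch by 2 bytes fixes it (it now starts at 32), but moves the second branch to 62, where it ends on the boundary at 64, so a second pass is needed.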

@damageboy (Contributor, Author)

damageboy commented Nov 20, 2019 via email

@damageboy (Contributor, Author)

@AndyAyersMS @adamsitnik

Seems like someone over at the GraalVM project re-ran their benchmark suite, and I think their results pretty much reflect what I've experienced sporadically, namely that Intel is seriously understating the perf drop relative to its quoted 0-4% range (very impressive and concise visualization of 1000 benchmark results, regardless):

[screenshot: GraalVM benchmark results visualization]

oracle/graal#1829

I think it's clear, given that EVERY result in the screenshot is degraded by more than 5%, that this is not something that will go unnoticed...
Other compilers and JITs will eventually provide a mitigation for this (I'm assuming that Rust, Go, and other LLVM-based compilers will have an easier/faster response time; on that issue it is clear Graal already has something in the works), and there will be a clear and substantial drop in perceived/actual performance for CoreCLR / .NET Framework compared to patched languages/runtimes after this microcode update.

I'd love to get the chance to run a bunch of benchmarks for JIT'd code that are acceptable for the CoreCLR team and provide some results.

I can start looking for large suites of .NET BenchmarkDotNet projects, but I'm assuming there is already something that all of the JIT devs are using as some sort of reference much like the GraalVM people?

@filipnavara (Member)

I can start looking for large suites of .NET BenchmarkDotNet projects, but I'm assuming there is already something that all of the JIT devs are using as some sort of reference much like the GraalVM people?

https://github.com/dotnet/performance ... it is run roughly every 4 hours for .NET Core and I run it once a day for Mono.

@emaste

emaste commented Dec 6, 2019

Note that the image quoted above has a slider set to "Ignore deviations below 5%" - perhaps that explains why every result on the screenshot is degraded by more than 5%? I do believe it's very important that this issue gets addressed, but I want to make sure we're interpreting the results correctly.

@damageboy (Contributor, Author)

damageboy commented Dec 6, 2019 via email

@brianrob (Member)

brianrob commented Dec 6, 2019

Thanks folks for continuing to engage on this. I've been looking into the performance differences here as well, but don't yet have something worth sharing. I'm aiming for a mix of micro-benchmarks and some that look a bit more realistic. My goal is to understand the general performance cost, as well as which code is responsible for that cost (native vs. managed). From there we can start looking at mitigations.

With respect to the data above, the summary looks very similar to what I'm seeing in the microbenchmarks.

@damageboy (Contributor, Author)

Hi, I was wondering whether there has been any formal decision on how/if to address this, now that it's been about a month since the last update on this issue and 18 days since @BruceForstall added the 5.0 milestone.

I realize that the 5.0 milestone doesn't necessarily mean that any mitigation will be added at any level (managed/unmanaged code). If no decision has been made yet, I'd like to understand if the research that was performed by @brianrob was concluded, and if so, would it be possible to share the results of the impact that was measured?

@brianrob (Member)

@damageboy thanks for reaching out on this. I was out for the holidays and have picked this work back up this week. I'm not done yet, but I plan to share methodology and results once they're ready. I also expect that we'll be interested to hear from folks like yourself about how your apps are impacted, to help guide the decision-making process. All that to say that so far things have seemed fairly closed, but I expect to open it up more soon; I just don't yet have data in a form that is worth sharing and that actually backs up a conclusion. Thanks for your patience.

@damageboy (Contributor, Author)

Sure thing. I will point out, from the get-go, that for us the only sensible decision was to make sure this microcode update never gets applied.

So in effect we are experiencing no impact whatsoever, since at this point we have basically given up any security expectations from Intel processors.
At the same time, it is important to note that the systems I am discussing here are not internet-facing machines, which completely changes the decision-making process in favor of taking such an extreme measure.

I will point out that in our testing the impact was considerable; while it was definitely far from the worst case I reported, a 20% drop in perf for Array.Sort(), we saw a more moderate 4-5% drop in perf for our computationally intensive workloads on Haswell and Broadwell processors (no Skylake processors in our computational cluster so far).

@damageboy (Contributor, Author)

@brianrob, if there are new findings/conclusions about this issue, I would be happy if you could share them here...

I personally think it's also completely fine to mark this as wontfix, if that is what the group and the JIT gurus think is the right answer going forward.
It's been about two and a half months now, holidays and all, and I would really like to get some visibility or finality about where this is going.

Maybe you could share what MSFT is doing internally about this?

We've all heard previously how Bing, to name one example, is a big consumer of CoreCLR and contributes fixes/proposals that push GC and JIT performance forward in small, marginal improvements.

How are they being affected by this supposed 5%+ perf hit (that is, if you take Intel's word as-is)?
Are they seeing a much smaller perf hit?
Are they perhaps also not deploying the fix?

@AndyAyersMS (Member)

@brianrob any update on this?

@brianrob (Member)

@AndyAyersMS, thanks for pinging this. I am putting together the results to post here soon.

@brianrob (Member)

brianrob commented May 1, 2020

FYI, I just posted #35730 with the results of the investigation.
