-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect Cost Calculations for Shuffle Ports on Icelake-client / Icelake-server #47454
Comments
llvm-mca and the reset of llvm is using the skylake-server scheduling information for icelake-client and server. |
This isn't an llvm-mca issue, it just shows the problem - as Craig said icelake doesn't have its own scheduler models. Ideally we need someone who has the CPUs that can fork the skylake models, adjust them based off llvm-exegesis reports and the intel hw docs, and then confirm that it actually improves performance. |
https://reviews.llvm.org/D108914 I've put up a candidate patch that creates a Icelake model as a direct copy of the existing SkylakeServer model - this would then allow iterative improvements to address microarch differences such as this. |
This landed at 9e2d14c |
https://reviews.llvm.org/D115547 fixes some of the integer shuffles |
…uilder SelectionDAGBuilder was inconsistently mangling values based on ABI Calling Conventions when getting them through copyFromRegs in SelectionDAGBuilder, causing duplicate value type convertions for function arguments. The checking for the mangling requirement was based on the value's originating instruction and was performed outside of, and inspite of, the regular Calling Convention Lowering. The issue could be observed in a scenario such as: ``` %arg1 = load half, half* %const, align 2 %arg2 = call fastcc half @someFunc() call fastcc void @otherFunc(half %arg1, half %arg2) ; Here, %arg2 was incorrectly mangled twice, as the CallConv data from ; the call to @someFunc() was taken into consideration for the check ; when getting the value for processing the call to @otherFunc(...), ; after the proper convertion had taken place when lowering the return ; value of the first call. ``` This patch fixes the issue by disregarding the Calling Convention information for such copyFromRegs, making sure the ABI mangling is properly contanined in the Calling Convention Lowering. This fixes Bugzilla llvm#47454. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87844 (cherry picked from commit 53d238a)
https://reviews.llvm.org/D115752 fixes some of the fp shuffles |
…uilder SelectionDAGBuilder was inconsistently mangling values based on ABI Calling Conventions when getting them through copyFromRegs in SelectionDAGBuilder, causing duplicate value type convertions for function arguments. The checking for the mangling requirement was based on the value's originating instruction and was performed outside of, and inspite of, the regular Calling Convention Lowering. The issue could be observed in a scenario such as: ``` %arg1 = load half, half* %const, align 2 %arg2 = call fastcc half @someFunc() call fastcc void @otherFunc(half %arg1, half %arg2) ; Here, %arg2 was incorrectly mangled twice, as the CallConv data from ; the call to @someFunc() was taken into consideration for the check ; when getting the value for processing the call to @otherFunc(...), ; after the proper convertion had taken place when lowering the return ; value of the first call. ``` This patch fixes the issue by disregarding the Calling Convention information for such copyFromRegs, making sure the ABI mangling is properly contanined in the Calling Convention Lowering. This fixes Bugzilla llvm#47454. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87844
Extended Description
Hi there,
It looks like llvm-mca is treating icelake-client and icelake-server as having only one shuffle port. This causes incorrect cost calculations for any operation that would use said shuffle ports. A block diagram showing the architecture can be found here: https://en.wikichip.org/wiki/intel/microarchitectures/sunny_cove (sunny cove is the core used by both icelake-server and icelake-client). Similarly, you can find the uops usage for any given shuffle here: https://uops.info/table.html?search=shuf&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_SKL=on&cb_ICL=on&cb_measurements=on&cb_base=on&cb_avx=on
If the code generating the costs for LLVM-MCA is used by anything else in the toolchain, it's likely this will yield a performance benefit for other users targeting icelake-server and icelake-client.
The text was updated successfully, but these errors were encountered: