GLC: Mixed up ports in UOPS_DISPATCHED.PORT_X event #149

JanLJL · 2024-03-06T10:27:33Z

I believe there is a mistake in the documentation of the in-core events for SPR: UOPS_DISPATCHED.PORT_2_3_10 and UOPS_DISPATCHED.PORT_5_11 appear to be mixed up. The first should count uops dispatched on ports 2, 3, and 11, while the second should count uops dispatched on ports 5 and 10.

Based on the Intel Architectures Optimization Reference, we can see on page 62/63 that port 10 (p10) adds a simple integer ALU while port 11 (p11) is used for loading data and address generation.
The documentation describes an event that counts the load uops on p2 and p3 together with the uops of a port used for integer arithmetic, rather than with the load uops on p11, which made me doubt it.

So I added hardware performance counters (using likwid) to a simple benchmark code measuring an ADD on 32-bit general purpose registers, such as add r9d, r10d, where I am sure it should run on all ALU ports, i.e., p0, p1, p5, p6, and p10.

I counted the dispatched uops per port and, as a metric, print each port's share of all dispatched uops as a percentage. A value of 100 for port 0 would mean that all dispatched uops were dispatched on p0.

Instructions per loop: (32 add + 1 inc + 1 cmp + 1 jl) = 35 instructions (apparently there is no macro fusion happening because of the jl)
We do 1,000,000 iterations --> 35,000,000 uops

+----------------------------------+---------+------------+
|               Event              | Counter | HWThread 0 |
+----------------------------------+---------+------------+
|         INSTR_RETIRED_ANY        |  FIXC0  |   35006880 |
|       CPU_CLK_UNHALTED_CORE      |  FIXC1  |    7049300 |
|       CPU_CLK_UNHALTED_REF       |  FIXC2  |    7048320 |
|    UOPS_DISPATCHED_PORT_PORT_0   |   PMC0  |    6675973 |
|    UOPS_DISPATCHED_PORT_PORT_1   |   PMC1  |    6719374 |
| UOPS_DISPATCHED_PORT_PORT_2_3_10 |   PMC2  |       3076 |
|   UOPS_DISPATCHED_PORT_PORT_4_9  |   PMC3  |       1405 |
|  UOPS_DISPATCHED_PORT_PORT_5_11  |   PMC4  |   13607280 |
|    UOPS_DISPATCHED_PORT_PORT_6   |   PMC5  |    7005328 |
|   UOPS_DISPATCHED_PORT_PORT_7_8  |   PMC6  |       1345 |
+----------------------------------+---------+------------+

+------------------------+------------+
|         Metric         | HWThread 0 |
+------------------------+------------+
|   Runtime (RDTSC) [s]  |     0.0035 |
|  Runtime unhalted [s]  |     0.0035 |
|       Clock [MHz]      |  2000.2744 |
|           CPI          |     0.2014 |
|          Port0         |        100 |
|    Port 0 occupation   |    19.6273 |
|    Port 1 occupation   |    19.7549 |
| Port 2/3/10 occupation |     0.0090 |
|   Port 4/9 occupation  |     0.0041 |
|  Port 5/11 occupation  |    40.0052 |
|    Port 6 occupation   |    20.5956 |
|   Port 7/8 occupation  |     0.0040 |
+------------------------+------------+
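The metric table can be recomputed from the raw counts. Below is a minimal Python sketch. It assumes, based on the description above and not on the likwid group file itself, that each occupation value is that counter's share of the sum of all seven UOPS_DISPATCHED counters, and that CPI is core cycles divided by retired instructions. The names `counts` and `port_shares` are illustrative only.

```python
# Raw counter values copied from the likwid output above.
counts = {
    "PORT_0": 6675973,
    "PORT_1": 6719374,
    "PORT_2_3_10": 3076,
    "PORT_4_9": 1405,
    "PORT_5_11": 13607280,
    "PORT_6": 7005328,
    "PORT_7_8": 1345,
}
INSTR_RETIRED_ANY = 35006880
CPU_CLK_UNHALTED_CORE = 7049300


def port_shares(counts):
    """Return each counter's share of all dispatched uops, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}


shares = port_shares(counts)
# Assumed definition: cycles per retired instruction.
cpi = CPU_CLK_UNHALTED_CORE / INSTR_RETIRED_ANY

for name, share in shares.items():
    print(f"{name:12s} {share:8.4f} %")
print(f"{'CPI':12s} {cpi:8.4f}")
```

Under these assumptions the computed shares agree with the reported occupation values after rounding to four decimal places, which suggests the metric is a share of dispatched uops rather than a share of cycles.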

We can see that p0, p1, and p6 each receive about 20% of the dispatched uops, while p5/11 shows about 40%.
Since p11 is used for loads and we are not loading any data in the benchmark, this means either
a) p10 is not used at all, even though it has an ALU, and the instruction is scheduled twice as often on p5, or
b) p10 and p11 should actually be swapped and each of the five ALU ports gets 20% of the dispatched uops, which I think is the case and makes more sense.
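To make the comparison concrete, here is a small Python sketch. It assumes, purely for illustration, that the scheduler spreads the uops evenly over the five ALU ports listed in the manual (p0, p1, p5, p6, p10) and ignores the loop overhead. The function `predicted_shares` and both mapping dictionaries are hypothetical names, not part of likwid or perfmon.

```python
# ALU ports according to the optimization manual's port numbering.
ALU_PORTS = {0, 1, 5, 6, 10}

# Counter -> ports it covers, taking the event names literally.
AS_NAMED = {
    "PORT_0": [0], "PORT_1": [1], "PORT_2_3_10": [2, 3, 10],
    "PORT_4_9": [4, 9], "PORT_5_11": [5, 11], "PORT_6": [6],
    "PORT_7_8": [7, 8],
}

# Same counters with ports 10 and 11 exchanged (hypothesis b).
SWAPPED = dict(AS_NAMED, PORT_2_3_10=[2, 3, 11], PORT_5_11=[5, 10])

# Occupation values copied from the metric table above (percent).
OBSERVED = {
    "PORT_0": 19.6273, "PORT_1": 19.7549, "PORT_2_3_10": 0.0090,
    "PORT_4_9": 0.0041, "PORT_5_11": 40.0052, "PORT_6": 20.5956,
    "PORT_7_8": 0.0040,
}


def predicted_shares(mapping, alu_ports):
    """Predict each counter's share (percent) for an even spread over the ALU ports."""
    per_port = 100.0 / len(alu_ports)
    return {
        name: per_port * sum(1 for p in ports if p in alu_ports)
        for name, ports in mapping.items()
    }


def max_deviation(predicted, observed):
    """Largest absolute difference between prediction and measurement."""
    return max(abs(predicted[name] - observed[name]) for name in observed)


err_named = max_deviation(predicted_shares(AS_NAMED, ALU_PORTS), OBSERVED)
err_swapped = max_deviation(predicted_shares(SWAPPED, ALU_PORTS), OBSERVED)
print(f"as named: {err_named:.4f}  swapped: {err_swapped:.4f}")
```

With an even spread, the literal event names predict 20% on both PORT_2_3_10 and PORT_5_11, while the swapped mapping predicts 0% and 40%, which is much closer to the measurement. This does not rule out option a) by itself; it only shows that option b) needs no extra assumption about p5 being used twice as often.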

Could you please confirm this and, if verified, change the documentation accordingly?

Thanks and best,
Jan

edwarddavidbaker · 2024-03-06T16:06:26Z

@JanLJL Thank you for filing a very detailed issue!
@vdaneti Please review the above notes and compare to SPR checkout data.

edwarddavidbaker · 2024-05-06T16:52:35Z

@vdaneti Did you receive documentation feedback on GLC ports 10 and 11?

https://cdrdv2.intel.com/v1/dl/getContent/671488 - Figure 2-2 and Table 2-3

JanLJL · 2024-11-06T12:12:44Z

Any updates on this one?

vdaneti · 2024-11-06T16:42:32Z

@edwarddavidbaker please reference the updated arch doc here

edwarddavidbaker · 2024-11-06T17:09:49Z

Re-assigning to myself as a reminder to link v51 of the Optimization Reference Manual when it is posted.

edwarddavidbaker · 2024-11-19T19:08:54Z

@boomanaiden154 Thanks for opening a ticket and linking the LLVM issue. We are determining the best method to implement documentation updates. I apologize for the delays.

boomanaiden154 · 2024-11-19T20:40:43Z

> We are determining the best method to implement documentation updates. I apologize for the delays.

All good on the timing. Everything is stable on our end, if a bit inconsistent. Given the plan is to update the documentation, it seems like the resolution was that perfmon was correct and the diagrams in the optimization manual need to have ports 10 and 11 swapped?

edwarddavidbaker · 2024-11-22T15:23:16Z

> > We are determining the best method to implement documentation updates. I apologize for the delays.
>
> All good on the timing. Everything is stable on our end, if a bit inconsistent. Given the plan is to update the documentation, it seems like the resolution was that perfmon was correct and the diagrams in the optimization manual need to have ports 10 and 11 swapped?

Correct. Ports 10 and 11 need to be swapped in documentation for Golden Cove.

Based on intel/perfmon#149, the documentation is incorrect and the pfm counter names are actually correct. This patch adjusts the SapphireRapids scheduling model to match the performance counter naming/ correct naming that will soon be reflected in the optimization manual. This fixes part of llvm#117360.

Based on intel/perfmon#149, the documentation is incorrect and the pfm counter names are actually correct. This patch adjusts the Alder Lake scheduling model to match the performance counter naming/ correct naming that will soon be reflected in the optimization manual. This fixes part of llvm#117360.


HaohaiWen · 2024-11-25T05:26:58Z

Another mistake is in the Intel Golden Cove instruction throughput/latency document: https://www.intel.com/content/www/us/en/content-details/723498/intel-processors-and-processor-cores-based-on-golden-cove-microarchitecture-instruction-throughput-and-latency.html
Ports 11 and 10 also need to be swapped there.

edwarddavidbaker assigned vdaneti Mar 6, 2024

edwarddavidbaker changed the title ~~Mixed up ports in UOPS_DISPATCHED.PORT_X event~~ GLC: Mixed up ports in UOPS_DISPATCHED.PORT_X event May 6, 2024

vdaneti closed this as completed Nov 6, 2024

edwarddavidbaker assigned edwarddavidbaker and unassigned vdaneti Nov 6, 2024

edwarddavidbaker reopened this Nov 6, 2024

boomanaiden154 mentioned this issue Nov 11, 2024

Golden Cove+ uarches swap ports 10 and 11 in counter names #239

Closed

boomanaiden154 mentioned this issue Nov 22, 2024

Swap ports 10 and 11 in Golden Cove Scheduler Models llvm/llvm-project#117360

Closed

boomanaiden154 mentioned this issue Nov 24, 2024

[X86] Swap ports 10 and 11 in Alder Lake Scheduling Model llvm/llvm-project#117466

Merged

boomanaiden154 mentioned this issue Nov 24, 2024

[X86] Swap ports 10 and 11 in SapphireRapids Scheduling Model llvm/llvm-project#117468

Merged
