[BUG] AMD family 25 model 7 wrong regwidth #662
Can you provide links to the official docs for these chips? Please attach the output of /proc/cpuinfo for one CPU core per system so we can see how to differentiate the different types of Zen4.
Ok, this is super weird. The official documentation says it is a 64-bit register (page 253). But I added a function to perfmon.c so that I can see the raw values in likwidMetric.go, and you can see a wrap on thread 6 at 32 bits.
I have no idea how this can happen.
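A minimal sketch of the check described above, assuming the raw register readings for one thread are available as an array of uint64_t samples; first_wrap_index and wrap_looks_32bit are hypothetical names, not part of LIKWID. The sketch flags the first decrease and whether the value just before it still fit in 32 bits. It cannot tell a hardware 32-bit counter apart from a 32-bit truncation in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first sample that is smaller than
 * its predecessor (i.e. a wrap), or -1 if the samples never decrease. */
static long first_wrap_index(const uint64_t *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < samples[i - 1])
            return (long)i;
    }
    return -1;
}

/* Heuristic: a wrap whose pre-wrap value still fits in 32 bits suggests
 * the value behaves like a 32-bit register, whatever the docs say. */
static int wrap_looks_32bit(const uint64_t *samples, size_t n)
{
    long i = first_wrap_index(samples, n);
    return i > 0 && samples[i - 1] <= UINT32_MAX;
}
```

A genuine 64-bit energy counter would only wrap after passing UINT32_MAX, so a wrap just below 2^32 points at a 32-bit width somewhere along the read path.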
It might be a documentation issue. Unfortunately, there is no other setting visible in the Linux kernel sources.
I found the issue. statusRegWidth is correctly set to 64 for our ZEN4_EPYC. The underlying issue is in likwid.h, lines 1694-1698: the typedef for … I'm not sure whether it should generally be set to uint64_t or whether you'd prefer an #ifdef (if that's even possible; I'm not very familiar with C).
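The exact typedef in likwid.h is not reproduced in this thread, so the sketch below assumes it resolves to a 32-bit unsigned type; reg_narrow_t, reg_wide_t, store_narrow and store_wide are hypothetical names. It shows why such a type makes a 64-bit register look as if it wraps at 32 bits: converting a value to a 32-bit unsigned type keeps only its low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the elided typedef: the narrow one models
 * a 32-bit definition, the wide one a uint64_t definition. */
typedef uint32_t reg_narrow_t;
typedef uint64_t reg_wide_t;

/* Compile-time guard that the wide type can hold a full 64-bit MSR value. */
static_assert(sizeof(reg_wide_t) == 8, "reg_wide_t must hold 64 bits");

/* Conversion to an unsigned type is modulo 2^32 here, so the upper
 * half of the register value is silently discarded. */
static reg_narrow_t store_narrow(uint64_t msr_value)
{
    return (reg_narrow_t)msr_value;
}

/* With a 64-bit type the value survives unchanged. */
static reg_wide_t store_wide(uint64_t msr_value)
{
    return (reg_wide_t)msr_value;
}
```

Using a fixed-width uint64_t avoids depending on platform-specific integer sizes, which is usually simpler than an #ifdef.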
Good find. It shouldn't be a big deal to update the …
We use AMD EPYC 9254 and 9454 in our cluster (both are Family 25, Model 15, Zen4). In power.c, for all ZEN4_EPYC processors the code sets
power_info.statusRegWidth = 64;
However, at least these two models only have a 32-bit status register. As a result, we observe a counter wrap every few hours, and the wraparound correction computed with a 64-bit width yields incorrect power results.
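The effect of an incorrect width on the wraparound correction can be sketched as follows, assuming at most one wrap between two reads; counter_delta is a hypothetical helper, not LIKWID's implementation. Unsigned subtraction in C is modulo 2^64, and masking with 2^width - 1 reduces the result modulo 2^width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: difference between two raw readings of a counter
 * that is `width` bits wide, correcting for at most one wraparound. */
static uint64_t counter_delta(uint64_t before, uint64_t after, unsigned width)
{
    assert(width >= 1 && width <= 64);
    /* Avoid shifting by 64, which is undefined behavior in C. */
    uint64_t mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    /* Wrapped subtraction, then reduce modulo 2^width. */
    return (after - before) & mask;
}
```

For a 32-bit counter read at 0xFFFFFF00 and then at 0x100, the 32-bit correction yields 0x200 counts, while the 64-bit correction yields 0xFFFFFFFF00000200 (about 1.8e19) counts. Multiplying that by the energy unit is what turns a single wrap into a wildly wrong power value.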
I’m not sure how best to distinguish these processors in the code. A patch that sets the status register width to 32 for these models would be appreciated.
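Distinguishing models usually starts from the family and model that CPUID leaf 1 reports in EAX (Linux shows the same values as "cpu family" and "model" in /proc/cpuinfo). The sketch below decodes EAX following AMD's documented rule that the extended fields apply only when the base family is 0xF; cpu_sig and decode_cpuid_eax are hypothetical names. Whether family and model alone separate the affected parts from other ZEN4_EPYC chips is the open question in this thread, so the sketch stops at decoding.

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPU signature. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode CPUID Fn0000_0001 EAX using AMD's family/model rules. */
static struct cpu_sig decode_cpuid_eax(uint32_t eax)
{
    unsigned stepping   = eax & 0xF;          /* bits 3:0   */
    unsigned base_model = (eax >> 4) & 0xF;   /* bits 7:4   */
    unsigned base_fam   = (eax >> 8) & 0xF;   /* bits 11:8  */
    unsigned ext_model  = (eax >> 16) & 0xF;  /* bits 19:16 */
    unsigned ext_fam    = (eax >> 20) & 0xFF; /* bits 27:20 */

    struct cpu_sig sig = { base_fam, base_model, stepping };
    /* Extended fields only contribute when the base family is 0xF. */
    if (base_fam == 0xF) {
        sig.family = base_fam + ext_fam;
        sig.model  = (ext_model << 4) | base_model;
    }
    return sig;
}
```

A patch could then branch on the decoded family and model before setting power_info.statusRegWidth.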