Failed to validate a candidate: PrepareError panic in cranelift codegen #743
This validator was raising quite a few disputes; the logs show two reasons:
The first error is really weird: Prepare means it happened during artifact preparation. So how can there be an
An ambiguous worker death is also somewhat unexpected: with 64 GiB of RAM, OOM seems an unlikely cause.
I assume the backtrace points to that unwrap:
Following the links:
I saw this but didn't have any ideas; I'd have to sit down and do a deep dive. Edit: it's a panic error, which most likely happened during compilation itself.
Is the "retry on AWD" (ambiguous worker death) change in this version? I noticed in the code that we don't log when we retry, which would be nice to have.
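A minimal sketch of the behaviour being asked about: retry a preparation job only on an ambiguous worker death and log every retry. It is written in Python for brevity; the names `Outcome` and `run_with_retry`, the logger name and the retry limit are hypothetical, and this is not the actual Polkadot PVF host code.

```python
import logging
from enum import Enum, auto
from typing import Callable

# Hypothetical logger name; the real node uses its own log targets.
log = logging.getLogger("pvf-sketch")


class Outcome(Enum):
    """Hypothetical outcomes of one preparation attempt."""
    OK = auto()
    AMBIGUOUS_WORKER_DEATH = auto()
    PANIC = auto()


def run_with_retry(job: Callable[[], Outcome], max_retries: int = 1) -> Outcome:
    """Run `job`, retrying only on an ambiguous worker death.

    Every retry is logged, so it shows up in the node logs instead of
    happening silently. Other outcomes (including a panic) are returned
    immediately without retrying.
    """
    outcome = job()
    attempt = 0
    while outcome is Outcome.AMBIGUOUS_WORKER_DEATH and attempt < max_retries:
        attempt += 1
        log.warning(
            "ambiguous worker death, retrying (attempt %d of %d)",
            attempt,
            max_retries,
        )
        outcome = job()
    return outcome
```

A deterministic panic such as the cranelift one would not be retried here; only the ambiguous case is, and each retry leaves a warning in the log.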
Do you want me to perform some tests maybe?
Do you use ECC memory in your setup?
reg.to_real_reg() ... some configuration mismatch in cranelift for that particular processor? In general, from my perspective this is most likely either a cranelift bug or some weird hardware fault, the former being more likely. @ancibanci, what version of Polkadot are you running? And yes, rebuilding or using a differently provided binary and restarting might shed some more light.
From the logs:
From the specs it looks like https://www.hetzner.com/dedicated-rootserver/ax41-nvme/, probably with non-ECC RAM; is that correct? From the previous reports it does feel like either corrupted storage or memory, but I'm hesitant to say that only because I don't have any other explanation.
Running server software on non-ECC memory is a risky choice: memory errors will eventually occur, and without ECC they are neither detected nor corrected.
After downloading the binary and restarting, I started getting the "buffer is full" errors again.
7 hours ago I got "bad assignment from peer".
Then my node got stuck at block 16055683 and couldn't produce any blocks. It was chilled again very soon after that.
The logs themselves tell very little, unfortunately. None of the logs after the restart should result in chilling. Could it be that the validator was already chilled?
Hi,
I am running a validator on Kusama and have been getting some weird errors over the last few weeks; they appeared occasionally (three times in the last 3 weeks).
My configuration:
AMD Ryzen 5 3600 (6 cores/12 threads, 3.60 GHz), 480 GB NVMe, 64 GB RAM, 1 Gbps
Then today I got a different error, which resulted in my node being chilled.