-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Exclude partition reward accounts in account hash calculation #34809
Conversation
I have restarted my test node with this PR against mainnet. Currently, we are at 10% of epoch 561.
We will see if this fix works in 2 days. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #34809 +/- ##
========================================
Coverage 81.7% 81.7%
========================================
Files 826 825 -1
Lines 223371 223302 -69
========================================
+ Hits 182578 182629 +51
+ Misses 40793 40673 -120 |
8f14ef4
to
242f79f
Compare
Test node with this PR runs successfully across the epoch boundary!
|
Does this bug impact the partitioned epoch rewards feature gate itself, or only the |
No, it desn't. Only when |
Verify that this PR works1. Running
log from 2. Running ledger-tool with the PR on
Furthermore, the log shows that epoch_reward partition account is created as expected!
|
Is this an issue because with the CLI flag we create the PDA account, thus producing a different bank hash because the rest of the cluster does not have this account? |
Yeah. Exactly. |
Great! Then yes, this idea looks safe to me. We already do the same thing for the sysvar accounts, so also ignoring another account makes sense. |
will capitalization still match with the bank? does this sysvar have any lamports? |
part 2. When is the sysvar deleted? We had to work on the earlier cli pr in order to make sure we deleted the sysvar that has the rewards themselves (as lamports). Since the cli arg does it all in one spot, the rewards sysvar gets created, drained, then deleted. |
Good question. No, capitalization will be mismatching. We need to fix that too. The new accounts are not sysvar accounts. One account is created for each epoch. And they won't be deleted. Instead, they will stay forever. |
Also, turning the CLI on/off affects epoch_account_hash too. I have been running a node that toggles this CLI between epoch and it seems to result in epoch_account_hash mismatches. We have to fix epoch account hash too, i.e. (exclude the PDA accounts). #34876 has more details. |
a384e53
to
d76a5a3
Compare
I restarted
We will know at 564.75 epoch, now at 563.62 epoch, ~ 1.13 epoch away. |
if self.addresses.is_empty() { | ||
self.addresses.insert(sysvar::epoch_rewards::id()); | ||
|
||
for e in 0..=epoch { | ||
let pda = get_epoch_rewards_partition_data_address(e); | ||
self.addresses.insert(pda); | ||
} | ||
self.max_epoch = epoch; | ||
} else if epoch > self.max_epoch { | ||
for e in self.max_epoch + 1..=epoch { | ||
let pda = get_epoch_rewards_partition_data_address(e); | ||
self.addresses.insert(pda); | ||
} | ||
self.max_epoch = epoch; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of duplication in the if
and else
branches; can that be factored out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. refactored!
// ignore reward PDA account lamport when we are testing partitioned rewards before the actual feature activation. | ||
if !self.is_partitioned_rewards_feature_enabled() | ||
&& (self | ||
.partitioned_epoch_rewards_config() | ||
.test_enable_partitioned_rewards | ||
&& self.get_reward_calculation_num_blocks() == 0 | ||
&& self.partitioned_rewards_stake_account_stores_per_block() == u64::MAX) | ||
{ | ||
self.capitalization | ||
.fetch_sub(new_account.lamports(), Relaxed); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of selectively ignoring the PDA and subtracting the lamports, can we just not store the PDA account when testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an option, and probably the simplest option. However, this won't allow us to test the new reward rpc methods.
New reward rpc method require that those accounts exist in the account-db. The new rpc method, which will read these accounts and use them to extract stake rewards from the corresponding reward blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If --partitioned-epoch-rewards-force-enable-single-slot
is supplied, is it even possible to exercise the new RPC method? Or is the full feature gate required?
Maybe this is something that impacts tests but not running a node? I'm not sure, just trying to think through all the combinations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. If the new RPC method is feature gated, then we won't be able to run the method without feature activation.
But I think the RPC method will have no access to the feature gate. Instead, it relies on whether the PDA account exists or not to determine which code path to run. If the PDA account exists, then it will follow the code path that extracts rewards from partitioned reward blocks. Otherwise, it will follow the old code path to extract rewards from the first block in the epoch.
@CriesofCarrots may confirm this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, it's not a new RPC method, just an adaptation of the existing getInflationReward
. And yes, it checks whether the feature gate has been enabled to decide which block(s) to pull to find rewards. https://github.com/solana-labs/solana/pull/34773/files#diff-b55d9bcb5d2e7eae09fd352edcc765f9c2a4c05b97e9847582071d35850e8671R607
It will not know anything about the --partitioned-epoch-rewards-force-enable-single-slot
flag. Since that flag puts all rewards in the first slot anyway, the RPC method should still work. But it's not a particularly useful test case for the RPC changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
It looks like we are unlikely to use --partitioned-epoch-rewards-force-enable-single-slot
as a way to test the getInflationReward
rpc. This PR seems to be too "hacky" anyway.
I am closing this PR and create a new PR to get --partitioned-epoch-rewards-force-enable-single-slot
working by simply not storing the new PDA account when the flag is active.
I'm holding off review until churn is done, and CI is green. Feel free to give me a nudge if it's at that point |
My test node passed the epoch account hash for epoch 564. The fix works! |
Problem
We can run the validator against mainnet with the new partitioned reward code by enabling
--partitioned-epoch-rewards-force-enable-single-slot
CLI argument.Recently, my test node running with the above CLI fails on account hash mismatch. It turns out that the mismatch is coming from the new partition reward account introduced at the epoch, i.e. #34624.
In order for this CLI continue to work after #34624, we need to ignore the epoch partition reward accounts for account hash calculation.
Summary of Changes
Ignore partition reward accounts for account's hash calculation when we force enable partitioned rewards
Fixes #