discards epoch-slots epochs ahead of the current root #19256
Conversation
Force-pushed from 1efff82 to 6039148.
core/src/cluster_slots.rs (outdated)

@@ -40,6 +44,8 @@ impl ClusterSlots {
    }

    fn update_internal(&self, root: Slot, epoch_slots_list: Vec<EpochSlots>) {
        // Discard slots at or before current root or epochs ahead.
        const SLOTS_DIFF_RANGE: Range<u64> = 1..2 * DEFAULT_SLOTS_PER_EPOCH;
Ideally we get this from the runtime value: bank.get_slots_in_epoch()
Done. Added an extra .max(DEFAULT_SLOTS_PER_EPOCH) just to be a bit more conservative in discarding epoch-slots.
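For context, a minimal sketch of what the revised bound could look like, assuming the epoch length is read from the runtime via bank.get_slots_in_epoch() as suggested above; the helper function and its wiring are illustrative, not the PR's exact code:

```rust
use std::ops::Range;
use solana_sdk::clock::{Slot, DEFAULT_SLOTS_PER_EPOCH};

// Sketch: compute the window of acceptable epoch-slots around the root.
// `slots_in_epoch` would come from the runtime, e.g. bank.get_slots_in_epoch(epoch).
fn epoch_slots_range(root: Slot, slots_in_epoch: u64) -> Range<Slot> {
    // The .max(DEFAULT_SLOTS_PER_EPOCH) guard keeps the window conservative
    // on clusters (or warmup epochs) with shorter epochs.
    let slots_in_epoch = slots_in_epoch.max(DEFAULT_SLOTS_PER_EPOCH);
    // Keep slots strictly after the root, up to two epochs ahead of it,
    // mirroring the 1..2 * DEFAULT_SLOTS_PER_EPOCH constant above.
    (root + 1)..root.saturating_add(2 * slots_in_epoch)
}
```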
Force-pushed from 6039148 to c252be9.
@@ -54,13 +58,20 @@ impl ClusterSlots {
            })
            .collect()
    };
    // Discard slots at or before current root or epochs ahead.
    let slot_range = (root + 1)
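To illustrate how a range like this could gate incoming updates, here is a sketch under assumptions: EpochSlots::to_slots is the existing gossip API, but the helper and its names are hypothetical:

```rust
use std::ops::Range;
use solana_gossip::epoch_slots::EpochSlots;
use solana_sdk::clock::Slot;

// Sketch: keep only the advertised slots that fall inside the window.
fn filter_epoch_slots(epoch_slots: &EpochSlots, root: Slot, slot_range: &Range<Slot>) -> Vec<Slot> {
    epoch_slots
        .to_slots(root) // slots this node advertises past `root`
        .into_iter()
        .filter(|slot| slot_range.contains(slot))
        .collect()
}
```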
Hmmm, I think we need some buffer for the slots before the root for nodes that are catching up. There's also a use case currently for handling duplicate slots, where nodes that have gotten the wrong version of a block rely on seeing EpochSlots reach a certain percentage of the threshold before initiating the duplicate repair protocol. cc @AshwinSekar

Is it feasible to keep around at least the latest epoch of EpochSlots?
That sounds like a separate change independent of this commit.
Current master code is already discarding epoch slots at or before current root:
https://github.com/solana-labs/solana/blob/692aa9914/core/src/cluster_slots.rs#L61-L63
https://github.com/solana-labs/solana/blob/692aa9914/core/src/cluster_slots.rs#L82
Oh shoot, this is the internal update for other nodes' epoch slots we're receiving, but we're still persisting our own updates that are earlier than the root. Sorry for the confusion!
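For reference, a loose paraphrase (not a verbatim quote) of the lower-bound trimming in the master code linked above; the container type and names are assumptions here:

```rust
use std::collections::HashMap;
use solana_sdk::clock::Slot;

// Sketch: the cluster-slots map is keyed by slot; entries at or before the
// root are dropped as the root advances, which is the existing lower-end
// bound on the map.
fn trim_at_or_before_root<V>(cluster_slots: &mut HashMap<Slot, V>, root: Slot) {
    cluster_slots.retain(|slot, _| *slot > root);
}
```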
Codecov Report

@@           Coverage Diff            @@
##           master   #19256    +/-   ##
========================================
  Coverage    82.8%    82.8%
========================================
  Files         455      455
  Lines      130044   130049      +5
========================================
+ Hits       107780   107800     +20
+ Misses      22264    22249     -15
Backport commits (cherry picked from commit 563aec0). Co-authored-by: behzad nouri <[email protected]>
Problem
Cross cluster gossip contamination is causing the cluster-slots hash map to contain a lot of bogus values and consume too much memory: #17789

If a node is using the same identity key across clusters, then these erroneous values might not be filtered out by the shred-version check, because one of the variants of the contact-info will have a matching shred-version: #17789 (comment)
Summary of Changes
The cluster-slots hash map is bounded and trimmed at the lower end by the current root. This commit additionally discards slots that are epochs ahead of the root.
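As a rough end-to-end sketch of the resulting bound (hypothetical names; the real logic lives in update_internal in core/src/cluster_slots.rs):

```rust
use std::collections::HashMap;
use solana_sdk::clock::{Slot, DEFAULT_SLOTS_PER_EPOCH};

// Sketch: prune the map from both ends (below by the root, above by a window
// of epochs past the root) so contaminated gossip cannot grow it unboundedly.
fn prune_cluster_slots<V>(map: &mut HashMap<Slot, V>, root: Slot, slots_in_epoch: u64) {
    let slots_in_epoch = slots_in_epoch.max(DEFAULT_SLOTS_PER_EPOCH);
    let slot_range = (root + 1)..root.saturating_add(2 * slots_in_epoch);
    map.retain(|slot, _| slot_range.contains(slot));
}
```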