Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

discards epoch-slots epochs ahead of the current root #19256

Merged
merged 2 commits into from
Aug 17, 2021

Conversation

behzadnouri
Copy link
Contributor

Problem

Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
#17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
#17789 (comment)

Summary of Changes

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots epochs ahead of the
root.

@@ -40,6 +44,8 @@ impl ClusterSlots {
}

fn update_internal(&self, root: Slot, epoch_slots_list: Vec<EpochSlots>) {
// Discard slots at or before current root or epochs ahead.
const SLOTS_DIFF_RANGE: Range<u64> = 1..2 * DEFAULT_SLOTS_PER_EPOCH;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we get this from the runtime value: bank.get_slots_in_epoch()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.
added an extra .max(DEFAULT_SLOTS_PER_EPOCH) just to be a bit more conservative in discarding epoch-slots.

Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
solana-labs#17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
solana-labs#17789 (comment)

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots epochs ahead of the
root.
@@ -54,13 +58,20 @@ impl ClusterSlots {
})
.collect()
};
// Discard slots at or before current root or epochs ahead.
let slot_range = (root + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm I think we need some buffer for the slots before the root for nodes that are catching up. There's also a use case currently for handling duplicate slots where nodes that have gotten the wrong version of a block rely on seeing EpochSlots reach a certain percentage of the threshold before initiating the duplicate repair protocol. cc @AshwinSekar

Is it feasible to keep around at least the latest epoch of EpochSlots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a separate change independent of this commit.
Current master code is already discarding epoch slots at or before current root:
https://github.com/solana-labs/solana/blob/692aa9914/core/src/cluster_slots.rs#L61-L63
https://github.com/solana-labs/solana/blob/692aa9914/core/src/cluster_slots.rs#L82

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh shoot this is the internal update for other nodes' epoch slots wer're receiving, but we're still persisting our own updates that are earlier than the root. Sorry for the confusion!

@codecov
Copy link

codecov bot commented Aug 16, 2021

Codecov Report

Merging #19256 (c252be9) into master (692aa99) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #19256   +/-   ##
=======================================
  Coverage    82.8%    82.8%           
=======================================
  Files         455      455           
  Lines      130044   130049    +5     
=======================================
+ Hits       107780   107800   +20     
+ Misses      22264    22249   -15     

@behzadnouri behzadnouri merged commit 563aec0 into solana-labs:master Aug 17, 2021
@behzadnouri behzadnouri deleted the cluster-slots-epoch branch August 17, 2021 13:13
mergify bot pushed a commit that referenced this pull request Aug 19, 2021
Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
#17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
#17789 (comment)

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots epochs ahead of the
root.

(cherry picked from commit 563aec0)
mergify bot added a commit that referenced this pull request Aug 19, 2021
Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
#17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
#17789 (comment)

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots epochs ahead of the
root.

(cherry picked from commit 563aec0)

Co-authored-by: behzad nouri <[email protected]>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants