-
Notifications
You must be signed in to change notification settings - Fork 4.7k
adds mapping from nodes pubkeys to their shred-version #17940
adds mapping from nodes pubkeys to their shred-version #17940
Conversation
Crds values of nodes with different shred versions are creeping into gossip table resulting in runtime issues as the one addressed in: solana-labs#17899 This commit works towards enforcing more checks and filtering based on shred version by adding necessary mapping and api to gossip table. Once populated, pubkey->shred-version mapping persists as long as there are any values associated with the pubkey.
Codecov Report
@@ Coverage Diff @@
## master #17940 +/- ##
=========================================
Coverage ? 82.6%
=========================================
Files ? 431
Lines ? 121332
Branches ? 0
=========================================
Hits ? 100270
Misses ? 21062
Partials ? 0 |
Out of curiosity, do we know how these other shred versions are creeping into the gossip table? It seems like:
Could the validators with shred version 0 be cross-pollinating between the clusters?
|
Might be. Apparently it has been intentional to gossip with nodes with shred-version 0: |
Hmm yeah what's more confusing is we already have this solana/gossip/src/cluster_info.rs Line 2365 in 9a5330b
solana/gossip/src/cluster_info.rs Line 2203 in 9a5330b
Inside that function: solana/gossip/src/cluster_info.rs Line 2269 in 9a5330b
CC: @t-nelson |
Also, for some context, it seems we agreed that we could kill the special But because I have the memory of a goldfish 🐟 , I'm trying to refresh my memory around the context of this issue. Here was also a brief discord discussion: https://discord.com/channels/428295358100013066/478692221441409024/743945363211288616 |
That is one possibility.
why is it contradictory to this PR?
What happens then when starting a node and not yet have a shred-version? |
We may receive a crds-value with pubkey So crds-values with different shred versions can still pass through this function as long as they are relayed by a node with matching shred function. I do not know how that first node obtained the crds-value though! |
Ah I misread the change, I thought once the Pubkey -> shred version mapping was made, we couldn't change it, but it can be modified on insert of a new ContactInfo:
hmmm yeah this is the confusing bit. And it seems from the EpochSlots crds values must be relatively recent or they would've been purged right? So they must be constantly pushed/refreshed to be alive, which implies some node is a bridge, maybe somebody's running a really old spy that hasn't updated its shred version from 0 🤯
Is it reasonable to add a check for the origin pubkey in |
I did check them on TDS, and some are very recent, like with wallclock less than 30 seconds ago.
This is true. One source of leak is that when starting a validator it initially has At which point it already has crds-values from different shred versions in its gossip table, but since it has updated its shred version it can now leak it out to the rest of the cluster for the period of time this loop runs: This is a viable path for the leak, but it is a one time only and we are not starting a validator every minute. So does not explain stream of recent values with wrong shred version.
Yeah, that was the plan to follow up with adding more checks earlier on packets processing using this mapping. This PR is just adding the mapping. |
Crds values of nodes with different shred versions are creeping into gossip table resulting in runtime issues as the one addressed in: #17899 This commit works towards enforcing more checks and filtering based on shred version by adding necessary mapping and api to gossip table. Once populated, pubkey->shred-version mapping persists as long as there are any values associated with the pubkey. (cherry picked from commit 5a99fa3)
Crds values of nodes with different shred versions are creeping into gossip table resulting in runtime issues as the one addressed in: #17899 This commit works towards enforcing more checks and filtering based on shred version by adding necessary mapping and api to gossip table. Once populated, pubkey->shred-version mapping persists as long as there are any values associated with the pubkey. (cherry picked from commit 5a99fa3) Co-authored-by: behzad nouri <[email protected]>
Problem
Crds values of nodes with different shred versions are creeping into
gossip table resulting in runtime issues as the one addressed in:
#17899
Summary of Changes
This commit works towards enforcing more checks and filtering based on
shred version by adding necessary mapping and api to gossip table.
Once populated, pubkey->shred-version mapping persists as long as there
are any values associated with the pubkey.