
adds mapping from nodes pubkeys to their shred-version #17940

Merged
merged 1 commit into solana-labs:master from behzadnouri:crds-shred-version on Jun 18, 2021

Conversation

behzadnouri
Contributor

Problem

Crds values of nodes with different shred versions are creeping into the
gossip table, resulting in runtime issues such as the one addressed in
#17899.

Summary of Changes

This commit works towards enforcing more checks and filtering based on
shred version by adding the necessary mapping and API to the gossip table.
Once populated, the pubkey -> shred-version mapping persists as long as
there are any values associated with the pubkey.
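
A minimal, self-contained sketch of the intended shape (types and names here are simplified stand-ins, not necessarily the actual Crds fields or methods):

```rust
use std::collections::{HashMap, HashSet};

type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

#[derive(Default)]
struct GossipTable {
    // pubkey -> shred version, learned from that node's ContactInfo.
    shred_versions: HashMap<Pubkey, u16>,
    // pubkeys that currently own at least one value in the table.
    records: HashMap<Pubkey, HashSet<u64 /* simplified value hash */>>,
}

impl GossipTable {
    // Called when a ContactInfo from `origin` is inserted.
    fn insert_contact_info(&mut self, origin: Pubkey, shred_version: u16, value_hash: u64) {
        self.shred_versions.insert(origin, shred_version);
        self.records.entry(origin).or_default().insert(value_hash);
    }

    // Called when any other crds value from `origin` is inserted.
    fn insert_value(&mut self, origin: Pubkey, value_hash: u64) {
        self.records.entry(origin).or_default().insert(value_hash);
    }

    // The new lookup: the shred version advertised by this origin, if known.
    fn get_shred_version(&self, origin: &Pubkey) -> Option<u16> {
        self.shred_versions.get(origin).copied()
    }

    // Called when a value is purged; the mapping persists only while the
    // origin still owns at least one value in the table.
    fn remove_value(&mut self, origin: &Pubkey, value_hash: u64) {
        if let Some(hashes) = self.records.get_mut(origin) {
            hashes.remove(&value_hash);
            if hashes.is_empty() {
                self.records.remove(origin);
                self.shred_versions.remove(origin);
            }
        }
    }
}

fn main() {
    let mut table = GossipTable::default();
    let node = [1u8; 32];
    table.insert_contact_info(node, 51, 100);
    table.insert_value(node, 101); // e.g. an EpochSlots value from the same node
    assert_eq!(table.get_shred_version(&node), Some(51));
    table.remove_value(&node, 100);
    assert_eq!(table.get_shred_version(&node), Some(51)); // still one value left
    table.remove_value(&node, 101);
    assert_eq!(table.get_shred_version(&node), None); // last value gone, mapping dropped
}
```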

@codecov

codecov bot commented Jun 14, 2021

Codecov Report

❗ No coverage uploaded for pull request base (master@a248770).
The diff coverage is 87.5%.

@@            Coverage Diff            @@
##             master   #17940   +/-   ##
=========================================
  Coverage          ?    82.6%           
=========================================
  Files             ?      431           
  Lines             ?   121332           
  Branches          ?        0           
=========================================
  Hits              ?   100270           
  Misses            ?    21062           
  Partials          ?        0           

@behzadnouri behzadnouri requested review from carllin and sakridge June 14, 2021 20:04
@carllin
Contributor

carllin commented Jun 15, 2021

Out of curiosity, do we know how these other shred versions are creeping into the gossip table?

It seems like:

  1. In pull, we should only be making pull requests to other validators with the same shred version or 0:
    && (self_shred_version == 0 || self_shred_version == v.shred_version)

Could the validators with shred version 0 be cross-pollinating between the clusters?

  2. In push, we should only be pushing to other validators with the same shred version:
    && self_shred_version == info.shred_version

@behzadnouri
Contributor Author

Could the validators with shred version 0 be cross-pollinating between the clusters?

Might be. Apparently it has been intentional to gossip with nodes with shred-version 0:
https://github.com/solana-labs/solana/blob/161838655/gossip/src/cluster_info.rs#L1248

@carllin
Contributor

carllin commented Jun 15, 2021

Hmm, yeah, what's more confusing is that we already have this filter_by_shred_version() function that runs on both pushes:

Self::filter_by_shred_version(
and pulls:
Self::filter_by_shred_version(

Inside that function:

CrdsData::ContactInfo(contact_info) => contact_info.id == *from,

  1. We deliberately don't filter if shred_version == 0, because we pass in shred version 0 at startup, until we get a pull response from an entrypoint:
    if self.my_shred_version() == 0 {
        if let Some(entrypoint) = entrypoints
            .iter()
            .find(|entrypoint| entrypoint.shred_version != 0)
        {
            info!(
                "Setting shred version to {:?} from entrypoint {:?}",
                entrypoint.shred_version, entrypoint.id
            );
            self.my_contact_info.write().unwrap().shred_version = entrypoint.shred_version;
            self.gossip
                .write()
                .unwrap()
                .set_shred_version(entrypoint.shred_version);
        }
    }
    Seems like these should be filtering out values from nodes with different shred versions after this point? I could see us maybe getting bad pull requests from the wrong cluster while the shred_version == 0, but these should be dropped pretty quickly after we set our shred version (unless those validators are using the same keys on both Tds and MB and are staked on both?)
  2. And we want to support nodes changing their shred versions, which seems contradictory to this PR.

CC: @t-nelson

@carllin carllin requested a review from t-nelson June 15, 2021 22:42
@carllin
Contributor

carllin commented Jun 15, 2021

Also, for some context, it seems we agreed that we could kill the special 0 handling in my PR #11620 (comment) to try and remove this very same cross-pollination issue.

But because I have the memory of a goldfish 🐟, I'm trying to refresh my memory around the context of this issue. Here was also a brief Discord discussion: https://discord.com/channels/428295358100013066/478692221441409024/743945363211288616

@behzadnouri
Contributor Author

but these should be dropped pretty quickly after we set our shred version (unless those validators are using the same keys on both Tds and MB and are staked on both?)

That is one possibility.
I actually think shred-version was added to support validators using the same keys across clusters.

and we want to support nodes changing their shred versions: which seems contradictory to this PR.

why is it contradictory to this PR?

Also, for some context, it seems we agreed that we could kill the special 0 handling in my PR #11620 (comment) to try and remove this very same cross-pollination issue.

What happens then when starting a node that does not yet have a shred-version?

@behzadnouri
Contributor Author

Hmm yeah what's more confusing is we already have this filter_by_shred_version() function that runs on both pushes:

We may receive a crds-value with pubkey x from node a. filter_by_shred_version does not check the shred-version of the owner of the crds-value (i.e. the node with pubkey x). It only checks the shred-version of the node which is relaying the value (i.e. node a):
https://github.com/solana-labs/solana/blob/5cc073420/gossip/src/cluster_info.rs#L2274-L2289

So crds-values with different shred versions can still pass through this function as long as they are relayed by a node with a matching shred version. I do not know how that first node obtained the crds-value though!
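
To make the distinction concrete, a rough self-contained sketch (illustrative names only, not the real filter_by_shred_version):

```rust
// The shred version of the relaying node is all that today's filter looks at;
// the shred version of each relayed value's origin is never consulted.
fn retained_by_relayer_check(self_shred_version: u16, relayer_shred_version: u16) -> bool {
    relayer_shred_version == self_shred_version
}

fn main() {
    let self_shred_version = 51u16;
    let relayer_shred_version = 51u16; // the relaying node is on our cluster
    // ...but the relayed value may have originated on a node with shred
    // version 4242; this check has no way to notice that.
    assert!(retained_by_relayer_check(self_shred_version, relayer_shred_version));
}
```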

@carllin
Contributor

carllin commented Jun 16, 2021

why is it contradictory to this PR?

Ah, I misread the change; I thought once the Pubkey -> shred version mapping was made, we couldn't change it, but it can be modified on insert of a new ContactInfo:
https://github.com/solana-labs/solana/pull/17940/files#diff-5b67451eccdd9787fc6bfecc40a9d00211eda78ac5adb01c46d4fa61598c8268R204-R208
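
A tiny self-contained illustration of that overwrite-on-insert behavior (hypothetical values, not the actual Crds code):

```rust
use std::collections::HashMap;

fn main() {
    // The mapping is keyed by pubkey, so a later ContactInfo insert from the
    // same node simply overwrites the entry; a node that restarts with a new
    // shred version is picked up on its next ContactInfo.
    let mut shred_versions: HashMap<[u8; 32], u16> = HashMap::new();
    let node = [7u8; 32];
    shred_versions.insert(node, 17); // shred version from the first ContactInfo seen
    shred_versions.insert(node, 42); // node restarts and advertises a new shred version
    assert_eq!(shred_versions.get(&node), Some(&42));
}
```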

I do not know how that first node obtained the crds-value though

Hmmm, yeah, this is the confusing bit. And it seems the EpochSlots crds values must be relatively recent or they would've been purged, right? So they must be constantly pushed/refreshed to stay alive, which implies some node is a bridge; maybe somebody's running a really old spy that hasn't updated its shred version from 0 🤯

We may receive a crds-value with pubkey x from node a. filter_by_shred_version does not check the shred-version of the owner of the crds-value (i.e. the node with pubkey x)

Is it reasonable to add a check for the origin pubkey in filter_by_shred_version to filter those bad apples out?

@behzadnouri
Contributor Author

behzadnouri commented Jun 17, 2021

And it seems from the EpochSlots crds values must be relatively recent or they would've been purged right?

I did check them on TDS, and some are very recent, like with wallclock less than 30 seconds ago.

So they must be constantly pushed/refreshed to be alive, which implies some node is a bridge,

This is true.

One source of the leak is that a starting validator initially has shred_version = 0, so it may obtain crds values from any node until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417

At that point it already has crds-values from different shred versions in its gossip table, but since it has updated its shred version, it can now leak them out to the rest of the cluster for the period of time this loop runs:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L397-L559

This is a viable path for the leak, but it is one-time only and we are not starting a validator every minute, so it does not explain the stream of recent values with the wrong shred version.

Is it reasonable to add a check for the origin pubkey in filter_by_shred_version to filter those bad apples out?

Yeah, that was the plan: to follow up by adding more checks earlier in packet processing using this mapping. This PR just adds the mapping.
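
For illustration, a rough sketch of what such an origin-based check could look like once the mapping is available (hypothetical names and signatures, not the actual cluster_info code):

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];

fn should_retain_value(
    self_shred_version: u16,
    origin: &Pubkey,
    shred_versions: &HashMap<Pubkey, u16>, // the new pubkey -> shred-version mapping
) -> bool {
    match shred_versions.get(origin) {
        // Origin is known to be on a different shred version: drop the value.
        Some(&v) => v == self_shred_version,
        // Unknown origin: keep it for now (e.g. we have not yet seen its ContactInfo).
        None => true,
    }
}

fn main() {
    let mut shred_versions = HashMap::new();
    let same_cluster_node = [1u8; 32];
    let other_cluster_node = [2u8; 32];
    shred_versions.insert(same_cluster_node, 51);
    shred_versions.insert(other_cluster_node, 4242);
    assert!(should_retain_value(51, &same_cluster_node, &shred_versions));
    assert!(!should_retain_value(51, &other_cluster_node, &shred_versions));
}
```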

@behzadnouri behzadnouri merged commit 5a99fa3 into solana-labs:master Jun 18, 2021
@behzadnouri behzadnouri deleted the crds-shred-version branch June 18, 2021 15:56
mergify bot pushed a commit that referenced this pull request Jun 18, 2021
(cherry picked from commit 5a99fa3)
mergify bot added a commit that referenced this pull request Jun 18, 2021
(cherry picked from commit 5a99fa3)

Co-authored-by: behzad nouri <[email protected]>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021