Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Add voting service #18552

Merged
merged 1 commit into from
Jul 15, 2021
Merged

Add voting service #18552

merged 1 commit into from
Jul 15, 2021

Conversation

sakridge
Copy link
Contributor

@sakridge sakridge commented Jul 9, 2021

Problem

Voting is slow and blocks replay of forks.

Summary of Changes

Push voting to another thread

Fixes #

@codecov
Copy link

codecov bot commented Jul 9, 2021

Codecov Report

Merging #18552 (8ea508a) into master (a4a24b6) will increase coverage by 0.0%.
The diff coverage is 96.7%.

@@           Coverage Diff           @@
##           master   #18552   +/-   ##
=======================================
  Coverage    82.7%    82.7%           
=======================================
  Files         436      437    +1     
  Lines      123512   123549   +37     
=======================================
+ Hits       102163   102194   +31     
- Misses      21349    21355    +6     

@sakridge sakridge requested review from carllin and behzadnouri July 9, 2021 17:07
Copy link
Contributor

@behzadnouri behzadnouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 paths that are writing votes to crds table:

  1. ClusterInfo::push_vote:
    https://github.com/solana-labs/solana/blob/761de8b1a/gossip/src/cluster_info.rs#L1047-L1092
  2. ClusterInfo::refresh_vote:
    https://github.com/solana-labs/solana/blob/761de8b1a/gossip/src/cluster_info.rs#L1094-L1126

Both are computing the vote_index under a gossip read lock, releasing the lock and then writing the votes to crds table under a 2nd write lock. So technically they have a race condition with each other where both may compute the same vote-index and overwrite each other.

Last time I checked both paths are called from the same thread, and so to my knowledge we are not hitting that race condition. But this commit is adding a new thread, so making the race condition a runtime possibility.

One possibility is just to compute the vote-index under an exclusive lock and hold on to that lock until the vote is written to the crds table. There are only 32 lookups to compute vote-index, so maybe it would not be that bad.

But, can we please hold on this approach until I can finish some tests with gossip locking and see where those go? Adding a new thread adds its own subtle complexities like the race condition above. I would rather use this separate thread approach as a last resort.

@sakridge
Copy link
Contributor Author

sakridge commented Jul 9, 2021

There are 2 paths that are writing votes to crds table:

  1. ClusterInfo::push_vote:
    https://github.com/solana-labs/solana/blob/761de8b1a/gossip/src/cluster_info.rs#L1047-L1092
  2. ClusterInfo::refresh_vote:
    https://github.com/solana-labs/solana/blob/761de8b1a/gossip/src/cluster_info.rs#L1094-L1126

Both are computing the vote_index under a gossip read lock, releasing the lock and then writing the votes to crds table under a 2nd write lock. So technically they have a race condition with each other where both may compute the same vote-index and overwrite each other.

Last time I checked both paths are called from the same thread, and so to my knowledge we are not hitting that race condition. But this commit is adding a new thread, so making the race condition a runtime possibility.

One possibility is just to compute the vote-index under an exclusive lock and hold on to that lock until the vote is written to the crds table. There are only 32 lookups to compute vote-index, so maybe it would not be that bad.

But, can we please hold on this approach until I can finish some tests with gossip locking and see where those go? Adding a new thread adds its own subtle complexities like the race condition above. I would rather use this separate thread approach as a last resort.

Yea, let’s see the results of the locking optimization.

couldn’t we just use the same thread to do both the writes? They would be ordered by the channel, right?

behzadnouri
behzadnouri previously approved these changes Jul 12, 2021
Copy link
Contributor

@behzadnouri behzadnouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM aside from the fact that it is adding a new thread, and I would prefer exhaust all optimaztions before adding new threads. Your call to merge this now or wait until gossip lock reworks are complete, and see if this is still needed.

@mergify mergify bot dismissed behzadnouri’s stale review July 14, 2021 12:49

Pull request has been modified.

@sakridge sakridge merged commit 0f8bcf6 into solana-labs:master Jul 15, 2021
@sakridge sakridge deleted the voting-service branch July 15, 2021 14:36
@sakridge sakridge added the v1.7 label Jul 15, 2021
@behzadnouri behzadnouri linked an issue Jul 15, 2021 that may be closed by this pull request
@sakridge sakridge added v1.7 and removed v1.7 labels Jul 16, 2021
mergify bot pushed a commit that referenced this pull request Jul 16, 2021
(cherry picked from commit 0f8bcf6)

# Conflicts:
#	core/src/replay_stage.rs
#	core/src/tvu.rs
sakridge added a commit that referenced this pull request Jul 16, 2021
sakridge added a commit that referenced this pull request Jul 16, 2021
sakridge added a commit that referenced this pull request Jul 16, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 18, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 18, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 19, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 20, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 20, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 20, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 20, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 20, 2021
sakridge added a commit to sakridge/solana that referenced this pull request Jul 20, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jul 20, 2021
sakridge added a commit to sakridge/solana that referenced this pull request Jul 22, 2021
sakridge added a commit that referenced this pull request Jul 22, 2021
This was referenced Aug 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

slow push_vote in ReplayStage
2 participants