-
Notifications
You must be signed in to change notification settings - Fork 20.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eth: improve shutdown synchronization #20695
Conversation
Most goroutines started by eth.Ethereum didn't have any shutdown sync at all, which lead to weird error messages when quitting the client. This change improves the clean shutdown path by stopping all internal components in dependency order and waiting for them to actually be stopped before shutdown is considered done. In particular, we now stop everything related to peers before stopping 'resident' parts such as core.BlockChain.
@karalabe PTAL |
I have now rewritten the chain sync controller as a proper state machine. The new behavior is as follows: Whenever a new peer connects or a new head is announced by a connected peer, the chainSyncer is notified and rechecks the preconditions for starting sync. If a sync is already running, we don't do anything. Otherwise we check if there are at least 5 peers or the 10 second 'force timeout' has passed. The sync is launched if any peer has a better TD than our local TD. |
Force timeout handling could be improved. One issue with the new system is that it's too correct. The old code used a ticker, so retry times were about 5s on average. The new code starts the timer when sync stops. If there are too few peers (i.e. less than 5 with maxpeers >= 5), this means we'll now always wait 10s before retrying. On the upside, the retry is immediate when we have enough peers:
|
pm *ProtocolManager | ||
force *time.Timer | ||
forced bool // true when force timer fired | ||
peerEventCh chan struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This peerEventCh
seems to just be a boring notification channel that something happened and we should recheck. There's no information content in it, and if there are 3 changes simultaneously, there's no need to handle each separately.
My suggestion would be to turn this into a buffered notification channel on 1 and whenever sending into it, if it's already notified, just return early. Wouldn't that make more sense?
func newChainSyncer(pm *ProtocolManager) *chainSyncer { | ||
return &chainSyncer{ | ||
pm: pm, | ||
peerEventCh: make(chan struct{}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a notification channel, we should make it buffered with 1. There's no real reason for us to block on it, is there?
return true | ||
case <-cs.pm.quitSync: | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd extend this select to handle peerEventCh as a notification:
select {
case cs.peerEventCh <- struct{}{}:
return true
default:
select {
case <-cs.pm.quitSync:
return false
default:
return true
}
}
Something like this?
* eth: improve shutdown synchronization Most goroutines started by eth.Ethereum didn't have any shutdown sync at all, which lead to weird error messages when quitting the client. This change improves the clean shutdown path by stopping all internal components in dependency order and waiting for them to actually be stopped before shutdown is considered done. In particular, we now stop everything related to peers before stopping 'resident' parts such as core.BlockChain. * eth: rewrite sync controller * eth: remove sync start debug message * eth: notify chainSyncer about new peers after handshake * eth: move downloader.Cancel call into chainSyncer * eth: make post-sync block broadcast synchronous * eth: add comments * core: change blockchain stop message * eth: change closeBloomHandler channel type
Most goroutines started by eth.Ethereum didn't have any shutdown sync at
all, which lead to weird error messages when quitting the client.
This change improves the clean shutdown path by stopping all internal
components in dependency order and waiting for them to actually be
stopped before shutdown is considered done. In particular, we now stop
everything related to peers before stopping 'resident' parts such as
core.BlockChain.
Fixes #18495