Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth: improve shutdown synchronization #20695

Merged
merged 9 commits into from
Mar 27, 2020
Merged

Conversation

fjl
Copy link
Contributor

@fjl fjl commented Feb 19, 2020

Most goroutines started by eth.Ethereum didn't have any shutdown sync at
all, which lead to weird error messages when quitting the client.

This change improves the clean shutdown path by stopping all internal
components in dependency order and waiting for them to actually be
stopped before shutdown is considered done. In particular, we now stop
everything related to peers before stopping 'resident' parts such as
core.BlockChain.

Fixes #18495

@fjl fjl force-pushed the eth-shutdown-sync branch from 4251a1a to 210150a Compare March 24, 2020 13:16
fjl added 2 commits March 24, 2020 14:22
Most goroutines started by eth.Ethereum didn't have any shutdown sync at
all, which lead to weird error messages when quitting the client.

This change improves the clean shutdown path by stopping all internal
components in dependency order and waiting for them to actually be
stopped before shutdown is considered done. In particular, we now stop
everything related to peers before stopping 'resident' parts such as
core.BlockChain.
@fjl fjl force-pushed the eth-shutdown-sync branch from 210150a to a499518 Compare March 24, 2020 13:33
@fjl
Copy link
Contributor Author

fjl commented Mar 24, 2020

@karalabe PTAL

@fjl
Copy link
Contributor Author

fjl commented Mar 24, 2020

I have now rewritten the chain sync controller as a proper state machine. The new behavior is as follows:

Whenever a new peer connects or a new head is announced by a connected peer, the chainSyncer is notified and rechecks the preconditions for starting sync. If a sync is already running, we don't do anything. Otherwise we check if there are at least 5 peers or the 10 second 'force timeout' has passed. The sync is launched if any peer has a better TD than our local TD.

@fjl
Copy link
Contributor Author

fjl commented Mar 24, 2020

Force timeout handling could be improved. One issue with the new system is that it's too correct. The old code used a ticker, so retry times were about 5s on average. The new code starts the timer when sync stops. If there are too few peers (i.e. less than 5 with maxpeers >= 5), this means we'll now always wait 10s before retrying.

On the upside, the retry is immediate when we have enough peers:

TRACE[03-24|15:09:59.866] Starting chain sync                      mode=full peercount=64 id=687997932b991706
WARN [03-24|15:10:53.918] Synchronisation failed, dropping peer    peer=687997932b991706 err=timeout
TRACE[03-24|15:10:53.919] Starting chain sync                      mode=full peercount=43 id=36b274d045aba872

pm *ProtocolManager
force *time.Timer
forced bool // true when force timer fired
peerEventCh chan struct{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This peerEventCh seems to just be a boring notification channel that something happened and we should recheck. There's no information content in it, and if there are 3 changes simultaneously, there's no need to handle each separately.

My suggestion would be to turn this into a buffered notification channel on 1 and whenever sending into it, if it's already notified, just return early. Wouldn't that make more sense?

func newChainSyncer(pm *ProtocolManager) *chainSyncer {
return &chainSyncer{
pm: pm,
peerEventCh: make(chan struct{}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a notification channel, we should make it buffered with 1. There's no real reason for us to block on it, is there?

return true
case <-cs.pm.quitSync:
return false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd extend this select to handle peerEventCh as a notification:

select {
case cs.peerEventCh <- struct{}{}:
	return true
default:
	select {
	case <-cs.pm.quitSync:
		return false
	default:
		return true
	}
}

Something like this?

enriquefynn pushed a commit to enriquefynn/go-ethereum that referenced this pull request Mar 10, 2021
* eth: improve shutdown synchronization

Most goroutines started by eth.Ethereum didn't have any shutdown sync at
all, which lead to weird error messages when quitting the client.

This change improves the clean shutdown path by stopping all internal
components in dependency order and waiting for them to actually be
stopped before shutdown is considered done. In particular, we now stop
everything related to peers before stopping 'resident' parts such as
core.BlockChain.

* eth: rewrite sync controller

* eth: remove sync start debug message

* eth: notify chainSyncer about new peers after handshake

* eth: move downloader.Cancel call into chainSyncer

* eth: make post-sync block broadcast synchronous

* eth: add comments

* core: change blockchain stop message

* eth: change closeBloomHandler channel type
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to store fast sync trie progress on shutdown
2 participants