eth: improve shutdown synchronization #20695

fjl · 2020-02-19T15:50:34Z

Most goroutines started by eth.Ethereum didn't have any shutdown sync at
all, which led to weird error messages when quitting the client.

This change improves the clean shutdown path by stopping all internal
components in dependency order and waiting for them to actually be
stopped before shutdown is considered done. In particular, we now stop
everything related to peers before stopping 'resident' parts such as
core.BlockChain.

Fixes #18495
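The shutdown pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `component`, `newComponent`, `stop` and `stopInOrder` are hypothetical names, and each component is assumed to own a single goroutine that exits when its quit channel is closed.

```go
package main

import (
	"fmt"
	"sync"
)

// component is a hypothetical stand-in for an internal service
// (peer handling, chain, ...) that owns background goroutines.
type component struct {
	name string
	quit chan struct{}
	wg   sync.WaitGroup
}

// newComponent starts the component's background loop.
func newComponent(name string) *component {
	c := &component{name: name, quit: make(chan struct{})}
	c.wg.Add(1)
	go c.loop()
	return c
}

// loop runs until the quit channel is closed.
func (c *component) loop() {
	defer c.wg.Done()
	<-c.quit
}

// stop signals the loop to exit and blocks until it has actually returned.
func (c *component) stop() {
	close(c.quit)
	c.wg.Wait()
}

// stopInOrder stops components one at a time, in the order given, and
// returns the names in the order they were stopped. Each component is
// fully stopped before the next one is touched.
func stopInOrder(comps ...*component) []string {
	var stopped []string
	for _, c := range comps {
		c.stop()
		stopped = append(stopped, c.name)
	}
	return stopped
}

func main() {
	peers := newComponent("peers")
	chain := newComponent("blockchain")
	// Peer-related parts go first, 'resident' parts last.
	fmt.Println(stopInOrder(peers, chain))
}
```

Waiting on the WaitGroup inside `stop` is what makes the ordering meaningful: the next component is only stopped once nothing from the previous one can still be running.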


fjl · 2020-03-24T13:34:12Z

@karalabe PTAL

fjl · 2020-03-24T13:51:22Z

I have now rewritten the chain sync controller as a proper state machine. The new behavior is as follows:

Whenever a new peer connects or a new head is announced by a connected peer, the chainSyncer is notified and rechecks the preconditions for starting sync. If a sync is already running, we don't do anything. Otherwise we check if there are at least 5 peers or the 10 second 'force timeout' has passed. The sync is launched if any peer has a better TD than our local TD.

fjl · 2020-03-24T14:09:04Z

Force timeout handling could be improved. One issue with the new system is that it's too correct. The old code used a ticker, so retry times were about 5s on average. The new code starts the timer when sync stops. If there are too few peers (i.e. fewer than 5 with maxpeers >= 5), this means we'll now always wait 10s before retrying.

On the upside, the retry is immediate when we have enough peers:

TRACE[03-24|15:09:59.866] Starting chain sync                      mode=full peercount=64 id=687997932b991706
WARN [03-24|15:10:53.918] Synchronisation failed, dropping peer    peer=687997932b991706 err=timeout
TRACE[03-24|15:10:53.919] Starting chain sync                      mode=full peercount=43 id=36b274d045aba872


karalabe · 2020-03-26T13:38:31Z

eth/sync.go

+	pm          *ProtocolManager
+	force       *time.Timer
+	forced      bool // true when force timer fired
+	peerEventCh chan struct{}


This peerEventCh seems to just be a boring notification channel that something happened and we should recheck. There's no information content in it, and if there are 3 changes simultaneously, there's no need to handle each separately.

My suggestion would be to turn this into a notification channel with a buffer of 1: when sending into it, if a notification is already pending, just return early. Wouldn't that make more sense?
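
A minimal sketch of that idea, with a hypothetical `notifier` type standing in for the chainSyncer field (not code from this PR):

```go
package main

import "fmt"

// notifier is a hypothetical wrapper around a notification channel
// with a buffer of 1.
type notifier struct {
	ch chan struct{}
}

func newNotifier() *notifier {
	return &notifier{ch: make(chan struct{}, 1)}
}

// notify queues a notification without blocking. It returns false if
// a notification was already pending, in which case nothing is added.
func (n *notifier) notify() bool {
	select {
	case n.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	n := newNotifier()
	// Three simultaneous changes collapse into a single pending event.
	fmt.Println(n.notify(), n.notify(), n.notify())
	<-n.ch // the receiver rechecks once
	fmt.Println(n.notify())
}
```

The sender never blocks, and the receiver does one recheck no matter how many events arrived while it was busy.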

karalabe · 2020-03-26T13:38:58Z

eth/sync.go

+func newChainSyncer(pm *ProtocolManager) *chainSyncer {
+	return &chainSyncer{
+		pm:          pm,
+		peerEventCh: make(chan struct{}),


If this is a notification channel, we should give it a buffer of 1. There's no real reason for us to block on it, is there?

karalabe · 2020-03-26T13:41:59Z

eth/sync.go

+		return true
+	case <-cs.pm.quitSync:
+		return false
+	}


I'd extend this select to handle peerEventCh as a notification:

	select {
	case cs.peerEventCh <- struct{}{}:
		return true
	default:
		select {
		case <-cs.pm.quitSync:
			return false
		default:
			return true
		}
	}

Something like this?

* eth: improve shutdown synchronization

  Most goroutines started by eth.Ethereum didn't have any shutdown sync at all, which led to weird error messages when quitting the client.

  This change improves the clean shutdown path by stopping all internal components in dependency order and waiting for them to actually be stopped before shutdown is considered done. In particular, we now stop everything related to peers before stopping 'resident' parts such as core.BlockChain.

* eth: rewrite sync controller
* eth: remove sync start debug message
* eth: notify chainSyncer about new peers after handshake
* eth: move downloader.Cancel call into chainSyncer
* eth: make post-sync block broadcast synchronous
* eth: add comments
* core: change blockchain stop message
* eth: change closeBloomHandler channel type

fjl requested review from holiman, karalabe and rjl493456442 as code owners February 19, 2020 15:50

karalabe reviewed Feb 27, 2020

karalabe reviewed Feb 27, 2020

fjl force-pushed the eth-shutdown-sync branch from 4251a1a to 210150a Compare March 24, 2020 13:16

fjl added 2 commits March 24, 2020 14:22

eth: rewrite sync controller

a499518

fjl force-pushed the eth-shutdown-sync branch from 210150a to a499518 Compare March 24, 2020 13:33

fjl added 4 commits March 25, 2020 17:36

eth: remove sync start debug message

b556602

eth: notify chainSyncer about new peers after handshake

c3d032d

eth: move downloader.Cancel call into chainSyncer

15fd802

eth: make post-sync block broadcast synchronous

68bc898

karalabe reviewed Mar 26, 2020

karalabe reviewed Mar 26, 2020

karalabe reviewed Mar 26, 2020

karalabe reviewed Mar 26, 2020

fjl added 3 commits March 26, 2020 15:07

eth: add comments

59acca5

core: change blockchain stop message

b5c7169

eth: change closeBloomHandler channel type

cdd1c3d

karalabe added this to the 1.9.13 milestone Mar 27, 2020

karalabe merged commit d6c5f24 into ethereum:master Mar 27, 2020

holiman mentioned this pull request Apr 2, 2020

Transaction background indexing/unindexing + improved chain initiation from freezer #20302

Merged

fjl mentioned this pull request Apr 8, 2020

eth, les: wait for the downloads to be complete before stopping #16200

Closed

karalabe mentioned this pull request Apr 24, 2020

eth hangs on shutdown if blocks are importing #20975

Closed

meowsbits mentioned this pull request May 20, 2020

Panic: Segmentation Violation, doesn't shutdown gracefully etclabscore/core-geth#101

Closed

aidan-kwon mentioned this pull request Jun 8, 2020

Reorganize the stop sequence of modules klaytn/klaytn#535

Merged

ricardolyn mentioned this pull request Feb 4, 2021

[Upgrade] Go-Ethereum release v1.9.13 Consensys/quorum#1123

Merged

jeongkyun-oh mentioned this pull request Feb 3, 2022

Downloader termination enhancement klaytn/klaytn#1144

Closed
