Improved removal of dead network channels #3993

pwojcikdev · 2022-11-09T16:00:21Z

This PR improves eviction of channels whose underlying socket has died (due to graceful shutdown or a network error). Previously we relied only on a timeout to purge them, which was suboptimal: even when a socket was known to be dead, we still waited ~5 minutes. That prevented us from quickly reestablishing connections to peers when problems occurred.
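
To illustrate the idea, here is a minimal sketch of a purge pass that evicts a channel either because it has been silent past the cutoff or because its socket is already known to be dead. This is not the node's actual code: `socket_stub`, `channel`, `alive` and `purge` are hypothetical names, and a plain `std::vector` stands in for the real channel container.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

using clock_type = std::chrono::steady_clock;

// Hypothetical stand-in for a socket; the real class tracks far more state.
struct socket_stub
{
	bool closed{ false };
};

// Hypothetical stand-in for a channel backed by a socket.
struct channel
{
	std::shared_ptr<socket_stub> socket;
	clock_type::time_point last_packet_received;

	// A channel counts as alive only while its socket exists and is still open.
	bool alive () const
	{
		return socket && !socket->closed;
	}
};

// Removes channels that are stale (no traffic since `cutoff`) or whose socket is dead.
// Returns the number of channels removed.
std::size_t purge (std::vector<channel> & channels, clock_type::time_point cutoff)
{
	auto const before = channels.size ();
	channels.erase (std::remove_if (channels.begin (), channels.end (), [cutoff] (channel const & ch) {
		return ch.last_packet_received < cutoff || !ch.alive ();
	}),
	channels.end ());
	return before - channels.size ();
}
```

With only the timeout condition, a channel whose socket closed a moment ago would survive every pass until the cutoff caught up with it. The extra `!ch.alive ()` check lets the next pass drop it immediately.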

dsiganos · 2022-11-10T13:08:41Z

nano/node/network.cpp

@@ -769,7 +769,7 @@ void nano::network::ongoing_cleanup ()
 {
 	cleanup (std::chrono::steady_clock::now () - node.network_params.network.cleanup_cutoff ());
 	std::weak_ptr<nano::node> node_w (node.shared ());
-	node.workers.add_timed_task (std::chrono::steady_clock::now () + node.network_params.network.cleanup_period, [node_w] () {
+	node.workers.add_timed_task (std::chrono::steady_clock::now () + std::chrono::seconds (1), [node_w] () {


What is happening to cleanup_period, are we dropping its use?
Isn't checking all sockets every second a bit excessive?

The cleanup period was set to 60 seconds by default and is still used in multiple different contexts, with interdependencies that are not very clear. Here we just iterate over all sockets (~1000 max by default) every second to see if any of them are dead. In the grand scheme of things this iteration is a small blip on the profiler chart.

That said, 1 second might be a bit excessive; adjusting it to 5 seconds should not negatively impact the fixes in any way.

Use lower value for dev network to speed up timeout unit tests
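
The interval choice discussed above could be expressed roughly as in the sketch below. `network_kind` and `cleanup_interval` are hypothetical names used only for illustration. The 5 second value comes from this thread; the 1 second value for the dev network is an assumption, since the commit message only says "lower".

```cpp
#include <chrono>

// Hypothetical network selector; the node has its own way of identifying the active network.
enum class network_kind
{
	dev,
	beta,
	live
};

// Returns how often the dead-channel sweep should run.
// Assumption: 1 second on the dev network so timeout unit tests finish quickly,
// 5 seconds everywhere else.
constexpr std::chrono::seconds cleanup_interval (network_kind kind)
{
	return kind == network_kind::dev ? std::chrono::seconds (1) : std::chrono::seconds (5);
}
```

The next sweep would then be scheduled at `std::chrono::steady_clock::now () + cleanup_interval (kind)` instead of a hard-coded `std::chrono::seconds (1)`.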

pwojcikdev force-pushed the dead-channels branch from ef895f0 to d225eb2 November 9, 2022 17:18

pwojcikdev added 3 commits November 9, 2022 18:21

Remove channels with dead underlying sockets

5fe61cb

Fix tests

602d4c8

Add tests for purging dead channels

618052a

pwojcikdev force-pushed the dead-channels branch from d225eb2 to 618052a November 9, 2022 17:21

pwojcikdev added 2 commits November 9, 2022 23:05

More frequent keepalive messages

8c36cb1

Fix node.peers unit test

1f22c15

pwojcikdev requested review from dsiganos, clemahieu and thsfs November 10, 2022 10:28

dsiganos reviewed Nov 10, 2022

dsiganos previously approved these changes Nov 10, 2022

Adjust network ongoing cleanup period

5440b5b

Use lower value for dev network to speed up timeout unit tests

pwojcikdev dismissed dsiganos’s stale review via 5440b5b November 10, 2022 13:55

dsiganos approved these changes Nov 10, 2022

pwojcikdev merged commit eb8c1aa into nanocurrency:develop Nov 10, 2022

qwahzi added this to the V24.0 milestone Dec 29, 2022
