Continuous backlog population #3999

Merged
merged 10 commits into from
Jan 16, 2023

pwojcikdev
Contributor

@pwojcikdev pwojcikdev commented Nov 15, 2022

Backlog population is a process in which a node scans all accounts in the ledger, whether or not they have any confirmed blocks, and forwards (activates) those accounts that do not have all their blocks confirmed to the election scheduler for prioritization and eventual queuing in the proper bucket. It is necessary to do this periodically, because the amount of space in each bucket is limited (currently ~2000 entries) and the number of accounts needing confirmation can be much higher than that, especially during bootstrap or a network spam attack.
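
To make the scan-and-activate step concrete, here is a minimal C++ sketch. It is not the nano-node implementation: `account_info`, `populate_backlog`, and the `activate` callback are hypothetical names for illustration, and the ledger is modeled as a plain vector rather than a database table.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one ledger account (not nano-node's real types).
struct account_info
{
	std::string account;
	std::uint64_t block_count; // total blocks in the account chain
	std::uint64_t confirmed_height; // how many of those blocks are confirmed
};

// Scans every account and activates the ones that still have unconfirmed blocks.
// `activate` stands in for handing the account to the election scheduler.
// Returns the number of accounts that were activated.
std::size_t populate_backlog (std::vector<account_info> const & ledger, std::function<void (std::string const &)> const & activate)
{
	std::size_t activated = 0;
	for (auto const & info : ledger)
	{
		if (info.confirmed_height < info.block_count)
		{
			activate (info.account);
			++activated;
		}
	}
	return activated;
}
```

In this sketch an account with no confirmed blocks (`confirmed_height == 0`) is activated just like a partially confirmed one, matching the "with or without any confirmed blocks" wording above.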

The problem with the current implementation is that this process runs every 5 minutes and scans the whole ledger at once, leading to situations where we run out of accounts to prioritize before the next run has started. This is especially visible during bootstrapping; a graph showing such a situation is included below. We can clearly see the bumps in AEC occupancy where the prioritization queue is filled, followed by periods of idleness when the priority queue is emptied:

[Image: AEC occupancy graph, Screen Shot 2022-11-15 at 21 15 39]

This PR fixes that by modifying the way the ledger scan is done. Instead of scanning at a 5 minute interval, we run the scan all the time (unless disabled by setting frontiers_confirmation = disabled in the node config), but we throttle the rate at which the scan is done to limit consumption of node resources. The rate and frequency are controlled by two new node-config.toml settings: backlog_scan_batch_size and backlog_scan_frequency. By default it scans 10000 accounts per second divided into 10 batches, so 1000 accounts per batch. This is rather conservative and should be adjusted later with feedback from beta node operators (before this PR we did it in batches of 64k).
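
The throttling arithmetic can be sketched as follows. This is an illustration under assumptions, not the code from this PR: the struct and function names are hypothetical, and it assumes backlog_scan_batch_size is the number of accounts per second and backlog_scan_frequency is the number of batches per second, which is how the defaults above work out to 1000 accounts per batch.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical mirror of the two settings, using the defaults stated in the description.
struct backlog_scan_config
{
	unsigned batch_size = 10000; // accounts scanned per second (assumed meaning)
	unsigned frequency = 10; // batches per second (assumed meaning), must be > 0
};

struct batch_plan
{
	unsigned accounts_per_batch;
	std::chrono::milliseconds interval; // pause between consecutive batches
};

// Splits the per-second budget into evenly spaced batches.
batch_plan plan_batches (backlog_scan_config const & config)
{
	return { config.batch_size / config.frequency, std::chrono::milliseconds{ 1000 / config.frequency } };
}

// Advances the scan position by one batch, wrapping to the start of the ledger
// so the scan runs continuously instead of stopping after a single pass.
std::size_t next_cursor (std::size_t cursor, std::size_t accounts_per_batch, std::size_t total_accounts)
{
	if (total_accounts == 0)
	{
		return 0;
	}
	auto const end = cursor + accounts_per_batch;
	return end >= total_accounts ? 0 : end;
}
```

With the defaults, `plan_batches` yields 1000 accounts per batch and a 100 ms interval. Smaller, more frequent batches spread the load more evenly, at the cost of more per-batch overhead.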

The result of this PR is an AEC that stays full almost all the time (except during the initial phase of the bootstrap):

[Image: AEC occupancy graph, Screen Shot 2022-11-15 at 21 15 53]

@dsiganos
Contributor

It looks like the unit test 'request_aggregator.cannot_vote' failed 4 times.

@pwojcikdev
Contributor Author

pwojcikdev commented Nov 16, 2022

@dsiganos I see, there seem to be two tests that are still failing, I'm looking into that. It appears to only break on GH runners, so it's a bit annoying to debug.

@dsiganos
Contributor

dsiganos commented Nov 16, 2022

It fails when the system is under heavy load. Starting a parallel build from scratch seems to make it crash.
I am currently fixing the qwahzi unit test and then I will come back to this.

I got this crash on my laptop:

[ RUN      ] request_aggregator.cannot_vote
/Users/ds/CLionProjects/nano-node/nano/core_test/request_aggregator.cpp:501: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:
zsh: segmentation fault  ./core_test --gtest_repeat=1000 --gtest_filter=request_aggregator.cannot_vote

@dsiganos
Contributor

I left a number of minor comments but it looks good to me overall.

@clemahieu clemahieu force-pushed the continuous-backlog-3 branch from 64218b6 to ce52a8f Compare November 30, 2022 11:02
@clemahieu clemahieu force-pushed the continuous-backlog-3 branch from 98227de to 3e60600 Compare January 16, 2023 11:16
…xpects that it will not receive a vote for send1 because it has not made such a request, however, if the election is still or recently active, it may receive a broadcast vote before it makes a request.

Check that the election has ended on node1 and allow some time for in-flight vote broadcasts to finish before starting node2.
@clemahieu clemahieu force-pushed the continuous-backlog-3 branch from 3e60600 to e848e09 Compare January 16, 2023 11:25
@clemahieu clemahieu merged commit 02bffc2 into nanocurrency:develop Jan 16, 2023
@thsfs thsfs added the enhancement, documentation, and non-functional change labels Mar 1, 2023