Very slow confirmation with large amount of single account blocks #3574

Closed
RickiNano opened this issue Nov 26, 2021 · 10 comments

@RickiNano
Contributor

Summary

Confirming a large number of 1 raw receive blocks becomes extremely slow.
During testing of v23RC1 I created 300k+ single account blocks.
At first the blocks were processed fine at around 2 CPS. Over time, though, confirmation has slowed to a crawl.

image

It has been speculated that each node has different blocks in the AEC, so the nodes are not voting on the same blocks.
The rest of the network seems fine and other blocks are confirmed quickly

Node version

V23RC1

Build details

Official binaries provided from the beta Discord

OS and version

Windows 10

Steps to reproduce the behavior

  1. Create 300k spam blocks of 1 raw each to account "X"
  2. Go to account "X" and receive the 300k blocks

Expected behavior

Blocks should confirm steadily at about 2 CPS

Actual behavior

Blocks are initially confirmed at 2 CPS, but confirmation then becomes very erratic, with long periods of no confirmations at all

Possible solution

Could be caused by the AEC being filled with blocks that cannot be confirmed until all the previous blocks of this chain have been confirmed.
Blocks that have no confirmed previous block should not be in the AEC. I don't know if that is the case today.
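
The rule proposed above can be sketched as a small filter. This is only a toy model, not the node's actual scheduler: `Block`, `eligible_for_election` and `select_for_aec` are hypothetical names invented for illustration, and the chain is modeled as a plain list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    hash: str
    previous: Optional[str]  # None for the first block of the chain


def eligible_for_election(block, confirmed):
    """A block is eligible only if it opens the chain or its predecessor is confirmed."""
    return block.previous is None or block.previous in confirmed


def select_for_aec(chain, confirmed, capacity):
    """Return up to `capacity` unconfirmed blocks whose predecessor is already confirmed."""
    picked = []
    for block in chain:
        if block.hash in confirmed:
            continue  # already cemented, nothing to vote on
        if eligible_for_election(block, confirmed):
            picked.append(block)
            if len(picked) == capacity:
                break
    return picked


# Hypothetical four-block single-account chain with only b0 confirmed.
chain = [Block("b0", None), Block("b1", "b0"), Block("b2", "b1"), Block("b3", "b2")]
confirmed = {"b0"}
print([b.hash for b in select_for_aec(chain, confirmed, capacity=50_000)])  # ['b1']
```

Under this rule a single account chain contributes at most one block to the AEC at a time, regardless of how many unconfirmed blocks sit behind it.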

Supporting files

No response

@zhyatt
Collaborator

zhyatt commented Nov 29, 2021

@RickiNano So you published the 300k send blocks of 1 raw each and those were fully confirmed before you started the 300k receive blocks to match them? Or did you send out the 300k receive blocks before the sends were confirmed?

And to be clear, I am assuming the 300k send blocks were all done from a single account, thus the expected consistent 2-3 CPS.

@RickiNano
Contributor Author

@zhyatt
The 300k send blocks were actually just very old spam from months ago. They were all single account transactions and they were all confirmed.
I then began receiving the transactions to test this bug:
#3491
That's when I noticed the problem described in this ticket

@zhyatt
Collaborator

zhyatt commented Nov 29, 2021

I see, thanks for clarifying @RickiNano

@zhyatt
Collaborator

zhyatt commented Nov 29, 2021

So the original send blocks were already confirmed, all pointing to a single account. Then on Nov 23rd while the network was on V23.0DB3 you published those 300k receive blocks for that single account, which everyone appears to have gotten in their ledger quickly, and confirmations were running as expected for a while (~2-3CPS for first 50k blocks confirmed in ~5 hours) but then the confirmation rate got quite choppy. Note that the following day on Nov 24th the RC1 build was released but this hasn't appeared to change behavior thus far.
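
As a quick check of the figures above, 50k blocks in roughly 5 hours works out to just under 3 CPS. The helper names below are made up for this sketch, and the 300k total is the count stated in the original report.

```python
def confirmations_per_second(blocks, hours):
    """Average confirmation rate over a period."""
    return blocks / (hours * 3600)


def hours_to_confirm(blocks, cps):
    """Time needed to confirm a backlog at a constant rate."""
    return blocks / cps / 3600


rate = confirmations_per_second(50_000, 5)
print(f"{rate:.2f} CPS")  # 2.78 CPS
print(f"{hours_to_confirm(300_000, rate):.1f} hours for 300k blocks")  # 30.0 hours for 300k blocks
```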

@RickiNano Can you confirm you only did 300k? Based on graphs it looks like 500k-600k blocks were published on Nov 23rd (see https://discord.com/channels/370266023905198083/769209197333053511/913232765581946920)

@RickiNano
Contributor Author

RickiNano commented Nov 29, 2021

I think you are right @zhyatt. It was 500k-600k blocks. Don't know why I thought it was 300k.
What I think is happening is that every node's AEC got filled up when the blocks were published. All nodes got the same 50k blocks (the first 50k blocks that could fit into the AEC). From then on the nodes would pick different blocks to put into the AEC and vote on, so the AECs got "desynced".
Maybe this is caused by the round robin? Since every node is at a different bucket when they select a new block they will also get a different one.
I'm just guessing here. However it is a strange coincidence that the AEC can hold 50k blocks and the first 50k blocks were processed without any problems
BTW: I think someone from the beta discord is trying to recreate the problem on a private network
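
The "desync" guess above can be illustrated with a toy model. It assumes, purely for illustration, that each node fills its AEC with a contiguous window of the backlog starting at its own position. That is not how the real scheduler or bucket logic works; `aec_contents` and `shared_fraction` are hypothetical helpers.

```python
def aec_contents(backlog_size, capacity, start):
    """Block indices a node holds if it fills its AEC from `start`, wrapping around."""
    return {(start + i) % backlog_size for i in range(capacity)}


def shared_fraction(backlog_size, capacity, starts):
    """Fraction of the AEC capacity that every node has in common."""
    sets = [aec_contents(backlog_size, capacity, s) for s in starts]
    return len(set.intersection(*sets)) / capacity


# All nodes start at the same place: they vote on the same blocks.
print(shared_fraction(500_000, 50_000, [0, 0, 0]))            # 1.0
# Nodes have drifted apart: only part of each AEC overlaps.
print(shared_fraction(500_000, 50_000, [0, 25_000, 40_000]))  # 0.2
# Nodes are far apart: no block is in every AEC.
print(shared_fraction(500_000, 50_000, [0, 100_000, 200_000]))  # 0.0
```

If something like this were happening, quorum on any given block would become rare once the starting positions diverge, which would match the stalls seen after the first 50k blocks. This is a guess, like the comment it follows.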

@RickiNano
Contributor Author

One more thing: If you look at the beta voting visualizer you will see that almost no voting is happening:
https://votes.nanos.cc/beta/

@MajorChump
Contributor

MajorChump commented Nov 30, 2021

Think I've just followed this exact issue through.

confirm_req_batches_max defaults to 2 in the config. I was monitoring all active elections the node could cement and noticed announcements were basically nonexistent in the confirmation_info RPC.

I turned this setting up to the maximum (100) and confirmations started. Although still slow, it's much faster than it was, and announcements are now happening and shown in the confirmation_info RPC.

Seems like the majority of confirmation requests are being dropped by the sending node.

@zhyatt
Collaborator

zhyatt commented Dec 8, 2021

@MajorChump Found that 99.9% of returns from nano::confirmation_solicitor::add were true, which indicated an error and prevented confirm_req from being sent. Line 73 is where it was evaluating whether the request_queue size was bigger than max_channel_requests.

I think that code only flushes the messages when the number of requests reaches confirm_req_hashes_max. The default max queue size is 2 times that, so this is possibly rate limiting mixed with a race condition.
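
To make the capped-queue behavior described above concrete, here is a minimal Python model. It is not the node's C++ code: `ToySolicitor` is a hypothetical class, it models a single channel only, and the value of 7 hashes per message is an assumption chosen for illustration. The only things taken from the thread are that `add` returns true on error and that the cap is a multiple of the per-message hash limit.

```python
class ToySolicitor:
    """Single-channel model of a request queue with a hard cap."""

    def __init__(self, batches_max, hashes_per_message=7):
        # Assumption: cap = batches_max * hashes per confirm_req message.
        self.max_channel_requests = batches_max * hashes_per_message
        self.queue = []

    def add(self, block_hash):
        """Return True on error (request dropped), False if it was queued."""
        if len(self.queue) >= self.max_channel_requests:
            return True
        self.queue.append(block_hash)
        return False

    def flush(self):
        """Send everything queued so far and empty the queue."""
        sent, self.queue = self.queue, []
        return sent


def dropped_fraction(elections, batches_max):
    """Share of add() calls that fail when nothing is flushed in between."""
    solicitor = ToySolicitor(batches_max)
    errors = sum(solicitor.add(h) for h in range(elections))
    return errors / elections


print(f"{dropped_fraction(50_000, 2):.2%}")    # 99.97%
print(f"{dropped_fraction(50_000, 100):.2%}")  # 98.60%
```

With a full AEC and no flush between calls, almost every request is rejected at the default setting, and raising the batch limit only helps a little. The numbers come from the toy parameters and should not be read as a measurement of the real node.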

@qwahzi
Collaborator

qwahzi commented Dec 23, 2021

For those not on Discord, it looks like #3623 (added in V23DB5) may have helped the single account backlog catch up. Post-update, the beta network seems to consistently confirm the backlog at rates near the single-account maximum (~3 CPS):

image

@RickiNano
Contributor Author

This issue has been resolved somewhere between v23.3 and v24rc1.
I have recreated the original scenario with another 300k single account blocks and they are all being confirmed on v24rc1 at a steady 3.3 CPS with no pauses or cycling behavior.
image
