Some BlockFetch decline decisions are not revisited after completeFetchBatch #1147
I added the Bug label. That might be an overstatement, but at the very least Duncan agreed on Slack that the current behavior is not as prompt as we'd prefer.

@nfrisby What's the status on this?

I responded on Slack. Summary: Duncan had planned to do it, though I have a commit on my branch that is more "surgical" than what he had discussed, if we want to go with that for now.
dcoutts added a commit that referenced this issue on Jan 6, 2020:
Fixes #1147. The point is that we should only base decisions (to do or not do a thing, as opposed to things that only affect how it is done) on things that are in the fingerprint that changes.

The nub of the bug was that we were deciding not to do this request based on `peerFetchReqsInFlight == 0`, which is not part of the fingerprint. By using `PeerFetchStatusReady Set.empty` instead, we're making the decision on a slightly different condition, but one that's within the info covered by the fingerprint. The condition is different in that instead of saying that the requests are completely done, it says the set of outstanding replies is empty. The distinction is the message after the last block in a range request, but this difference really doesn't matter at all: our decision is simply meant to avoid too much network concurrency, so it doesn't matter if there's one remaining small message yet to arrive.
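To illustrate the change the commit describes, here is a minimal, hypothetical Haskell sketch. The types and helpers are simplified stand-ins for the real BlockFetch definitions; only `PeerFetchStatusReady`, `peerFetchReqsInFlight`, and `Set.empty` are names the commit itself mentions.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical, simplified stand-ins for the real BlockFetch types.
data PeerFetchStatus header
  = PeerFetchStatusReady (Set header)  -- ready, with a set of outstanding block replies
  | PeerFetchStatusBusy
  deriving (Eq, Show)

newtype PeerFetchInFlight = PeerFetchInFlight
  { peerFetchReqsInFlight :: Word      -- NOT part of the fingerprint
  }

-- Before: the decline condition consulted a field that is not in the
-- FetchStateFingerprint, so a change in it never triggered re-evaluation.
peerIsIdleBefore :: PeerFetchInFlight -> Bool
peerIsIdleBefore inflight = peerFetchReqsInFlight inflight == 0

-- After: decide on the peer's status, which IS covered by the fingerprint.
-- "Ready with no outstanding replies" differs from "zero requests in
-- flight" only by the trailing MsgBatchDone, which is irrelevant when the
-- goal is merely to bound network concurrency.
peerIsIdleAfter :: PeerFetchStatus header -> Bool
peerIsIdleAfter (PeerFetchStatusReady outstanding) = Set.null outstanding
peerIsIdleAfter _                                  = False
```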
While developing infra-slot network latencies (PR #1131), I'm seeing a violation of our current variant of Chain Growth. This Issue is complete when that test case is no longer a failure.

The current repro, which might not work directly on future `master`, is as follows in `Test.Dynamic.Praos`. (If it unexpectedly does not fail on a branch with latencies, my only suggestion is to let the latency seed vary and hope it discovers the failure.)

The specific failure happens because network latency introduces a non-negligible duration (never before seen in the test suite) between `MsgBlock` and `MsgBatchDone`. A particular node makes a `FetchDeclineConcurrencyLimit` decision (due to `peerFetchReqsInFlight = 1`) after receiving `MsgBlock` but before receiving `MsgBatchDone` (which sets `peerFetchReqsInFlight = 0`). The BlockFetch decision re-evaluation trigger currently does not include `peerFetchReqsInFlight`: `MsgBlock` would trigger a re-evaluation, but `MsgBatchDone` does not. Because of that decision, nodes `c1` and `c2` do not adopt the chain that `c0` forged, which is the longest chain in the network. That's a violation of our currently-very-strict variant of Chain Growth.

Apparent ways to resolve this Issue:

- Make `MsgBatchDone` trigger a re-evaluation (i.e. expand the content of `FetchStateFingerprint` to include the `peerFetchReqsInFlight` fields; and perhaps similar for other bases for decline decisions?)
- Base the decline decision on the `PeerStatus`, which is already part of the `FetchStateFingerprint`.
- Relax the `Property` to anticipate this.
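For context on why the re-evaluation trigger matters here, below is a minimal, hypothetical sketch of a fingerprint-gated decision loop in the style the issue describes. The real loop lives in ouroboros-network's BlockFetch logic; `decisionLoop` and its parameters are illustrative assumptions, not the actual API.

```haskell
import Control.Concurrent.STM (STM, atomically, check)

-- A fingerprint-gated decision loop: the decision logic re-runs only when
-- the projected fingerprint changes.
decisionLoop
  :: Eq fingerprint
  => STM state               -- read the current fetch state
  -> (state -> fingerprint)  -- project out the fingerprinted components
  -> (state -> IO ())        -- (re-)make and act on fetch decisions
  -> fingerprint             -- fingerprint from the previous iteration
  -> IO ()
decisionLoop readState fingerprintOf decide lastFp = do
  st <- atomically $ do
    st <- readState
    -- Block until some fingerprinted component changes.  State that a
    -- decision consults but that is *not* in the fingerprint (such as
    -- peerFetchReqsInFlight before the fix) can never wake this loop,
    -- which is exactly how a decline decision goes unrevisited.
    check (fingerprintOf st /= lastFp)
    pure st
  decide st
  decisionLoop readState fingerprintOf decide (fingerprintOf st)
```

Under this model, the first two resolutions above amount to either enlarging the fingerprint or rephrasing the decline condition in terms of state the fingerprint already covers; the merged fix took the second route.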
to anticipate this.The text was updated successfully, but these errors were encountered: