Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: [ibft] Fix validator panic due to repeat notification in syncing #320

Conversation

DarianShawn
Copy link
Collaborator

Description

The validators will panic after syncing a bulk of blocks, due to incorrectly implementation of IBFT consensus context handling.
The consensus routine will be triggered multiple times even it is not thread-safe, and occasionally, the built block is clear before gossip it, then the server panic with nil object access.
The PR:

  • Uses an atomic value to disable reentrant before last routine stopped.
    • To prevent reentrant, which will crash all states.
  • Update validator set only when notified new block.
    • It is a better triggered point because it will never meet sequence problem before join in IBFT consensus cycle.

Changes include

  • Hotfix (change that solves an urgent issue, and requires immediate attention)

Testing

  • I have tested this code with the official test suite

@DarianShawn DarianShawn added the hotfix Major bug fix that should be merged ASAP label Mar 10, 2023
@DarianShawn DarianShawn added this to the Release 1.3.0 milestone Mar 10, 2023
@DarianShawn DarianShawn self-assigned this Mar 10, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #320 (ff0147a) into dev (d7c5a9e) will increase coverage by 0.28%.
The diff coverage is 28.46%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##              dev     #320      +/-   ##
==========================================
+ Coverage   47.74%   48.02%   +0.28%     
==========================================
  Files         132      132              
  Lines       20125    20103      -22     
==========================================
+ Hits         9608     9655      +47     
+ Misses       9684     9620      -64     
+ Partials      833      828       -5     
Impacted Files Coverage Δ
consensus/ibft/currentstate/state.go 18.07% <0.00%> (+0.39%) ⬆️
consensus/ibft/snapshot.go 59.71% <0.00%> (-0.35%) ⬇️
protocol/client.go 48.61% <ø> (-0.61%) ⬇️
protocol/service.go 26.17% <ø> (+4.26%) ⬆️
consensus/ibft/ibft.go 23.41% <16.66%> (+0.33%) ⬆️
consensus/ibft/consensus.go 37.17% <44.00%> (+1.29%) ⬆️
protocol/syncer.go 49.49% <51.85%> (-0.12%) ⬇️

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@0xcb9ff9 0xcb9ff9 mentioned this pull request Mar 11, 2023
5 tasks
@DarianShawn DarianShawn added the WIP work in process label Mar 11, 2023
@DarianShawn DarianShawn removed this from the Release 1.3.0 milestone Mar 11, 2023
@DarianShawn DarianShawn changed the title [ibft] Fix validator panic due to repeat notification in syncing WIP: [ibft] Fix validator panic due to repeat notification in syncing Mar 11, 2023
@DarianShawn
Copy link
Collaborator Author

Tests pass, but manual tests fail when shutting down one of the 4 validator.
Blocks are no longer produced after stopping 1 validator, which is a disaster.
Stop this PR because we need to reinvent a real working solution for IBFT, instead of tinkering here and there.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 11, 2023
@DarianShawn DarianShawn deleted the hotfix-validator-panic-due-to-repeat-notification-in-syncing branch March 27, 2023 06:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotfix Major bug fix that should be merged ASAP WIP work in process
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants