-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle restarting with validation automatically #1551
Comments
|
Comment that will have to be updated/removed: |
Edsko and I discussed two possible interpretations of the marker and found them to be (almost?) isomorphic. However, the approach described in the description has the following advantage: the user can't accidentally delete the marker. |
Note that this means we only need two exit codes: restart, or "don't bother restarting" (and automated tools can just treat both as "restart"). |
("Don't bother restarting" might still be useful to distinguish some subcases, to give better feedback to Daedalus.) |
1554: Detect clock changes r=edsko a=edsko Closes #759. Code is tested using mock time; result of the test labelling: ``` # cabal run test-consensus -- -p delayClockShift --quickcheck-replay=680184 --quickcheck-tests 10000 Up to date ouroboros-consensus WallClock delayClockShift: OK (18.51s) +++ OK, passed 10000 tests. schedule goes back (10000 in total): 63.07% False 36.93% True schedule length (10000 in total): 81.03% R_Gt 20 9.26% R_Btwn (10,20) 4.08% R_Btwn (5,10) 1.04% R_Btwn (4,5) 1.03% R_Eq 1 0.96% R_Eq 2 0.94% R_Eq 4 0.87% R_Eq 0 0.79% R_Eq 3 schedule skips (10000 in total): 38.66% R_Btwn (10,20) 24.96% R_Btwn (5,10) 5.59% R_Eq 0 5.43% R_Eq 2 5.27% R_Btwn (4,5) 5.12% R_Eq 1 5.08% R_Eq 3 4.99% R_Eq 4 4.90% R_Gt 20 All 1 tests passed (18.52s) ``` I also verified that the exception bubbles up to the node. Ran the latest node with this PR, set my clock back an hour, and got ``` cardano-node: ExceptionInLinkedThread "ThreadId 20" (SystemClockMovedBack 2020-01-31 11:15:31.00184141 UTC (SlotNo {unSlotNo = 3713312}) (SlotNo {unSlotNo = 3713132})) ``` There is no need to define a custom error policy for this due to #1553 , nor a custom exit failure due to #1551 (comment) . Co-authored-by: Edsko de Vries <[email protected]>
1623: Validate the ChainDB when the clean shutdown marker is missing r=mrBliss a=mrBliss Fixes #1551. Co-authored-by: Thomas Winant <[email protected]>
1623: Validate the ChainDB when the clean shutdown marker is missing r=mrBliss a=mrBliss Fixes #1551. Co-authored-by: Thomas Winant <[email protected]>
1623: Validate the ChainDB when the clean shutdown marker is missing r=mrBliss a=mrBliss Fixes #1551. Co-authored-by: Thomas Winant <[email protected]>
1623: Validate the ChainDB when the clean shutdown marker is missing r=edsko a=mrBliss Fixes #1551. Co-authored-by: Thomas Winant <[email protected]>
Related to #1200 and #1201. When the node is being run by the wallet, it uses error codes to communicate information about the fatal error to the wallet (see #1541). For example, in case a file from the database is missing, the wallet should restart the node with the validation flag enabled.
In a server setting, where the node process is being run by systemd or runit, the node process will automatically be restarted when it terminated. There will be no human reading the error message or checking the error code to enable the validation flag when necessary.
We should be able to automatically validate when necessary.
Plan: add a marker file to the database on a clean shutdown (TODO define "clean"). On startup, if the marker is there, then we don't have to do any additional validation (just the last epoch, as we always do by default). If the marker is missing, we do a full validation.
The text was updated successfully, but these errors were encountered: