Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend PBFT tests with protocol parameter update proposals/votes/endorsements/adoption #1514

Closed
edsko opened this issue Jan 28, 2020 · 12 comments
Assignees
Labels
byron Required for a Byron mainnet: replace the old core nodes with cardano-node consensus issues related to ouroboros-consensus testing
Milestone

Comments

@edsko
Copy link
Contributor

edsko commented Jan 28, 2020

The existing PBFT tests already deal with delegation certificates. They should be extended with update proposals/votes/endorsements. I think the general setup would be similar: "posting the delegation certificate" would correspond to posting an update proposal and having nodes vote on it, and then the proposal is endorsed when "enough" nodes are restarted (which is also required for delegation certificates). We should then also verify somehow that the update proposal is in the end endorsed -- for example, change a protocol parameter and then produce a block that depends crucially on the new parameter.

@edsko edsko added testing priority high byron Required for a Byron mainnet: replace the old core nodes with cardano-node labels Jan 28, 2020
@edsko
Copy link
Contributor Author

edsko commented Jan 28, 2020

Ideally we'd merge these two sets of tests, but that's harder: IntersectMBO/ouroboros-consensus#725.

@edsko
Copy link
Contributor Author

edsko commented Feb 11, 2020

@nfrisby has been busy tracing down other problems. Moving this to the next sprint.

@edsko edsko modified the milestones: S6 2020-02-13, S7 2020-02-27 Feb 11, 2020
@edsko edsko modified the milestones: S7 2020-02-27, S8 2020-03-12 Feb 27, 2020
@nfrisby
Copy link
Contributor

nfrisby commented Feb 28, 2020

I've familiarized myself with the relevant parts of cardano-ledger and the corresponding PDF. The basic changes needed for Network seem pretty straight-forward (though I suspect I might need to introduce some additional synchronization so that nodes restart "immediately" after the onset of the first slot of the new epoch).

Please confirm two observations:

  1. Adopted updates only take effect at the beginning of a new epoch.

  2. There are only two (edit) three parts (what else?) of the static node configuration that need to be updated when a protocol update is adopted.

    • The slot length (which is passed to the realBlockchainTime call in Node.hs) must be updated to the new ppSlotDuration (adoptedProtocolParamters cvsUpdateState). (Edit: ppSlotDuration has not and never will change via a protocol parameter update proposal; see Adds two UTxO properties cardano-ledger#642 and passing a singletonSlotLengths to realBlockchainTime is almost certainly incorrect if an adopted update proposal ever changed the slot duration protocol parameter #1725.)

    • The maximum mempool size (which is chosen as 2 the max block size of the "current" ledger when the node starts) must be updated according to the new ppMaxBlockSize (adoptedProtocolParamters cvsUpdateState).

    • The byronProtocolVersion field of the BlockConfig ByronBlock record needs to be updated to the new adoptedProtocolVersion cvsUpdateState. (Edit: it seems like that's not actually necessary; see my next comment on this Issue.)

    • All other protocol parameter values (from what I've seen so far) are only read from adoptedProtocolParamters cvsUpdateState as-needed during the various validation routines in cardano-ledger.

cc: @edsko @mrBliss @nc6

@nfrisby
Copy link
Contributor

nfrisby commented Mar 1, 2020

I have to first figure out how these features are all supposed to work before I can write code to test it. This large comment is the result. Since it is essentially documentation and code review, is there a better place for it to live? I anticipate raising some Issues as a result of this review, but haven't done so yet. I plan to at least discuss the details with @edsko first.

From skimming the Byron ledger spec pdf and some of its references and inspecting the cardano-ledger code, I've determined the overall sequence of events involved in a successful protocol update proposal -- I'm ignoring software update proposals for now, since they seem beyond the scope of our current test suite.

  1. A protocol update proposal transaction is created . It proposes new values for some protocol parameters and a greater protocol version number as an identifier. There cannot be two proposals with the same version number.

  2. Genesis key delegates can add vote transactions that refer to such a proposal (by its hash). They don't have to wait; a node could add a proposal and a vote for it to its mempool simultaneously. There are only positive votes, and a proposal has a time-to-live (see ppUpdateProposalTTL) during which to gather sufficient votes. While gathering votes, a proposal is called active.

  3. Once the number of voters satisfies a threshold (currently determined by the srMinThd field of the ppSoftforkRule protocol parameter), the proposal becomes confirmed.

  4. Once the threshold-satisfying vote becomes stable (ie its containing block is >=2k slots old), a block whose header's protocol version number (CC.Block.headerProtocolVersion) is that of the proposal is interpreted as an endorsement of the stably-confirmed proposal by the block's issuer (specifically by the Verification Key of its delegation certificate). Endorsements -- ie any block, since they all contain that header field --- also trigger the system to discard proposals that were not confirmed within their TTL.
    https://github.com/input-output-hk/cardano-ledger/blob/172b49ff1b6456851f10ae18f920fbfa733be0b0/cardano-ledger/src/Cardano/Chain/Block/Validation.hs#L439-L444
    Notably, endorsements for proposals that are not yet stably-confirmed (or do not even exist) are not invalid but rather silently ignored. In other words, no validation applies to the headerProtocolVersion field.

  5. Once the number of endorsers satisfies a threshold (same as for voting), the confirmed proposal becomes a candidate proposal.

  6. At the beginning of an epoch, the candidate proposal with the greatest protocol version number among those candidates whose threshold-satisfying endorsement is stable (ie the block is >=2k slots old) is adopted: the new protocol parameter values have now been changed.
    If there was no stably-candidated proposal, then nothing happens. Everything is retained; in particular, a candidate proposal whose threshold-satisfying endorsement was not yet stable will be adopted at the subsequent epoch unless it is surpassed in the meantime.
    When a candidate is adopted, all record of other proposals/votes/endorsements -- regardless of their state -- is discarded. The explanation for this is that such proposals would now be interpreted as an update to the newly adopted parameter values, whereas they were validated as an update to the previously adopted parameter values.

In summary, the following diagram tracks the progress of a proposal that's eventually adopted. For other proposals, the path short circuits to a "rejected/discarded" status at some point.

active proposal
    --> (sufficient votes)
confirmed proposal
    --> (2k slots later)
stably-confirmed proposal
    --> (sufficient endorsements)
candidate proposal
   --> (2k slots later)
stably-candidated proposal    (Frisby: stably-nominated?)
   --> (epoch transition)
adopted proposal

I have some concerns that maybe should become Issues.

  • Figure 47 "State-transition diagram for protocol-updates" in the 2020 Feb 9 Byron ledger spec pdf shows four possible states of a proposal: "Active", "Rejected", "Confirmed", and "Adopted". Its use of "Adopted" is somewhat inconsistent with the pdf's text and the cardano-ledger code, since the diagram -- strictly read as written -- indicates the proposal can reach "Adopted" before the epoch transition. It also only mentions the "2k slots" requirement for "Adopted" but not that endorsements require the confirmation itself to be 2k slots old. In summary, I think it should be expanded so that it subsumes my ASCII-art diagram above (which is only missing the "Rejected" state).

  • I think the re-(ab)use of srMinThd :: LovelacePortion as a portion of genesis keys instead of as a portion of lovelace needs more discussion/explanation/motivation/justification. The cardano-ledger code (https://github.com/input-output-hk/cardano-ledger/blob/172b49ff1b6456851f10ae18f920fbfa733be0b0/cardano-ledger/src/Cardano/Chain/Update/ProtocolParameters.hs#L178-L189) says "In Byron we do not have [the protocol parameter we actually want], so we have to use the existing ones" and the "Update adoption threshold:upAdptThd" bullet in the pdf says essentially the same. But is this a temporary workaround? Etc. In particular, I think it should address the implication that all genesis keys will continually hold equal stake. Something along the lines of, "this is just a rough approximation for Byron, we'll do better in Shelley", I'd guess? As it is now, it feels like I'm missing some context.

  • I created Sync executable with formal spec - UTXO STS cardano-ledger#731


The above has some implications for this Issue.

  • The current node implementation caches the headerProtocolVersion upon startup; it's an argument to the top-level entry point Ouroboros.Consensus.Node.run, but I don't know which repositories' code calls into that. I'd guess it's read from a config file/CLI argument?

  • More to the point: the node has to restart in order to change headerProtocolVersion. This means the node has to restart even just to begin endorsing a proposal, even though that proposal's new parameter values are not yet adopted. That seems rather severe, but it is by design.

  • Some protocol parameter values are also cached on start-up (ppSlotDuration affects clock ticks, ppMaxBlockSize affects mempool max size, and maybe more?). so the node will have to restart at the beginning of any epoch that actually adopts a proposal too.

I think having the test nodes restart at the beginning of every epoch seems reasonable, as long as I take precautions to not e.g. discard the contents of the mempool. However, to also restart in order to start endorsing seems like it's going to be quite confusing. I don't see an alternative right now, unless we some how choose to make the source of the headerProtocolVersion mutable.

Moreover, ouroboros-consensus currently neglects changes to the ppSlotDuration parameter; see Issue #1725 for more. There's a bit of a chicken-and-egg problem here: the ChainDB requires a BlockChainTime in order to open, but to have a correct BlockChainTime if ppSlotDuration has changed over time requires reading the latest ledger validation state with the current slot ticked, in case that's an epoch transition.

Edit: As it turns out, since ppSlotDuration is not actually updatable and the use of a stale ppMaxBlockSize for the mempool size won't actually affect consensus (as long as the max block size was never essentially 0), the node does not need to restart when a protocol parameter update proposal is adopted.

Edit: The answer on Issue 1725 implies that ppSlotDuration has not and never will change via a protocol parameter update proposal. Also, Issue #1498 is to make the mempool's max size change dynamically in response to ppMaxBlockSize updates.

@edsko
Copy link
Contributor Author

edsko commented Mar 2, 2020

@dnadales , could you confirm #1514 (comment) ?

@dnadales
Copy link
Member

dnadales commented Mar 2, 2020

Adopted updates only take effect at the beginning of a new epoch.

Yes.

The slot length (which is passed to the realBlockchainTime call in Node.hs) must be updated to the new ppSlotDuration (adoptedProtocolParamters cvsUpdateState). (Maybe not: see IntersectMBO/cardano-ledger#642 and #1725)

As far as I remember we wouldn't be changing the slot duration on protocol updates. At least not as a protocol parameter.

The maximum mempool size (which is chosen as 2 the max block size of the "current" ledger when the node starts) must be updated according to the new ppMaxBlockSize (adoptedProtocolParamters cvsUpdateState).

This sounds reasonable.

All other protocol parameter values (from what I've seen so far) are only read from adoptedProtocolParamters cvsUpdateState as-needed during the various validation routines in cardano-ledger.

Yup, seems right.

CC: @nfrisby

@edsko
Copy link
Contributor Author

edsko commented Mar 2, 2020

Thanks @dnadales !

As far as I remember we wouldn't be changing the slot duration on protocol updates. At least not as a protocol parameter.

This will happen, albeit indirectly: protocol updates determine the timing of the hard fork, and the hard fork can change the slot length. This is however the only place where slot lengths can change.

@nfrisby
Copy link
Contributor

nfrisby commented Mar 2, 2020

Thank you both.

protocol updates determine the timing of the hard fork

@edsko How does that happen? Are you referring to a software update proposal as opposed to a protocol (parameter) update proposal?

@edsko
Copy link
Contributor Author

edsko commented Mar 2, 2020

Uh, yes, I think that might be right. @dcoutts or @JaredCorduan can confirm.

@nfrisby nfrisby changed the title Extend PBFT tests with update proposals/votes Extend PBFT tests with protocol parameter update proposals/votes/endorsements/adoption Mar 3, 2020
@nfrisby
Copy link
Contributor

nfrisby commented Mar 9, 2020

I just bors'd PR 1747 (edit: it has now merged). It adds a somewhat trivial proof-by-example that it's possible for the ThreadNet tests to increment the Byron protocol version, which is the necessary precondition for the Byron-to-Shelley hard-fork. Every generated RealPBFT test case now includes a proposal and votes and endorsements that should cooperatively update the protocol version at the next second epoch.

Without retuning the other RealPBFT generators, it seems like half of the test cases successfully do so and half don't (e.g. they don't even reach the second epoch, or the proposal times out before enough votes, etc).

The test takes the most direct path:

  • The first node to join immediately introduces a proposal that updates the protocol version but does actually not update any proper protocol parameters. Likely in the same block, it votes for the proposal.
  • Other nodes vote for the proposal as soon as they join the net.
  • The initial static configuration of all nodes is such that every block they produce endorses the proposed protocol version. This is somewhat artificial (even the block that includes the proposal tx is attempting to endorse the proposed version) but seems harmless since this endorsement field is essentially irrelevant except under very specific circumstances.
  • The tests inherit the upstream-package's default test proposal TTL of 10 slots.

@edsko Is that sufficient to close this Issue? If not, can we replace it with a lower priority/urgency Issue? Thanks.

iohk-bors bot added a commit that referenced this issue Mar 9, 2020
1747: check that Byron can increment its adopted protocol version r=nfrisby a=nfrisby

I'm not sure if this should be considered to fully address #1514, but it's good first step.

It originally that #1161 is high-priority because of votes: they are replicated and fill up the blocks, making the test quite expensive. Other transactions (so far) haven't been as problematic because duplicate transactions other than votes are invalid.

Therefore this PR is rooted on top of a potential fix for 1161; it's blocked by the corresponding PR. Good news, though: the protocol version update is succeeding when expected.

I'm opening this as Draft since it currently includes the latest state of the other PR.

Co-authored-by: Nicolas Frisby <[email protected]>
@edsko
Copy link
Contributor Author

edsko commented Mar 10, 2020

Yes, we can close this. Opened #1768 for testing protocol parameter changes (medium priority), and IntersectMBO/ouroboros-consensus#703 for testing non-ideal voting scenarios (low priority). There is also your existing #1733 about testing software version updates, which we think is still high priority; see #1733 (comment).

We suggest to look at #1733 and then move on to to Shelley, and in particular, start looking at #1770 .

@edsko edsko closed this as completed Mar 10, 2020
@edsko
Copy link
Contributor Author

edsko commented Mar 10, 2020

Let's first get #1564 merged though, so that we can close #1297 (I guess that's almost done anyway).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
byron Required for a Byron mainnet: replace the old core nodes with cardano-node consensus issues related to ouroboros-consensus testing
Projects
None yet
Development

No branches or pull requests

3 participants