blockstore: add last fec set details column #34651
Conversation
#[derive(Clone, Copy, Debug, Eq, PartialEq, Serialize, Deserialize)]
pub enum LastFecSetDetails {
This is an enum to support future extensibility when we eventually add IP check/retransmit signature verification.
Glad you already baked in an enum to allow it to grow. A few notes:
- One consideration will be backwards compat; namely, making sure old code that is unaware of new enum variants can handle those variants gracefully. Not a concern for this PR, but just mentioning it so we're thinking about it a little.
- What will future variants of `LastFecSetDetails` look like? The current enum has just a `SizeDetails`; will future iterations contain a completely different struct, or `SizeDetails` + some new info, or something else?
I envision future iterations to be inclusive, so `SizeDetails` + `Option<AdditionalInfo>`. We add the additional info as an `Option` in order to support backwards compatibility.
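To make the idea concrete, here is a minimal sketch of how the enum could grow while staying inclusive. This is not the PR's code: `AdditionalInfo`, the variant names, and `total_received` are hypothetical, and the serde derives from the real struct are elided.

```rust
// Hypothetical sketch of an inclusive evolution of `LastFecSetDetails`.

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct SizeDetails {
    pub num_coding_received: usize,
    pub num_data_received: usize,
}

// Hypothetical future payload, e.g. retransmitter signature results.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct AdditionalInfo {
    pub retransmit_checks_passed: bool,
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum LastFecSetDetails {
    // Original variant: size accounting only.
    Size(SizeDetails),
    // Future variant: same size accounting plus optional extra info,
    // kept as an Option so older data maps naturally onto `None`.
    SizeWithInfo(SizeDetails, Option<AdditionalInfo>),
}

// Readers can stay oblivious to which variant they got,
// since every variant carries a `SizeDetails`.
fn total_received(details: &LastFecSetDetails) -> usize {
    match details {
        LastFecSetDetails::Size(s)
        | LastFecSetDetails::SizeWithInfo(s, _) => {
            s.num_coding_received + s.num_data_received
        }
    }
}
```

The key property is that every variant still carries a `SizeDetails`, so code written against the original shape keeps working.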
Codecov Report

Additional details and impacted files:

@@            Coverage Diff            @@
##           master   #34651     +/-   ##
=========================================
- Coverage    81.8%    81.8%    -0.1%
=========================================
  Files         824      824
  Lines      222391   222405     +14
=========================================
- Hits       181959   181954      -5
- Misses      40432    40451     +19
Is it correct that we do not intend to backport this column to v1.17? Rather, we can let the auto-detect feature handle compatibility between master/v1.18 and v1.17?
ledger/src/blockstore_meta.rs (outdated):

pub num_coding_received: usize,
pub num_data_received: usize,
Can you leave some comments on what constitutes "received", with respect to the different routes by which you might obtain a shred: turbine, repair, erasure recovery?
@@ -526,6 +526,19 @@ impl OptimisticSlotMetaVersioned {
        }
    }
}

#[derive(Clone, Copy, Debug, Eq, PartialEq, Serialize, Deserialize)]
pub struct SizeDetails {
How would you expect to populate these fields? When you receive a (coding or data) shred, you do not know whether it belongs to the last FEC set unless it is the very last data shred.
We wait until we receive a shred with the LAST_SHRED_IN_SLOT flag. At that point we know the `fec_set_index` of the last FEC set.

If we have already seen a coding shred at that point, we can use the `ErasureMeta` to perform a one-time scan of blockstore to see how many coding/data shreds we have received. If not, we set `num_data_inserted` and `num_coding_inserted` to 0.

If we receive a coding shred while `num_coding_inserted` is 0, we perform the one-time scan using the `ErasureMeta` and update `num_coding_inserted` to 1, and `num_data_inserted` to the results of the scan. For further shred insertions, we increment `num_data_inserted` or `num_coding_inserted`.

After shred insertion, if the sum of `num_data_inserted` and `num_coding_inserted` has now crossed the 2 * `DATA_SHREDS_PER_FEC_BLOCK` (64 shreds) threshold, then we notify replay through crossbeam. We need to notify replay here because replay could have already replayed the block and marked it as dead/invalid.
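The counting scheme described above can be sketched as follows. This is a hedged illustration, not the PR's implementation: `LastFecSetCounts`, `ShredKind`, and the `scan_blockstore` closure (standing in for the one-time `ErasureMeta`-driven scan) are hypothetical names.

```rust
// Assumed constant from the shredder; 32 in current Solana code.
const DATA_SHREDS_PER_FEC_BLOCK: usize = 32;

#[derive(Default)]
struct LastFecSetCounts {
    num_data_inserted: usize,
    num_coding_inserted: usize,
    scanned: bool, // whether the one-time blockstore scan has run
}

enum ShredKind {
    Data,
    Coding,
}

impl LastFecSetCounts {
    /// Record one shred insertion for the last FEC set. Returns true
    /// exactly when the 2 * DATA_SHREDS_PER_FEC_BLOCK threshold is
    /// first crossed, i.e. when replay should be notified.
    fn record<F>(&mut self, kind: ShredKind, scan_blockstore: F) -> bool
    where
        F: FnOnce() -> (usize, usize), // -> (data count, coding count)
    {
        let before = self.num_data_inserted + self.num_coding_inserted;
        match kind {
            // First coding shred: one-time scan using ErasureMeta.
            ShredKind::Coding if !self.scanned => {
                let (data, coding) = scan_blockstore();
                self.num_data_inserted = data;
                self.num_coding_inserted = coding;
                self.scanned = true;
            }
            ShredKind::Coding => self.num_coding_inserted += 1,
            ShredKind::Data => self.num_data_inserted += 1,
        }
        let after = self.num_data_inserted + self.num_coding_inserted;
        let threshold = 2 * DATA_SHREDS_PER_FEC_BLOCK;
        // Notify only on the crossing edge, not on every insertion.
        before < threshold && after >= threshold
    }
}
```

Returning true only on the crossing edge matches the intent of notifying replay once, rather than on every subsequent insertion.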
> If we have already seen a coding shred at that point we can use the ErasureMeta to perform a one time scan of blockstore to see how many coding/data shreds we have received.

And what if we have not received a coding shred yet?
> If not, we set the num_data_inserted and num_coding_inserted to 0.
> If we receive a coding shred while num_coding_inserted is 0, we perform the one time scan using the ErasureMeta and update

Yes, that's correct. I've kept this as a stub PR just in case we need to backport this to v1.16 for whatever reason, although if the auto-detect feature is backported to v1.16 that would be unnecessary.
Force-pushed from c0cf4c8 to cf28525.
Awesome and good call on doing so; we opted to push the
/// These fields keep track of any shreds inserted into blockstore,
/// through turbine/repair or erasure recovery.
/// Before voting, replay will check that the sum of these values is at
/// least 2 * `DATA_SHREDS_PER_FEC_BLOCK`.
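The vote-time check the doc comment describes reduces to a simple sum comparison. A minimal sketch, with a hypothetical helper name (the real check would live in replay):

```rust
// Assumed constant from the shredder; 32 in current Solana code,
// so the threshold below is 64 shreds.
const DATA_SHREDS_PER_FEC_BLOCK: usize = 32;

pub struct SizeDetails {
    pub num_coding_received: usize,
    pub num_data_received: usize,
}

/// Hypothetical helper: true if the last FEC set accounted for enough
/// shreds (received via turbine/repair or recovered) for replay to vote.
fn last_fec_set_sufficient(details: &SizeDetails) -> bool {
    details.num_coding_received + details.num_data_received
        >= 2 * DATA_SHREDS_PER_FEC_BLOCK
}
```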
I thought we wanted to make sure that the leader broadcasted enough shreds of the last FEC set. With this accounting, the leader can generate a 1:63 erasure batch (one data shred and 63 coding shreds), send only 1 coding shred to a node, and that node will be able to recover the whole batch and insert the remaining shreds.
Ah yeah, that's an excellent point. We'll need to differentiate the source of the shreds. Let me start a discussion to figure out the requirement.
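One possible shape for that source-aware accounting, offered only as a hedged sketch since the requirement is still under discussion (all names here are hypothetical): count shreds that arrived over the network separately from ones reconstructed locally, since only the former are evidence the leader actually broadcast them.

```rust
// Hypothetical source-aware tally; not part of this PR.
enum ShredSource {
    Turbine,   // received via broadcast
    Repair,    // served by another node on request
    Recovered, // reconstructed locally via erasure recovery
}

#[derive(Default)]
struct SourceCounts {
    received: usize,  // turbine + repair
    recovered: usize, // local erasure recovery
}

impl SourceCounts {
    fn record(&mut self, source: ShredSource) {
        match source {
            ShredSource::Turbine | ShredSource::Repair => self.received += 1,
            ShredSource::Recovered => self.recovered += 1,
        }
    }

    /// Only shreds actually obtained over the network demonstrate that
    /// the leader broadcast them; recovered shreds do not.
    fn broadcast_evidence(&self) -> usize {
        self.received
    }
}
```

Under this split, the 1:63 scenario above would show `broadcast_evidence() == 1` even after recovery inserts the other 63 shreds.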
@behzadnouri pointed out that the
Problem
There are a number of additional checks we want to perform on the last FEC set of each block. This additional column stores the details necessary for these checks, allowing easy replay verification with only one blockstore read.
Summary of Changes
Add last fec set details column