Protocol Change Request
Overview
This feature request is about introducing the notion of checkpoint protection from cleanup operations up to a particular version. No checkpoint removal/creation before that version is allowed unless everything is cleaned up in one go. This feature can be used as a building block for dropping features without needing to truncate history.
Motivation
Today, dropping a feature requires executing the DROP FEATURE command twice, with a 24-hour waiting period in between. It also truncates the history of the Delta table to the last 24 hours.
We can improve this process by introducing checkpointProtection, which allows us to set up the table's history (including checkpoints) in such a way that older readers will be able to handle it correctly until we atomically delete it.
A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary that are guaranteed to persist until all history is truncated up to the checkpoints in one go. These checkpoints act as barriers that hide unsupported log records behind them. With the checkpointProtection, we can guarantee these checkpoints will persist until history is truncated.
Furthermore, with the new drop feature method, validating against the latest protocol is no longer sufficient: creating checkpoints at historical versions can lead to corruption if the writer does not support the protocol in effect at the target version. The checkpointProtection feature also protects against these cases by disallowing checkpoint creation before requireCheckpointProtectionBeforeVersion.
With these changes, we can drop table features in a single command without needing to truncate history. More importantly, they simplify the drop feature user journey by requiring a single execution of the DROP FEATURE command.
Details
The checkpointProtection is a writer feature that allows writers to clean up metadata only if the metadata can be cleaned up to the version given by the requireCheckpointProtectionBeforeVersion table property in one go. This means that a single cleanup operation must truncate history up to requireCheckpointProtectionBeforeVersion, as opposed to several cleanup operations truncating it in chunks.
There are two exceptions to this rule. If either of them holds, the rule above can be ignored:
a) The writer verifies it supports all protocols in the range [start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)].
b) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version.
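The rule and its two exceptions can be sketched as a single check. This is only an illustrative sketch, not the Delta implementation: the names `CleanupPlan`, `cleanup_allowed`, and `supports_protocol_at` are hypothetical, and it assumes that a cleanup with target version T removes history before T and that the range in exception (a) is inclusive on both ends.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CleanupPlan:
    """Hypothetical description of one metadata cleanup operation."""
    start_version: int                     # earliest version currently in the log
    target_cleanup_version: int            # history before this version is removed (assumed)
    creates_checkpoints: bool              # does the cleanup create any checkpoint?
    erases_checkpoints_after_target: bool  # does it erase checkpoints after the truncation version?


def cleanup_allowed(
    plan: CleanupPlan,
    require_before_version: int,
    supports_protocol_at: Callable[[int], bool],
) -> bool:
    """Return True if the cleanup respects checkpointProtection (sketch)."""
    # Main rule: the cleanup truncates up to the protected version in one go.
    if plan.target_cleanup_version >= require_before_version:
        return True

    # Exception (a): the writer supports every protocol in
    # [start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)].
    upper = min(require_before_version, plan.target_cleanup_version)
    if all(supports_protocol_at(v) for v in range(plan.start_version, upper + 1)):
        return True

    # Exception (b): no checkpoints are created, and none are erased
    # after the truncation version.
    return not plan.creates_checkpoints and not plan.erases_checkpoints_after_target
```

For example, with requireCheckpointProtectionBeforeVersion set to 100, a partial cleanup targeting version 50 is rejected under these assumptions if the writer does not support some protocol in versions 0 to 50 and the cleanup would create a checkpoint.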
The checkpointProtection can only be removed if history is truncated up to at least the requireCheckpointProtectionBeforeVersion.
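The removal condition can be expressed as a simple precondition. The sketch below is hypothetical: `can_remove_checkpoint_protection` and `earliest_retained_version` are illustrative names, and it assumes that "truncated up to at least" means the earliest commit still present in the log is at or after requireCheckpointProtectionBeforeVersion.

```python
def can_remove_checkpoint_protection(
    earliest_retained_version: int,
    require_before_version: int,
) -> bool:
    """Return True if history is already truncated far enough (sketch).

    Assumption: history counts as truncated up to version V when no
    commit earlier than V remains in the log.
    """
    return earliest_retained_version >= require_before_version
```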
Willingness to contribute
The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?
Yes. I can contribute.
Yes. I would be willing to contribute with guidance from the Delta Lake community.
No. I cannot contribute at this time.
andreaschat-db changed the title from "[PROTOCOL RFC] Checkpoint Protection up to a Version" to "[PROTOCOL RFC] Checkpoint Protection up to Version" on Feb 13, 2025
andreaschat-db changed the title from "[PROTOCOL RFC] Checkpoint Protection up to Version" to "[PROTOCOL RFC] Checkpoint Protection Up To Version" on Feb 13, 2025