[PROTOCOL RFC] Checkpoint Protection Up To Version #4152

Open
1 of 3 tasks
andreaschat-db opened this issue Feb 13, 2025 · 0 comments

Protocol Change Request

Overview

This feature request introduces checkpoint protection: checkpoints are protected from cleanup operations up to a particular version. No checkpoint may be removed or created before that version unless everything up to it is cleaned up in one go. This feature can be used as a building block for dropping features without truncating history.

Motivation

Today, dropping a feature requires running the DROP FEATURE command twice, with a 24-hour waiting period in between. It also truncates the history of the Delta table to the last 24 hours.

We can improve this process by introducing checkpointProtection, which allows us to set up the table's history (including checkpoints) in such a way that older readers will be able to handle it correctly until we atomically delete it.

A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary. These checkpoints act as barriers that hide unsupported log records behind them. The checkpointProtection feature guarantees that they persist until all history up to the checkpoints is truncated in one go.

Furthermore, with the new drop feature method, validating against the latest protocol is no longer sufficient. As a result, creating checkpoints at historical versions can lead to corruption if the writer does not support the target protocol. The checkpointProtection feature also protects against these cases by disallowing checkpoint creation before requireCheckpointProtectionBeforeVersion.

With these changes, we can drop table features in a single command without needing to truncate history. More importantly, they simplify the drop feature user journey by requiring a single execution of the DROP FEATURE command.

Details

The checkpointProtection is a Writer feature that allows writers to clean up metadata if and only if the metadata can be cleaned up to the version given by the requireCheckpointProtectionBeforeVersion table property in one go. This means that a single cleanup operation should truncate up to requireCheckpointProtectionBeforeVersion, as opposed to several cleanup operations truncating in chunks.

There are two exceptions to this rule. If either one holds, the rule above can be ignored:

a) The writer verifies it supports all protocols in the version range [start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)].
b) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version.
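
The rule and its two exceptions can be read as a single decision a writer makes before cleaning up metadata. The following is a minimal sketch of that decision, not the Delta Lake implementation. `CleanupPlan`, `cleanup_allowed`, and the `supports_protocol_at` callback are hypothetical names introduced for illustration, and the sketch assumes that "truncating up to requireCheckpointProtectionBeforeVersion in one go" means the cleanup's target version is at least that property value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CleanupPlan:
    """Hypothetical description of one metadata cleanup operation."""
    start_version: int            # earliest version currently in the log
    target_cleanup_version: int   # history is truncated up to this version
    creates_checkpoints: bool     # does the cleanup create any checkpoint?
    erases_checkpoints_after_truncation: bool  # does it erase checkpoints after the truncation version?


def cleanup_allowed(
    plan: CleanupPlan,
    protection_version: int,
    supports_protocol_at: Callable[[int], bool],
) -> bool:
    """Return True if the cleanup respects checkpointProtection.

    protection_version stands for requireCheckpointProtectionBeforeVersion.
    supports_protocol_at(v) is an assumed callback that reports whether the
    writer supports the protocol in effect at version v.
    """
    # Main rule (assumed reading): one cleanup truncates all the way up to
    # the protected version.
    if plan.target_cleanup_version >= protection_version:
        return True

    # Exception (a): the writer supports every protocol in
    # [start, min(protection_version, target_cleanup_version)].
    upper = min(protection_version, plan.target_cleanup_version)
    if all(supports_protocol_at(v) for v in range(plan.start_version, upper + 1)):
        return True

    # Exception (b): no checkpoint is created during cleanup and no
    # checkpoint after the truncation version is erased.
    if not plan.creates_checkpoints and not plan.erases_checkpoints_after_truncation:
        return True

    return False
```

Under these assumptions, a partial cleanup to version 5 with the property set to 10 is rejected unless the writer supports every protocol in versions 0 through 5, or it leaves checkpoints untouched.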

The checkpointProtection feature can be removed only if history is truncated up to at least requireCheckpointProtectionBeforeVersion.
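
The removal condition can be expressed as a simple precondition check. This is an illustrative sketch only: `can_remove_checkpoint_protection` and `earliest_retained_version` are hypothetical names, and the sketch assumes that history counts as truncated up to a version when no commit earlier than that version remains in the log.

```python
def can_remove_checkpoint_protection(
    earliest_retained_version: int,
    protection_version: int,
) -> bool:
    """Return True if checkpointProtection may be removed from the table.

    protection_version stands for requireCheckpointProtectionBeforeVersion.
    earliest_retained_version is the oldest commit version still present in
    the log (an assumed input; how it is obtained is out of scope here).
    """
    # History must already be truncated up to at least the protected version.
    return earliest_retained_version >= protection_version
```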

Willingness to contribute

The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?

  • Yes. I can contribute.
  • Yes. I would be willing to contribute with guidance from the Delta Lake community.
  • No. I cannot contribute at this time.
andreaschat-db changed the title from "[PROTOCOL RFC] Checkpoint Protection up to a Version" to "[PROTOCOL RFC] Checkpoint Protection up to Version" on Feb 13, 2025
andreaschat-db changed the title from "[PROTOCOL RFC] Checkpoint Protection up to Version" to "[PROTOCOL RFC] Checkpoint Protection Up To Version" on Feb 13, 2025