
[AMORO-3272] Complete data expiration based on partition information #3273

Merged: 19 commits, Feb 26, 2025

lintingbin (Contributor)

Why are the changes needed?

Close #3272.

Brief change log

  • When the expiration field is a partition field and the expiration level is partition, prefer the partition information recorded for each data file when deciding which data files to expire.

  • Update the expected results of one test case, which now differ slightly from the previous implementation: partition-level expiration compares dates, so data files whose partition date equals the cutoff date are not expired.
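A minimal sketch of the date comparison described above, assuming the expire field is partitioned with a day transform (Iceberg stores day partition values as days since 1970-01-01). `DataFileInfo` and `PartitionDayExpiration` are hypothetical names for illustration, not Amoro classes, and the cutoff timestamp is assumed to have already been derived from the retention time. A file is expired only when its whole partition day precedes the cutoff day; a file in the same day as the cutoff may still hold rows newer than the cutoff, so it is kept.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of a data file: its path and the value of a day-transformed
// partition field, stored as days since 1970-01-01.
final class DataFileInfo {
  final String path;
  final int partitionDay;

  DataFileInfo(String path, int partitionDay) {
    this.path = path;
    this.partitionDay = partitionDay;
  }
}

final class PartitionDayExpiration {
  // Returns the files whose entire partition day lies before the day of the cutoff.
  // Files in the cutoff day itself are kept: they may contain rows newer than the cutoff.
  static List<DataFileInfo> filesToExpire(List<DataFileInfo> files, LocalDateTime cutoff) {
    long cutoffDay = cutoff.toLocalDate().toEpochDay();
    List<DataFileInfo> expired = new ArrayList<>();
    for (DataFileInfo file : files) {
      if (file.partitionDay < cutoffDay) {
        expired.add(file);
      }
    }
    return expired;
  }
}
```

With a cutoff of 2022-01-02T06:00, a file in partition 2022-01-01 is expired, while files in 2022-01-02 and later are retained.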

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

github-actions bot added the module:ams-server (Ams server module) label Oct 17, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please leave a comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the project mailing list. Thank you for your contributions.

github-actions bot added the stale label Nov 30, 2024

klion26 (Member) left a comment

@lintingbin Thanks for the contribution. I left some comments; please take a look when you're free. Thanks!

github-actions bot removed the stale label Dec 4, 2024
Aireed added this to the Release 0.8.0 milestone Dec 4, 2024
Aireed mentioned this pull request Dec 4, 2024
lintingbin (Contributor Author)

@klion26 I have responded to your comments and made the requested changes. Please review it again.

klion26 (Member) commented Dec 4, 2024

@lintingbin Thanks for addressing the comments; the change LGTM. Let's see if there are any more comments from the community.

Do we need to update the corresponding documentation?

lintingbin (Contributor Author)

There is no need to modify the documentation, since no parameters were changed.

lintingbin (Contributor Author)

@XBaith I have refactored the code, removing some unnecessary parts and improving variable names for readability. I also added a check for the case where the partition transform is Void. Please review the code again.
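For context, Iceberg's void transform always produces null, and format v1 tables use it to replace a dropped partition field, so a void partition field carries no information about the expire field's values. A hypothetical sketch of such a guard follows; `PartitionFieldView` and `PartitionPathCheck` are illustrative stand-ins for the partition spec, not the real Iceberg `PartitionSpec` API or the code in this PR.

```java
import java.util.List;

// Hypothetical, simplified view of one partition field: the source column it is
// derived from and the name of its transform ("identity", "day", "void", ...).
final class PartitionFieldView {
  final String sourceColumn;
  final String transform;

  PartitionFieldView(String sourceColumn, String transform) {
    this.sourceColumn = sourceColumn;
    this.transform = transform;
  }
}

final class PartitionPathCheck {
  // True only if the expire field is the source of a partition field whose
  // transform still produces values; a "void" transform always yields null.
  static boolean canExpireByPartition(List<PartitionFieldView> spec, String expireField) {
    for (PartitionFieldView field : spec) {
      if (field.sourceColumn.equals(expireField) && !"void".equals(field.transform)) {
        return true;
      }
    }
    return false;
  }
}
```

When the check returns false, partition values cannot be used for that field and the expiration has to rely on other information or leave the files alone.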

XBaith (Contributor) commented Dec 6, 2024

I don't see any unit test for that case. Do you mean you tested the Void transform locally?

XBaith (Contributor) left a comment

I wrote some unit tests for a case involving expiring partitions after dropping the partition field. Unfortunately, the current procedure cannot handle this scenario correctly.

Here’s the partition-level case:

  • I set the partition field as op_time and inserted some records.
  • Then, I removed the partition field before expiring the data.

Expected behavior: All records should be retained since the partition field has been removed.

Please fix this issue and add additional unit tests to cover similar scenarios. Thanks!
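One way to read the expected behavior, sketched with hypothetical types: partition-level expiration consults the table's current spec, and if that spec no longer partitions by the expire field, no files are expired, even files written under an older spec that still carry `op_time` partition values. `SpecTaggedFile` and `DroppedFieldExpiration` are illustrative names; this is not necessarily how the follow-up fix handles the case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data file entry, tagged with the id of the partition spec it was written with.
final class SpecTaggedFile {
  final String path;
  final int specId; // kept for illustration; the decision below depends only on the current spec
  final Integer partitionDay; // op_time day value, or null if the file's spec has no such field

  SpecTaggedFile(String path, int specId, Integer partitionDay) {
    this.path = path;
    this.specId = specId;
    this.partitionDay = partitionDay;
  }
}

final class DroppedFieldExpiration {
  // currentSpecHasExpireField: whether the table's current spec still partitions
  // by the expire field with a non-void transform.
  static List<SpecTaggedFile> filesToExpire(
      List<SpecTaggedFile> files, boolean currentSpecHasExpireField, int cutoffDay) {
    List<SpecTaggedFile> expired = new ArrayList<>();
    if (!currentSpecHasExpireField) {
      // The partition field was dropped: partition-level expiration retains everything.
      return expired;
    }
    for (SpecTaggedFile file : files) {
      if (file.partitionDay != null && file.partitionDay < cutoffDay) {
        expired.add(file);
      }
    }
    return expired;
  }
}
```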

XBaith (Contributor) commented Dec 7, 2024

Sorry for the mistake; this bug existed prior to this PR. I will raise another PR to address it.

Additionally, we should add more unit tests to cover as many edge cases as possible. Thanks for your contribution!

klion26 (Member) left a comment

Thanks for the contribution, LGTM

lintingbin requested a review from XBaith February 19, 2025 03:29

XBaith (Contributor) left a comment

LGTM

codecov-commenter commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 73.17073% with 11 lines in your changes missing coverage. Please review.

Project coverage is 27.76%. Comparing base (63aab48) to head (d48a8d9).
Report is 6 commits behind head on master.

Files with missing lines | Patch % | Lines
.../optimizing/maintainer/IcebergTableMaintainer.java | 71.79% | 7 Missing and 4 partials ⚠️
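These percentages are consistent with Codecov's coverage ratio, hits / (hits + misses + partials), in which partially covered lines count as uncovered. Working backwards, 73.17073% with 11 uncovered lines implies 41 tracked patch lines (30/41), and 71.79% for `IcebergTableMaintainer.java` with 7 + 4 = 11 uncovered lines implies 39 lines (28/39), so the remaining 2 patch lines in other files were covered. The helper below is a hypothetical check of that arithmetic, not part of Codecov.

```java
final class CoverageMath {
  // Codecov-style coverage: hit lines divided by all tracked lines,
  // where partially covered lines count as not covered.
  static double coveragePercent(int hits, int misses, int partials) {
    return 100.0 * hits / (hits + misses + partials);
  }

  // Smallest line count n, with exactly `uncovered` lines not covered, whose coverage
  // rounded to `decimals` places equals the reported value; -1 if none is found.
  static int inferTotalLines(int uncovered, double reportedPercent, int decimals) {
    double scale = Math.pow(10, decimals);
    for (int n = uncovered + 1; n < 100_000; n++) {
      double pct = 100.0 * (n - uncovered) / n;
      if (Math.round(pct * scale) == Math.round(reportedPercent * scale)) {
        return n;
      }
    }
    return -1;
  }
}
```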
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3273      +/-   ##
============================================
+ Coverage     21.59%   27.76%   +6.16%     
- Complexity     2353     3627    +1274     
============================================
  Files           431      603     +172     
  Lines         40347    49212    +8865     
  Branches       5711     6344     +633     
============================================
+ Hits           8712    13662    +4950     
- Misses        30903    34597    +3694     
- Partials        732      953     +221     
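As a cross-check of this table, hits + misses + partials equals the line count in both columns (8712 + 30903 + 732 = 40347 and 13662 + 34597 + 953 = 49212). The reported +6.16% (rather than 27.76 - 21.59 = 6.17) fits a delta taken from the unrounded ratios and then truncated, which matches Codecov's default two-decimal precision with rounding `down`, assuming this project keeps those defaults. A small `BigDecimal` sketch of that arithmetic (`ProjectCoverage` is a hypothetical name):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ProjectCoverage {
  // Unrounded coverage percentage, kept to 10 decimal places.
  private static BigDecimal ratio(long hits, long lines) {
    return BigDecimal.valueOf(hits)
        .multiply(BigDecimal.valueOf(100))
        .divide(BigDecimal.valueOf(lines), 10, RoundingMode.DOWN);
  }

  // Coverage truncated to two decimals (precision 2, rounding "down").
  static BigDecimal percent(long hits, long lines) {
    return ratio(hits, lines).setScale(2, RoundingMode.DOWN);
  }

  // Head minus base, computed from the unrounded ratios and only then truncated.
  static BigDecimal delta(long baseHits, long baseLines, long headHits, long headLines) {
    return ratio(headHits, headLines)
        .subtract(ratio(baseHits, baseLines))
        .setScale(2, RoundingMode.DOWN);
  }
}
```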
Flag | Coverage Δ
core | 27.76% <73.17%> (?)
trino | ?

Flags with carried forward coverage won't be shown.

lintingbin (Contributor Author)

@czy006 @XBaith All the problems have now been resolved; please review the code again.

@@ -194,7 +196,8 @@ private void testUnKeyedPartitionLevel() {
     List<Record> expected;
     if (tableTestHelper().partitionSpec().isPartitioned()) {
-      if (expireByStringDate()) {
+      // retention time is 1 day, expire partitions that are older than 2022-01-02
+      if (expireByStringDate() && isMetricsNotNone()) {
Contributor

We should keep the same result even if the metrics mode is none.
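A hypothetical sketch of why partition information makes the result independent of the metrics mode: partition values are recorded in the manifest for every data file, while column lower/upper bounds are not collected when the metrics mode is `none`. The decision below prefers the partition value, falls back to an upper bound when one exists, and otherwise keeps the file. Inputs are simplified to epoch days, and `ExpireDecision` is an illustrative name, not Amoro's code.

```java
import java.util.OptionalLong;

final class ExpireDecision {
  // partitionDay: day partition value of the file, present if the expire field is a day-partition source.
  // upperBoundDay: max value of the expire field from column metrics, absent when metrics mode is none.
  static boolean shouldExpire(OptionalLong partitionDay, OptionalLong upperBoundDay, long cutoffDay) {
    if (partitionDay.isPresent()) {
      // Partition values exist regardless of the metrics mode, so this branch gives the same answer either way.
      return partitionDay.getAsLong() < cutoffDay;
    }
    if (upperBoundDay.isPresent()) {
      // File-level fallback: expire only if every row is older than the cutoff.
      return upperBoundDay.getAsLong() < cutoffDay;
    }
    // Nothing is known about the file's values: keep it.
    return false;
  }
}
```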

XBaith (Contributor) commented Feb 26, 2025

I made minor code adjustments so that the expiration behavior stays consistent when the expire field is of string type, to avoid confusing users.
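For a string-typed expire field used as an identity partition, a hypothetical sketch of a day-level comparison: the partition string is parsed with a date pattern (assumed here to come from a table property such as `data-expire.datetime-string-pattern`; check the Amoro documentation for the exact name) and compared against the cutoff day, so the outcome matches the date-typed case. `StringPartitionExpiration` is an illustrative name, and values that fail to parse are kept rather than guessed at.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class StringPartitionExpiration {
  // Parses a string partition value such as "2022-01-01" with the given pattern and
  // expires it only if that whole day precedes the cutoff day.
  static boolean shouldExpire(String partitionValue, String pattern, LocalDate cutoffDay) {
    if (partitionValue == null) {
      return false;
    }
    try {
      LocalDate day = LocalDate.parse(partitionValue, DateTimeFormatter.ofPattern(pattern));
      return day.isBefore(cutoffDay);
    } catch (DateTimeParseException e) {
      // Values that do not match the pattern are kept.
      return false;
    }
  }
}
```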

czy006 changed the title "feature: data-expire by partition info" to "[AMORO-3272] data-expire by partition info" Feb 26, 2025
XBaith merged commit 2ff66e5 into apache:master Feb 26, 2025
4 checks passed
lintingbin deleted the feature/data_expire_by_partition_info branch February 26, 2025 06:10
zhoujinsong changed the title "[AMORO-3272] data-expire by partition info" to "[AMORO-3272] Complete data expiration based on partition information" Feb 27, 2025
Labels: module:ams-server (Ams server module)
Linked issue (closed by merging this pull request): [Improvement]: Support data expiration based on partition information
8 participants