[HUDI-7351] Implement partition pushdown for glue #10604

Merged


parisni
Contributor

@parisni parisni commented Feb 1, 2024

Change Logs

This is a follow-up to #10572.

While that PR fixed the runtime error, it did not implement the pushdown logic, so the pushdown path silently returned no partitions.

This PR implements the pushdown by:

  • refactoring so that each sync client can provide its own pushdown implementation
  • providing a Glue-specific filter expression (which diverges from the HMS one)
  • reordering the partitions in case the metastore returns them out of order
  • optimizing partition retrieval by omitting the column details
  • adding unit tests, copied and adapted from hiveSync
  • adding integration tests that use moto to simulate the AWS environment
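
To make the points above concrete, here is a minimal Python sketch of the general idea, not the code from this PR (which is Java). It assumes a simple expression shape, one AND clause per partition joined with OR, and assumes the metastore hands back partition values keyed by column name. The function names `build_expression`, `reorder_values`, and `get_partitions_request` are hypothetical. The request keys `Expression` and `ExcludeColumnSchema` are parameters of the Glue `GetPartitions` API; everything else is illustrative.

```python
from typing import Dict, List, Sequence


def quote(value: str) -> str:
    """Wrap a partition value in single quotes.

    Assumption: values contain no single quotes. Escaping rules are not
    covered by this sketch, so such values are rejected outright.
    """
    if "'" in value:
        raise ValueError(f"unsupported character in partition value: {value!r}")
    return f"'{value}'"


def build_expression(keys: Sequence[str],
                     partitions: Sequence[Sequence[str]]) -> str:
    """Build a filter expression: one AND clause per partition, joined by OR."""
    clauses = []
    for values in partitions:
        if len(values) != len(keys):
            raise ValueError("each partition needs one value per key")
        parts = [f"{k} = {quote(v)}" for k, v in zip(keys, values)]
        clauses.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses)


def reorder_values(table_keys: Sequence[str],
                   returned: Dict[str, str]) -> List[str]:
    """Return partition values in the table's declared key order,
    whatever order the metastore returned them in."""
    return [returned[k] for k in table_keys]


def get_partitions_request(database: str, table: str,
                           expression: str) -> Dict[str, object]:
    """Assemble request parameters for a Glue GetPartitions call.

    ExcludeColumnSchema=True asks Glue not to return the column schema
    with each partition, which keeps the response small.
    """
    return {
        "DatabaseName": database,
        "TableName": table,
        "Expression": expression,
        "ExcludeColumnSchema": True,
    }


expr = build_expression(["year", "month"], [["2024", "01"], ["2024", "02"]])
request = get_partitions_request("my_db", "my_table", expr)
```

With boto3, a dictionary like `request` could be passed as keyword arguments to the Glue client's `get_partitions` method. For a very large number of changed partitions, an expression of this shape may grow too long, so a real implementation would likely need a fallback.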

Impact

With a large number of partitions, this feature has reduced metastore sync time from a couple of minutes to seconds.

Risk level (write none, low medium or high below)

Integration testing has been done in an AWS environment.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@parisni
Contributor Author

parisni commented Feb 1, 2024

@danny0405 you were involved in the previous PR

@danny0405
Contributor

There are no unit tests. Can you verify the functionality with offline e2e tests?

@parisni
Contributor Author

parisni commented Feb 2, 2024

I tested this end to end locally. We will deploy this patch to production next week, so I will confirm then that everything works.
BTW, I added a refactoring; let me know if it's better.

@parisni
Contributor Author

parisni commented Feb 2, 2024

BTW @danny0405, I am working on a POC for integration tests with moto to improve the reliability of this module: #10614

@parisni parisni force-pushed the feat/implement-gluepartition-pushdown branch from 421f87a to 7b94f04 Compare February 3, 2024 22:04
@parisni parisni force-pushed the feat/implement-gluepartition-pushdown branch from 7b94f04 to c63161e Compare February 3, 2024 22:51
@hudi-bot

hudi-bot commented Feb 4, 2024

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@parisni
Contributor Author

parisni commented Feb 4, 2024

@danny0405 I introduced integration tests for hudi-aws with dockerized moto here

@danny0405 danny0405 merged commit ff0e67f into apache:master Feb 4, 2024
30 of 32 checks passed

3 participants