✨Streaming utils for zipping and reading/writing to S3 #7186

Merged

GitHK
Contributor

@GitHK GitHK commented Feb 7, 2025

What do these changes do?

These changes add a set of utilities that let us create zip archives on the fly and stream them to S3 as they are being created. The idea is to use a constant amount of RAM and no disk space.

How does this work? A request to upload a zip archive to S3 is created. As the uploader requests chunks of this archive, the streaming zip utility requests chunks of the input files on the fly and composes the archive. It hands pieces of the archive to the S3 uploader as soon as they are available.
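To illustrate the pull-based idea above, here is a minimal standard-library sketch. It is not the code added in this PR: `stream_zip` and `_ChunkSink` are hypothetical names, and the input is assumed to be pairs of (archive name, iterable of byte chunks). `zipfile` is pointed at a write-only sink with no `tell`/`seek`, so it falls back to its streaming mode, and the generator yields whatever bytes have been produced after each input chunk is consumed.

```python
import io
import zipfile
from collections.abc import Iterable, Iterator


class _ChunkSink:
    """Hypothetical write-only, non-seekable sink that zipfile writes into."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []

    def write(self, data) -> int:
        self._chunks.append(bytes(data))
        return len(data)

    def flush(self) -> None:
        # Nothing to flush: bytes are handed out through drain().
        pass

    def drain(self) -> list[bytes]:
        """Return and forget everything written since the last drain."""
        chunks, self._chunks = self._chunks, []
        return chunks


def stream_zip(entries: Iterable[tuple[str, Iterable[bytes]]]) -> Iterator[bytes]:
    """Yield pieces of a zip archive while the inputs are still being read."""
    sink = _ChunkSink()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, chunks in entries:
            with archive.open(name, mode="w") as member:
                for chunk in chunks:
                    member.write(chunk)
                    # Hand over whatever the compressor has emitted so far.
                    yield from sink.drain()
            # Closing the member writes its trailing data descriptor.
            yield from sink.drain()
    # Closing the archive writes the central directory.
    yield from sink.drain()


if __name__ == "__main__":
    demo_entries = [("hello.txt", [b"hello ", b"world"])]
    archive_bytes = b"".join(stream_zip(demo_entries))
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        print(zf.namelist())
```

In this sketch only the current input chunk and the bytes not yet drained are held in memory, which is the property the description is after. A consumer such as a multipart uploader would iterate over `stream_zip(...)` instead of joining the pieces. Entries larger than 4 GiB would additionally need `force_zip64=True` in `archive.open`.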

Have a look at packages/aws-library/tests/test_s3_client.py::test_workflow_compress_s3_objects_and_local_files_in_a_single_archive_then_upload_to_s3 for a full working workflow.

Progress bar support has also been added. Progress is sent based on the data read from the input streams.
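As a rough sketch of what reporting progress from the input side can look like (again, not the implementation in this PR; `report_progress` and `ByteCounter` are hypothetical names), an input stream can be wrapped so that every chunk read is counted before it is passed on:

```python
from collections.abc import Callable, Iterable, Iterator


def report_progress(
    chunks: Iterable[bytes], on_bytes_read: Callable[[int], None]
) -> Iterator[bytes]:
    """Yield chunks unchanged, reporting each chunk's size as it is read."""
    for chunk in chunks:
        on_bytes_read(len(chunk))
        yield chunk


class ByteCounter:
    """Hypothetical progress sink: accumulates bytes read against a known total."""

    def __init__(self, total_bytes: int) -> None:
        self.total_bytes = total_bytes
        self.read_bytes = 0

    def __call__(self, n: int) -> None:
        self.read_bytes += n

    @property
    def fraction(self) -> float:
        # An empty input counts as already complete.
        return self.read_bytes / self.total_bytes if self.total_bytes else 1.0
```

Counting on the input side means the total is known up front (the sum of the input sizes), whereas the size of the compressed archive is only known once it is finished.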

Bonus: renamed _filemanager.py, which caused confusion, to filemanager_utils.py.

Related issue/s

How to test

Dev-ops checklist

@GitHK GitHK added this to the Singularity milestone Feb 7, 2025
@GitHK GitHK self-assigned this Feb 7, 2025

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 96.32353% with 5 lines in your changes missing coverage. Please review.

Project coverage is 87.04%. Comparing base (8754b76) to head (25372c7).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7186      +/-   ##
==========================================
+ Coverage   87.00%   87.04%   +0.03%     
==========================================
  Files        1667     1669       +2     
  Lines       64764    64768       +4     
  Branches     1096     1115      +19     
==========================================
+ Hits        56351    56380      +29     
+ Misses       8100     8070      -30     
- Partials      313      318       +5     
Flag Coverage Δ
integrationtests 65.26% <75.00%> (-0.01%) ⬇️
unittests 86.04% <93.38%> (+0.04%) ⬆️
Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library 94.17% <100.00%> (+0.14%) ⬆️
pkg_dask_task_models_library 97.09% <ø> (ø)
pkg_models_library 91.54% <100.00%> (+0.01%) ⬆️
pkg_notifications_library 84.57% <ø> (ø)
pkg_postgres_database 88.28% <ø> (ø)
pkg_service_integration 70.03% <ø> (ø)
pkg_service_library 72.61% <97.97%> (+0.45%) ⬆️
pkg_settings_library 90.61% <ø> (ø)
pkg_simcore_sdk 85.08% <75.00%> (-0.39%) ⬇️
agent 96.46% <ø> (ø)
api_server 90.56% <ø> (ø)
autoscaling 96.08% <ø> (ø)
catalog 91.71% <ø> (ø)
clusters_keeper 99.24% <ø> (ø)
dask_sidecar 91.25% <ø> (ø)
datcore_adapter 93.19% <ø> (ø)
director 76.68% <ø> (+0.09%) ⬆️
director_v2 91.29% <ø> (-0.02%) ⬇️
dynamic_scheduler 97.33% <ø> (ø)
dynamic_sidecar 89.77% <ø> (ø)
efs_guardian 90.25% <ø> (ø)
invitations 93.28% <ø> (ø)
osparc_gateway_server ∅ <ø> (∅)
payments 92.66% <ø> (ø)
resource_usage_tracker 89.24% <ø> (+0.05%) ⬆️
storage 86.61% <ø> (ø)
webclient ∅ <ø> (∅)
webserver 84.79% <ø> (-0.01%) ⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8754b76...25372c7. Read the comment docs.

@GitHK GitHK added a:services-library issues on packages/service-libs a:aws-library labels Feb 7, 2025
@GitHK GitHK requested a review from sanderegg February 14, 2025 05:48
Member

@sanderegg sanderegg left a comment

last thing. please double check download_fileobj

@GitHK GitHK requested a review from sanderegg February 14, 2025 08:26
Contributor

@bisgaard-itis bisgaard-itis left a comment

Cool stuff! Thanks a lot for the effort! I would suggest to add some RAM checks and perhaps also some disk space checks to your tests.

@GitHK GitHK mentioned this pull request Feb 14, 2025
1 task
Member

@sanderegg sanderegg left a comment

thanks

@GitHK GitHK enabled auto-merge (squash) February 17, 2025 08:02
@GitHK GitHK merged commit 76ace06 into ITISFoundation:master Feb 17, 2025
92 of 95 checks passed
@GitHK GitHK deleted the pr-osparc-stream-zipping-of-s3-content branch February 17, 2025 13:42
Labels
a:aws-library a:services-library issues on packages/service-libs
5 participants