Improved configuration / resource reuse for S3 #915

Merged
olsen232 merged 3 commits into master from s3-resource-management on Oct 6, 2023

Conversation

@olsen232 (Collaborator) commented Oct 5, 2023

Don't store S3 regions - we don't need them, they could (occasionally) change over time if somebody carefully deletes and recreates a bucket, and they're extra junk to store.

Learn the region that buckets are in, where possible.
Configure the S3 client to default to the region that the resource is in, to save it having to look up the region itself on a per-request basis.
Don't reuse clients across multiple regions (for the configuration reason above).
Don't reuse clients across multiple threads (S3 resources and sessions are not guaranteed to be thread safe).
Do reuse clients / resources / buckets otherwise - these are cached until the command exits.

This is also the groundwork for fetching tiles during WC checkout, which will probably need to happen on multiple threads.
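
As a rough illustration of this caching scheme (a sketch only - the names are illustrative, not the actual kart/s3_util.py code; assumes boto3):

import functools
import threading

import boto3


@functools.lru_cache()
def _get_s3_client(region, thread_id):
    # One client per (region, thread) pair, cached until the process exits.
    # thread_id is part of the cache key because boto3 sessions/resources
    # are not guaranteed to be thread safe.
    return boto3.Session().client("s3", region_name=region)


def get_s3_client(region=None):
    return _get_s3_client(region, threading.get_ident())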

Related links:

#905

Checklist:

  • Have you reviewed your own change?
  • Have you included test(s)? (Existing tests pass.)
  • Have you updated the changelog?

@olsen232 olsen232 requested review from craigds and rcoup October 5, 2023 04:01
try:
    response = get_s3_client().head_bucket(Bucket=bucket)
    return response["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]
except Exception as e:
Member

The main situation this might fail in seems to be if we have GetObject but not ListBucket permissions - in which case we can fetch the tiles but not get the bucket details. Is that right?

Do we require the ListBucket perm elsewhere for read-only access anyway, or is it only required for importing tiles? If we already need it elsewhere we might as well drop the except clause here

Collaborator Author


We would require it if the user tried to use wildcards but we wouldn't require it if they listed in full the object(s) that make up the dataset. I'm inclined to leave this in since it just makes Kart a bit more robust
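
For illustration, the kind of fallback being discussed might look like this (a sketch assuming boto3; the helper name is illustrative and the real code may differ):

def fetch_bucket_region(bucket):
    # Returns the bucket's region, or None if we can't find out (eg missing
    # ListBucket permission) - in that case the client resolves the region
    # itself on a per-request basis.
    try:
        response = get_s3_client().head_bucket(Bucket=bucket)
        return response["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]
    except Exception:
        # Deliberately broad, as discussed above - failing to learn the region
        # shouldn't stop Kart from fetching the objects themselves.
        return None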

kart/s3_util.py Outdated
@@ -12,37 +14,82 @@

# Utility functions for dealing with S3 - not yet launched.

# Lots of functions in this file look like this:
# >> @functools.lru_cache()
# >> def _get_some_thread_unsafe_s3_resource(region, thread_hash):
Member

Maybe write a better decorator that wraps this up so you don't have to do the weird thread stuff in two versions of every function, eg @thread_local_cache

Collaborator Author

Done
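
A @thread_local_cache decorator along the suggested lines might look something like this (a sketch of the suggestion, not necessarily the implementation that was merged):

import functools
import threading


def thread_local_cache(func):
    # Like functools.lru_cache, but the calling thread is part of the cache
    # key, so thread-unsafe objects (eg boto3 sessions) are never shared
    # across threads.
    @functools.lru_cache(maxsize=None)
    def _cached(thread_id, *args, **kwargs):
        return func(*args, **kwargs)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return _cached(threading.get_ident(), *args, **kwargs)

    return wrapper


@thread_local_cache
def get_s3_resource(region=None):
    import boto3  # assumption: boto3 is the S3 library in use

    return boto3.Session().resource("s3", region_name=region)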

@olsen232 olsen232 merged commit 6aab400 into master Oct 6, 2023
@olsen232 olsen232 deleted the s3-resource-management branch October 6, 2023 00:57