-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Script for seeding category data from S3 #959
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool stuff.. 🎉
data-workflows/utils/utils.py
Outdated
return round(time.time() * 1000) | ||
|
||
|
||
class S3Client: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this be a class is really clean. 😁
8fa36c7
to
542c6fd
Compare
import os | ||
|
||
|
||
def get_required_env(key: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool idea. 😀
542c6fd
to
22ecad6
Compare
from typing import Dict | ||
|
||
|
||
def run_workflow(event: Dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done! Having this workflow is super helpful for the future!
22ecad6
to
abc4af6
Compare
data-workflows/handler.py
Outdated
if not version: | ||
LOGGER.error(f"Missing 'version' for type={event_type}") | ||
return | ||
|
||
if not s3_path: | ||
LOGGER.error(f"Missing 's3_path' for type={event_type}") | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
abc4af6
to
56e863b
Compare
BLOCKER: I believe we were holding off on incorporating changes from #986 until Kevin's ETL PR was merged, to avoid conflicts that might be harder to resolve on his end. |
4bfce5a
to
fbdb3e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is missing unit tests. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionality LGTM. But, might require unit test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
#866
Depends on #986
This PR adds a script for seeding category data from the S3 bucket. This works by adding a general purpose
run_workflow.py
script that can be run manually to run a particular workflow from CLI, or by importing therun_workflow()
function. This will allow us to run the same scripts in our workflows locally and from the data-workflows lambda.To seed category data from S3, run the
run_workflow.py
script:For example, this is the command I used for testing:
The workflow can also be run from Python using:
Demos
List of first 50 categories in Dynamo