Invoke plugins lambda from backend for npe2 manifest discovery #673
Conversation
Force-pushed …hub into dev-npe2-triggers, from f7cf423 to a904656
I've now let this branch run on all plugins in
The frustrating thing is that despite the lambda being configured not to retry, 12 of the 13 initially failing invocations just kept being re-invoked - more than 6 times each?! I honestly can't explain that; they just keep going. I will increase the memory allocation and see how that affects the remaining failing invocations, and investigate the 6-plugin difference missing in
Increased memory size to 256MB; plugins are still timing out (though memory usage is now reported at 200MB). Will update the timeout to 2 min.
Ok, after those updates all lambdas completed successfully. We still have 12 plugins with no manifest metadata in

Error message written to manifest (these might need to be retried now that I've updated the memory size):
- napari-yolov5 - no space left on device (lambda memory error)

Missing in npe2api - legitimate errors:
- skeleton-finder

Actually 'empty' manifests:
- napari-error-reporter - napari package, not a plugin

Potential fetch error:
- napari-mat-images - does have a hook_implementation, but fetch finds no contributions
@DragaDoncila Thanks so much for these updates! Regarding the 12 plugins with no manifest data that failed, did the retries (even though they're not supposed to happen at all) eventually settle at around 6, or are they still being retried? Also - let me know when this PR is ready for review!
@richaagarwal just summarizing here from our discussion today. All lambdas ceased to be retried after I updated the resource allocations, as there were no external lambda errors, e.g. timeouts or memory issues. The reason they were retrying is not default AWS behavior: it was because our manifest files were never being written, so our CloudWatch cron job kept leading to the lambda being invoked. As discussed, this is not great, as we're relying on file existence as a measure of idempotency.

I've now updated the lambda handler to write an empty file at the beginning of discovery, and to overwrite it with either an error message or the real manifest once discovery is complete. This means even an unsuccessful lambda termination doesn't lead to a retry, as the file will exist. I've also updated the error messages returned from the API to reflect this. As a future piece of work, if we don't like the fact that we're overwriting the file, we can update the Python handler to time its own execution and write an error message just before the lambda would time out.

Finally, we discussed cleaning the

For people's reference, the command to print all files that would be deleted is:

aws s3 rm s3://napari-hub-dev/dev-npe2-triggers/cache/ --recursive --exclude "*" --include "*-manifest.json" --dryrun

I've run this command in the dev branch bucket to watch another run of invocations on the full plugin list. This time a single plugin timed out (

Regardless, I think this PR is ready for review!
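The "write a placeholder file first" idempotency pattern described above can be sketched roughly as follows. This is a minimal illustration only: a dict stands in for the S3 cache bucket, and the function and key names are hypothetical, not taken from the actual napari-hub code.

```python
# Hypothetical sketch: a dict stands in for s3://<bucket>/cache/.
cache = {}

def discover_manifest(plugin, version):
    """Stand-in for npe2 manifest discovery; may raise on failure."""
    return '{"name": "%s", "version": "%s"}' % (plugin, version)

def handle(plugin, version):
    key = f"cache/{plugin}/{version}-manifest.json"
    # Write an empty file immediately, so even a crashed or timed-out
    # invocation leaves a file behind and the cron job won't re-invoke us.
    cache[key] = ""
    try:
        manifest = discover_manifest(plugin, version)
    except Exception as e:
        cache[key] = '{"error": "%s"}' % e  # overwrite with the error message
    else:
        cache[key] = manifest               # overwrite with the real manifest
    return cache[key]
```

The key point is that the placeholder write happens before discovery starts, so file existence (the retry guard) no longer depends on discovery completing.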
Small update:
@DragaDoncila I left some comments, but after reviewing further, I realized there are a few things with the previous structure we had in place that might be problematic with this new code. As the PR currently stands, the plugins lambda could theoretically be invoked outside of the cron job, which isn't ideal. We'd ideally avoid that possibility by separating out calls to the S3 cache vs. calls to

It also looks like
@richaagarwal thanks for the thorough review! I've updated the logic now so that

I've verified that the
This looks great to me! Thanks so much for your thorough work & testing on this. The code comments really help with readability too 💛
I wonder if we should also add staging to the feature flags here: https://github.com/chanzuckerberg/napari-hub/blob/main/frontend/src/utils/featureFlags.ts#L27. I'll leave it up to you if you want to add that before merging or not, or if you prefer to open a separate PR for that.
@richaagarwal I'll update the feature flag here to enable on
Synced with @richaagarwal about the failing test, and as it seems unrelated, I will go ahead and merge this now!
As described in #620, this PR updates the manifest API endpoint handler to directly invoke the plugins lambda when the manifest file for a plugin & version is not already cached.

Check out the metadata for napari-blossom on the frontend: https://dev-npe2-triggers-frontend.dev.imaging.cziscience.com/plugins/napari-blossom
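The cache-or-invoke flow this PR describes can be sketched as follows. This is a hedged illustration, not the actual napari-hub backend: the dict stands in for the S3 manifest cache, and invoke_plugins_lambda is a hypothetical stand-in for an asynchronous Lambda invocation that populates the cache.

```python
# Hypothetical sketch: dict stands in for the S3 manifest cache.
cache = {}

def invoke_plugins_lambda(plugin, version):
    """Stand-in for an async Lambda invocation that populates the cache."""
    cache[f"{plugin}/{version}-manifest.json"] = '{"name": "%s"}' % plugin

def get_manifest(plugin, version):
    key = f"{plugin}/{version}-manifest.json"
    if key in cache:
        # Cache hit: serve the stored manifest (or error message) directly.
        return 200, cache[key]
    # Cache miss: trigger manifest discovery and tell the caller to retry.
    invoke_plugins_lambda(plugin, version)
    return 202, "Manifest discovery triggered; try again shortly."
```

A first request for an uncached plugin triggers discovery and returns a "try again" response; a subsequent request finds the file and serves it.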