
Invoke plugins lambda from backend for npe2 manifest discovery #673

Merged
DragaDoncila merged 44 commits into main from dev-npe2-triggers on Oct 14, 2022

Conversation

DragaDoncila
Contributor

@DragaDoncila DragaDoncila commented Oct 3, 2022

As described in #620, this PR updates the manifest API endpoint handler to directly invoke the plugins lambda when the manifest file for a plugin & version is not already cached.

Check out the metadata for napari-blossom on the frontend:
https://dev-npe2-triggers-frontend.dev.imaging.cziscience.com/plugins/napari-blossom
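For context, a minimal sketch of that cache-miss path; the function name, payload shape, and the read_manifest_from_s3_cache helper are illustrative, not the actual handler code:

```python
import json

import boto3

lambda_client = boto3.client("lambda")

def get_manifest(plugin: str, version: str) -> dict:
    """Return the cached manifest, invoking the plugins lambda on a cache miss."""
    cached = read_manifest_from_s3_cache(plugin, version)  # hypothetical helper
    if cached is not None:
        return cached
    # Fire the plugins lambda asynchronously; discovery writes the manifest
    # back to the S3 cache, so a later request will find it there.
    lambda_client.invoke(
        FunctionName="plugins",  # function name assumed
        InvocationType="Event",  # async invocation
        Payload=json.dumps({"plugin": plugin, "version": version}),
    )
    return {"error": "Manifest not yet processed."}
```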

@DragaDoncila
Contributor Author

I've now let this branch run on all plugins in dev-npe2-triggers for the first time. Just wanted to leave a few notes here:

  • The initial CloudWatch update triggered 235 lambda invocations - roughly consistent with the number of currently listed plugins (232; I have no idea where the extra 3 come from...).
  • Of the 235 invocations, 13 errored. All errors were timeout/max-memory errors from the lambda. This was semi-expected, as I had just lowered the resource allocations to their minimums for testing purposes. None of the failing plugins provides a wheel distribution, so the fetch process has to build one (hence the increased resource requirements).
  • Even with the 13 invocations timing out at 60 seconds, the average duration across the initial run of 235 invocations was 4.3s.
  • After ~25 minutes, all but 19 plugins had complete metadata in index.json - roughly consistent with the 13 errored lambdas, though I will look into the other 6.

The frustrating thing is that despite the lambda being configured not to retry, 12 of the 13 initially failing invocations just kept being re-invoked - more than 6 times each! I can't explain that yet; they just keep going.

I will upgrade the memory allocation and see how that affects the remaining failing invocations, and investigate the 6-plugin discrepancy in index.json.

@DragaDoncila
Contributor Author

Increased the memory size to 256MB, and some plugins are still timing out (memory usage is now reported at 200MB). Will update the timeout to 2 min.
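For reference, the same resource bump can be applied ad hoc with boto3 (the function name here is assumed):

```python
import boto3

client = boto3.client("lambda")
# Bump the plugins lambda's resources (function name assumed).
client.update_function_configuration(
    FunctionName="plugins",
    MemorySize=256,  # MB
    Timeout=120,     # seconds
)
```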

@DragaDoncila
Contributor Author

OK, after those updates all lambdas completed successfully. We still have 12 plugins with no manifest metadata in index.json; I've summarized them below:

Error message written to manifest

These might need to be retried now that I've updated the memory size.

  • napari-yolov5 - no space left on device (lambda memory error)
  • platelet-unet-watershed - [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/var/task']
  • zarpaint - [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/var/task']
  • nfinder - [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/var/task']

Missing in npe2api - legitimate errors

  • skeleton-finder
  • napari-mri

Actually 'empty' manifests

  • napari-error-reporter - napari package, not a plugin
  • napari-console - napari package, not a plugin
  • beetlesafari - accurate, doesn't return any hook implementations
  • devbio-napari - accurate, napari tools menu stuff

Potential fetch error

  • napari-mat-images - does have a hook_implementation, but fetch finds no contributions
  • napari-features - does have a hook_implementation, but fetch finds no contributions

@richaagarwal
Collaborator

@DragaDoncila Thanks so much for these updates! Regarding the 12 plugins that failed / have no manifest data: did the retries (even though they're not supposed to happen at all) eventually settle at around 6, or are they still being retried?

Also - let me know when this PR is ready for review!

@DragaDoncila
Contributor Author

Regarding the 12 plugins that failed / have no manifest data: did the retries (even though they're not supposed to happen at all) eventually settle at around 6, or are they still being retried?

@richaagarwal just summarizing here from our discussion today. All lambdas ceased to be retried after I updated the resource allocations, since there were no longer any external lambda errors (e.g. timeouts or memory issues). The retries weren't default AWS behavior: because our manifest files were never being written, our CloudWatch cron job kept leading to the lambda being invoked.

As discussed, this is not great, as we're relying on file existence as a measure of idempotency. I've now updated the lambda handler to write an empty file at the beginning of discovery, and to overwrite it with either an error message or the real manifest once discovery is complete. This means even an unsuccessful lambda termination doesn't lead to a retry, as the file will exist. I've also updated the error messages returned from the API to reflect this.
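A minimal sketch of that write-first pattern, reusing the bucket and prefix from the cleanup command below; the key layout and the fetch_manifest helper are assumptions:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "napari-hub-dev"
PREFIX = "dev-npe2-triggers/cache"

def handle(event, context):
    plugin, version = event["plugin"], event["version"]
    key = f"{PREFIX}/{plugin}/{version}-manifest.json"  # key layout assumed
    # Write an empty placeholder immediately: the file's existence is what
    # stops the CloudWatch cron from re-invoking discovery for this plugin.
    s3.put_object(Bucket=BUCKET, Key=key, Body=b"{}")
    try:
        manifest = fetch_manifest(plugin, version)  # hypothetical npe2 fetch
        body = json.dumps(manifest)
    except Exception as e:
        body = json.dumps({"error": str(e)})
    # Overwrite the placeholder with the real manifest or an error message.
    s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode())
```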

As a future piece of work, if we don't like that we're overwriting the file, we can update the Python handler to time its own execution and write an error message just before the lambda would time out.
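If we do go that route, the Lambda runtime already exposes the remaining time budget, so a watchdog-style sketch could look like this (the 5-second buffer and the helper names are illustrative):

```python
import threading

def handle(event, context):
    # Schedule an error write shortly before the runtime would kill us, so
    # even a timeout leaves a non-empty manifest file behind.
    budget = context.get_remaining_time_in_millis() / 1000 - 5  # 5s buffer
    timer = threading.Timer(budget, write_timeout_error_manifest, args=(event,))  # hypothetical helper
    timer.start()
    try:
        run_discovery(event)  # hypothetical discovery entry point
    finally:
        timer.cancel()  # discovery finished (or raised) in time
```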

Finally, we discussed cleaning the prod and staging environments of manifest files placed there by the original system. We decided it would be fine to manually run a command to clear these out of staging just before we merge this PR, and out of prod soon after; this will let the new method populate all manifests correctly.

For people's reference, the command to print all files that would be deleted is:

aws s3 rm s3://napari-hub-dev/dev-npe2-triggers/cache/ --recursive --exclude "*" --include "*-manifest.json" --dryrun
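(Dropping --dryrun performs the deletion for real; the --exclude "*" --include "*-manifest.json" pair limits the recursive delete to manifest files only.)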

I've run this cleanup in the dev branch bucket so I could watch another run of invocations over the full plugin list. This time a single plugin timed out (napari-yolov5) while attempting to build its wheel, and I've confirmed that the plugins lambda doesn't get invoked again. I've updated the timeout to 2.5 min (a subsequent run for this plugin almost finished), and hopefully that will take care of this last plugin.

Regardless, I think this PR is ready for review!

@DragaDoncila DragaDoncila marked this pull request as ready for review October 12, 2022 07:56
@DragaDoncila
Contributor Author

Small update: napari-yolov5 finished successfully in 124s - so close 😆

@DragaDoncila DragaDoncila changed the title [WIP] Invoke plugins lambda from backend for npe2 manifest discovery Invoke plugins lambda from backend for npe2 manifest discovery Oct 12, 2022
@richaagarwal
Collaborator

richaagarwal commented Oct 12, 2022

@DragaDoncila I left some comments, but after reviewing further, I realized there are a few things about the previous structure we had in place that might be problematic with this new code.

As the PR currently stands, the plugins lambda could theoretically be invoked outside of the cron job, which isn't ideal. We'd ideally rule that out by separating calls to the S3 cache from calls to get_manifest, since get_manifest now directly invokes the plugins lambda.

It also looks like get_frontend_manifest_metadata will need to be updated regardless since it's still referencing the old process_count.

@DragaDoncila
Contributor Author

@richaagarwal thanks for the thorough review! I've updated the logic now so that get_manifest just retrieves the manifest from cache. I've added a new method discover_manifest which is only called from build_manifest_metadata when get_manifest returns a "Manifest not yet processed" message.

I've verified that the plugins lambda is no longer invoked when, for example, we make a call to manifest/{name} in the API, but that it is successfully invoked during the cron job.
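For posterity, a rough sketch of the resulting split; everything other than the get_manifest / discover_manifest / build_manifest_metadata names is assumed:

```python
def get_manifest(plugin: str, version: str) -> dict:
    """Read-only: fetch the manifest from the S3 cache; never invokes the lambda."""
    cached = read_manifest_from_s3_cache(plugin, version)  # hypothetical helper
    if cached is None:
        return {"error": "Manifest not yet processed."}
    return cached

def discover_manifest(plugin: str, version: str) -> None:
    """Invoke the plugins lambda asynchronously to run npe2 discovery."""
    invoke_plugins_lambda(plugin, version)  # hypothetical boto3 wrapper

def build_manifest_metadata(plugin: str, version: str) -> dict:
    # Called from the cron job. API calls to manifest/{name} hit get_manifest
    # directly, so they can never trigger a lambda invocation.
    manifest = get_manifest(plugin, version)
    if manifest.get("error") == "Manifest not yet processed.":
        discover_manifest(plugin, version)
    return manifest
```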

@richaagarwal
Collaborator

This looks great to me! Thanks so much for your thorough work & testing on this. The code comments really help with readability too 💛

I wonder if we should also add staging to the feature flags here: https://github.com/chanzuckerberg/napari-hub/blob/main/frontend/src/utils/featureFlags.ts#L27. I'll leave it up to you whether to add that before merging, or if you'd prefer to open a separate PR for it.

@DragaDoncila
Contributor Author

@richaagarwal I'll update the feature flag here to enable on staging. Before I merge I'll also delete the -manifest.json files on staging.

@DragaDoncila
Contributor Author

Synced with @richaagarwal about the failing test, and as it seems unrelated, I will go ahead and merge this now!

@DragaDoncila DragaDoncila merged commit 8143eff into main Oct 14, 2022
@DragaDoncila DragaDoncila deleted the dev-npe2-triggers branch October 14, 2022 20:42