Nextflow needs to keep track of the Wave containers it puts in our ECR #5801
You are right that there currently isn't a simple way to collect Wave build logs for a Nextflow execution. A possible workaround, using some Bash wrangling, is the following:
I'm including a Bash snippet for your convenience:

```bash
set -euo pipefail
requestId=$1  # Wave request ID, passed as the first argument
# Look up the build ID associated with the Wave request
buildId=$(curl -fsS "https://wave.seqera.io/v1alpha2/container/${requestId}" | jq -r .buildId)
# Download the build logs for that build
curl -fsS "https://wave.seqera.io/v1alpha1/builds/${buildId}/logs" > "wave-${buildId}.log"
```

That said, to improve this experience, the plan is to collect all container metadata in Seqera Platform, both for workflows launched with the Nextflow CLI and for those launched via Platform itself. This will make it possible to collect and search all containers, builds, and vulnerability scans for every workflow run in a single place, using either the Platform UI or the command-line tool. There's already a PR open on Nextflow to bring this metadata to Seqera Platform. Let us know if this solution works for you.
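For a run that pulled many Wave containers, the two calls in the snippet above can be looped. The sketch below is a hypothetical helper, not part of Nextflow or Wave: `fetch_wave_logs` and the output file names are made up for illustration, it assumes `curl` and `jq` are installed, and it reuses only the two endpoints from the snippet. It also keeps the raw metadata JSON for each request ID, so whatever else the API returns is preserved alongside the build logs.

```bash
# Hypothetical batch wrapper around the two Wave API calls shown above.
# Reads Wave request IDs (one per line) on stdin. For each ID it saves the
# raw container metadata JSON, then, if that JSON carries a buildId, saves
# the build logs and prints a "requestId<TAB>buildId" manifest line.
# Assumes curl and jq are on PATH.
fetch_wave_logs() (
  set -euo pipefail   # options stay local: the body runs in a subshell
  base=https://wave.seqera.io
  while IFS= read -r requestId || [[ -n "$requestId" ]]; do
    [[ -z "$requestId" ]] && continue   # skip blank lines
    # Keep the full metadata response, not just the buildId
    curl -fsS "${base}/v1alpha2/container/${requestId}" > "wave-${requestId}.json"
    buildId=$(jq -r '.buildId // empty' "wave-${requestId}.json")
    if [[ -z "$buildId" ]]; then
      echo "no buildId for ${requestId}, skipping logs" >&2
      continue
    fi
    curl -fsS "${base}/v1alpha1/builds/${buildId}/logs" > "wave-${buildId}.log"
    printf '%s\t%s\n' "$requestId" "$buildId"
  done
)
```

For example, `fetch_wave_logs < request-ids.txt > wave-manifest.tsv` leaves one JSON file per request and, where a build ID exists, one log file in the current directory. Because the body runs with `set -e`, a failed HTTP request aborts the rest of the batch rather than silently producing an incomplete record.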
Thanks for this great reply; I was not aware that the part of the Wave URI after the …

I do still have concerns about making Seqera Platform the solution for this instead of Nextflow itself. There are going to be users who don't have, or don't want, access to Platform but who still need this feature to satisfy, e.g., infosec requirements. So even if this is "solved" in Platform, it's not actually a fix for Nextflow, and it risks leaving non-Platform users out of compliance with company policies. It's also not really feasible to police users' command-line invocations of Nextflow for this purpose, to make sure they include the …
We can currently use the built-in Wave features to deposit a copy of the final built Wave container into our private ECR:
https://www.nextflow.io/docs/latest/reference/config.html#wave
So when Nextflow runs with Wave + Fusion enabled, if the original container for a Nextflow process was something like this:
And Wave then updates the container for use with Fusion, the resulting container URI recorded in the workflow execution is this:
However, our ECR's wave/build repo ends up with containers that look like this:

We end up with a bunch of containers pushed into our ECR with no way to identify them, and no way to know where they originated.
For provenance tracking, we need to be able to determine which original public Wave container each of these ECR containers was deposited from.
A simple solution would be for Nextflow to save the Wave build logs for the Wave containers as they are generated. An alternative, and perhaps easier, solution would be for Nextflow to simply log the original Wave container URI and its build_id whenever it uses these Wave containers. Right now, neither of these things seems to be happening.
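Until Nextflow records this itself, one user-side stopgap is to dump, after each run, the container URI that Nextflow recorded for every process. This is a hedged sketch: `record_wave_containers` is a hypothetical helper, and it assumes the run history is visible to `nextflow log` (usually from the launch directory) and that the default tab column separator is in use. It only captures the URIs; it does not recover build IDs, which still need the Wave API lookup described earlier in this thread.

```bash
# Hypothetical stopgap: list the container URI recorded for each process in
# a past Nextflow run, using `nextflow log` with the `process` and
# `container` trace fields. Prints one "process<TAB>container" line per
# distinct pair. Must run where `nextflow log` can see the run history
# (normally the launch directory).
record_wave_containers() (
  set -euo pipefail
  run=$1   # run name or session ID, as listed by `nextflow log`
  # Deduplicate: many tasks of the same process share one container
  nextflow log "$run" -f process,container | LC_ALL=C sort -u
)
```

For example, `record_wave_containers <run name> > wave-containers.tsv`, run right after each pipeline finishes, leaves a per-run record of which container URIs were used. Sorting with `LC_ALL=C` keeps the output stable across machines, which makes the files easy to diff between runs.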
Based on the current Wave API docs (https://docs.seqera.io/wave/api#get-v1alpha1buildsbuildidlogs), it seems impossible to determine any actual details about a Wave container after it has been built by Nextflow, because all queries appear to require the build_id. Nextflow does not appear to save the build_id, even though it clearly has it at runtime: the build_id is part of the Wave API response when Wave accepts the request to build the container.
So if Nextflow is not saving the URIs of the public Wave containers it generates, not saving the build_id for these containers, and not tracking the URIs of the copies it deposits in our ECR, it seems like it should, so that we can tell which containers in our ECR are which and where they originated.
Thanks!