Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nullable input/output paths #4293

Closed
wants to merge 2 commits into from

Conversation

bentsherman
Copy link
Member

Close #2678

This PR contains the support for nullable paths via arity: '0..1', which was implemented in PR #3706 but ultimately excluded because it is tricky to implement.

With the new arity feature, you can set arity: '0..*' to specify that a path input/output should be a list and can be empty.

With this PR, you can also set arity: '0..1' to specify that a path input/output should be a single file but could also be null. Whereas optional: true emits nothing if no file is produced, this "nullable" path emits a null value which can be accepted by similarly nullable inputs.

The problem with the current implementation is that the null path is being passed through some code that it shouldn't, and produces this warning:

WARN: File system is unable to get file attributes file: nextflow.util.NullPath@7f -- Cause: java.nio.file.ProviderMismatchException

While it still works, ideally the null path should bypass any code that would try to use it like a real file.

Additionally, it should be considered whether nullable paths should be implemented with a separate option like nullable: true instead of adding it on to the arity feature, which might simplify the logic a bit.

@netlify
Copy link

netlify bot commented Sep 11, 2023

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit b7816a2
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/64ff55221a85a30008c64340

@bentsherman bentsherman changed the title Add nullable input/output path Nullable input/output paths Sep 11, 2023
@mjhipp
Copy link

mjhipp commented Oct 6, 2023

@bentsherman this is an interesting change.
Am I understanding correctly that with the newest edge release, you can use arity: '0..1' to get a list of path with max length 1? Essentially like an Option[Path] in languages like scala?

And with this PR, instead of a list of max length 1, it would be just a path object that may or may not be null?

@bentsherman
Copy link
Member Author

Yes. With arity: '0..1' it will emit nothing if there is no file, like optional: true. Whereas nullable would emit emit a "null" path that could trigger downstream tasks.

@mjhipp
Copy link

mjhipp commented Oct 6, 2023

When I use the new edge (not this PR), it says '0..1' is not allowed

N E X T F L O W  ~  version 23.09.0-edge
Launching `foo.nf` [sick_keller] DSL2 - revision: bbf58d593f
Path arity 0..1 is not allowed

 -- Check script 'foo.nf' at line: 3 or see '.nextflow.log' file for more details

When using '0..2', I am able to emit a channel (this works whether or not its wrapped in a tuple), that I can pass along to a downstream process.

process FOO {
  input:
    tuple path(bam), path(bai, arity: '0..2')
    val output // file name

  output:
    tuple path(output), path("${output - ~/.bam$/}.bai", arity: '0..2'), emit: bam_ch

  script:
    """
    touch $output
    ${bai ? "touch ${output - ~/.bam$/}.bai" : ''} 
    """
}

params.bai = [] // default to no index

workflow {
  FOO(
    tuple(params.bam, params.bai),
    'output.bam'
  )
  FOO.out.bam_ch.view { "bam/bai tuple: $it" }
}

When I run with an index, I get an index back (wrapped in a list)

❯ NXF_VER=23.09.0-edge ./nextflow run foo.nf --bam foo.bam --bai foo.bai
N E X T F L O W  ~  version 23.09.0-edge
Launching `foo.nf` [peaceful_hamilton] DSL2 - revision: a562747916
executor >  local (1)
executor >  local (1)
[8c/fb43ce] process > FOO [100%] 1 of 1 ✔
bam/bai tuple: [/Users/mikehipp/work/8c/fb43ce5ad904a1669594504f1c9d56/output.bam, [/Users/mikehipp/work/8c/fb43ce5ad904a1669594504f1c9d56/output.bai]]

When I run with no index, I get an empty list in place of the index, which can be used as an input downstream.

❯ NXF_VER=23.09.0-edge ./nextflow run foo.nf --bam foo.bam
N E X T F L O W  ~  version 23.09.0-edge
Launching `foo.nf` [loving_kalman] DSL2 - revision: 6b8955ef01
executor >  local (1)
[b5/cdcdcf] process > FOO [100%] 1 of 1 ✔
bam/bai tuple: [/Users/mikehipp/work/b5/cdcdcff85fcd0d5e0fe5cbf801ff89/output.bam, []]

Ideally and in my opinion, logically, I would think '0..1' should behave identically to '0..2' and '0..*' except it is limited to one file maximum. It can still output a channel that can be passed, except the channel is an empty list. Using an empty list for an optional file is how I have always handled this up to now.

It just seems that making 0..1 a special case that could be null is a bit of a confusing departure from the rest of the arity behavior.

@bentsherman
Copy link
Member Author

Sorry, I got some things mixed up.

Arity 0..1 is not allowed right now, as a temporary stopgap until we figure out how to handle nullability the right way. I don't think emitting an empty array arity is 0..1 is the "correct way", because the intent is that it's a single file that could be null, and you want to emit the "null" file.

On the other hand, an empty array would make things simple. The problem with the null file approach (implemented in this PR) is that Nextflow is trying to use it like a real file throughout the code, so we either have to re-route null files or add more no-op behavior to the NullPath class.

You might be able to do arity: '1', optional: true, but I think the current implementation will still throw an error if no file is produced.

@mjhipp
Copy link

mjhipp commented Oct 10, 2023

Thanks for the insight.

You mention that the intent is to have a path that may or may not be null. I would like to ask if the real intent is to have a nullable path, or if the intent is to have some way to allow an optional output to trigger downstream processes, whether or not it is present? If so, an empty list or a nullable path would accomplish that same intent?

I do believe, regardless of how it is implemented, the ability to use arity: '0..1', as well as the wider arity functionality, is a great feature and I look forward to using it in my workflows! Thank you for the discussion @bentsherman, this is all quite fascinating to me.

@bentsherman
Copy link
Member Author

It would. I just wonder if it could cause confusion with any dataflow logic between the two processes, where null and [] would have different behavior.

For example, if ( file ) would be false for both null and [], but if ( file != null ) would be different. So if a user wants to use the nullable file in some way, which almost always involves a null check, they will get the right or wrong behavior depending on how they do it...

@bentsherman
Copy link
Member Author

This is unlikely to be resolved before static types are implemented. I will try to fold nullable paths into #4553 , should be easier there anyway

@bentsherman
Copy link
Member Author

bentsherman commented Mar 29, 2024

Note to self on mapping static types to arity:

  • Path: arity = 1
  • Collection<Path>: arity = 0..*
  • Optional<Path>: arity = 0..1
  • record Pair { Path read1 ; Path read2 }: arity = 2, etc

So I think the arity option won't be relevant with static types + records because any arity can be represented with a particular type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DSL2 - emit tuples with optional values
2 participants