Add map input/output type for processes #2127

Open
multimeric opened this issue May 24, 2021 · 5 comments

@multimeric

New feature

I would like to be able to produce a single channel with multiple named values (i.e. a Map) from my processes, so that my_process.out.view() would print:

[a:abc, b:123, c:false]
[a:abc, b:123, c:false]
[a:def, b:456, c:true]

Currently we have a tuple qualifier, which creates an output channel whose items are tuples of other types. However, the elements of such a tuple are unlabelled, so users have to extract values from the channel by position, which results in confusing code.

It is also possible to produce multiple output channels, each of which has its own name. However, these channels can't easily be combined into a single channel containing maps or tuples, because the merge operator has been deprecated, and in general joining channels by position is discouraged.

Usage scenario

This would be useful when a user is working with mostly map data in their channels, likely because they want each field to be labelled instead of unlabelled as in a tuple.

Suggest implementation

I would envisage a new map qualifier, which is used like this:

process hmmer_search {
    container "quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1"
    input:
      path profile
      path database
    output:
      map [table: path('table.txt'), human_readable: path('match.txt')]
    script:
      """
      hmmsearch -o match.txt --tblout table.txt ${profile} ${database}
      """
}
@stale

stale bot commented Oct 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 21, 2021
@stale stale bot closed this as completed Dec 20, 2021
@christopher-hardy

+1

@bentsherman
Member

Quick workaround is to output a tuple and follow up with a map operator that converts each tuple to a map.
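
A minimal sketch of that workaround, reusing the hmmer_search process from the original post. The parameter names params.profile and params.database are hypothetical placeholders and not part of the issue; the process declares an ordinary tuple output and the map operator attaches the labels afterwards:

```nextflow
nextflow.enable.dsl = 2

process hmmer_search {
    container "quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1"
    input:
      path profile
      path database
    output:
      // Plain positional tuple: element 0 is the table, element 1 the readable report
      tuple path('table.txt'), path('match.txt')
    script:
      """
      hmmsearch -o match.txt --tblout table.txt ${profile} ${database}
      """
}

workflow {
    // params.profile and params.database are assumed names for the two input files
    hmmer_search(file(params.profile), file(params.database))

    // Convert each positional tuple into a Groovy map with named fields
    results_ch = hmmer_search.out.map { table, human_readable ->
        [table: table, human_readable: human_readable]
    }

    // Downstream code can now use names instead of indices
    results_ch.map { it.table }.view()
}
```

The labels exist only after the map step, so the names have to be kept in sync with the tuple order in the output block by hand, which is the duplication a dedicated map qualifier would avoid.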

@bentsherman bentsherman reopened this Jul 18, 2022
@bentsherman bentsherman changed the title from "Allow map output from a process" to "Add map input/output type for processes" Jul 18, 2022
@bentsherman
Member

This issue is complementary to #2257, which is about multiple named channels, whereas this issue is about named values within a single channel (e.g. a map). Ideally both use cases should be supported.

A single map channel would be used for 1-to-1 relationships whereas multiple named channels would be used for 1-to-many and many-to-many relationships.

@notestaff

+1

Or maybe immutable named tuples? E.g.
output:
tuple table: path('table.txt'), human_readable: path('match.txt')

Then all current tuple-reliant functionality (e.g. groupKey) works as before, but one can use names instead of indices when manipulating process results, e.g.

input_ch
| MY_PROC
| map { it.table }

instead of

| map { it[0] }
