Add map input/output type for processes #2127

multimeric · 2021-05-24T00:06:20Z

New feature

I would like to be able to produce a single channel with multiple named values (i.e. a Map) from my processes, so that my_process.out.view() would print:

[a:abc, b:123, c:false]
[a:abc, b:123, c:false]
[a:def, b:456, c:true]

Currently we have a tuple qualifier, which creates an output channel whose items are tuples of other types. However, these tuples are unlabelled, so users have to extract values from the channel by position, which results in confusing code.

It is also possible to produce multiple output channels, each of which has its own name. However, these channels can't easily be combined into a single channel containing maps or tuples, because the merge operator has been deprecated, and in general joining channels by position is discouraged.

Usage scenario

This would be useful when a user is working mostly with map data in their channels, typically because they want each field to be labelled rather than unlabelled as in a tuple.

Suggest implementation

I would envisage a new map qualifier, which is used like this:

process hmmer_search {
    container "quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1"
    input:
      path profile
      path database
    output:
      map [table: path('table.txt'), human_readable: path('match.txt')]
    script:
      """
      hmmsearch -o match.txt --tblout table.txt ${profile} ${database}
      """
}


stale · 2021-10-21T02:57:53Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

christopher-hardy · 2022-02-08T19:12:41Z

+1

bentsherman · 2022-07-13T22:41:04Z

A quick workaround is to output a tuple and follow it with a map operator that converts each tuple to a map.
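A minimal sketch of that workaround, reusing the hmmer_search process from the original post with a plain tuple output instead of the proposed map qualifier. The params.profile and params.database parameters are hypothetical placeholders for the input files, and the key names are simply the ones suggested above:

```nextflow
nextflow.enable.dsl = 2

process hmmer_search {
    container "quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1"
    input:
      path profile
      path database
    output:
      // Plain, unlabelled tuple: position 0 is the table, position 1 the readable report
      tuple path('table.txt'), path('match.txt')
    script:
      """
      hmmsearch -o match.txt --tblout table.txt ${profile} ${database}
      """
}

workflow {
    // params.profile and params.database are hypothetical inputs for illustration
    hmmer_search(file(params.profile), file(params.database))

    hmmer_search.out
        // Destructure each tuple by position once, then carry named fields downstream
        .map { table, human_readable -> [table: table, human_readable: human_readable] }
        .view()
}
```

Downstream operators can then refer to it.table or it.human_readable instead of positional indices, at the cost of repeating the field order once in the map closure.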

bentsherman · 2022-07-18T16:23:34Z

This issue is complementary to #2257: that issue is about multiple named channels, whereas this one is about named values within a channel (e.g. a map). Ideally both use cases should be supported.

A single map channel would be used for 1-to-1 relationships whereas multiple named channels would be used for 1-to-many and many-to-many relationships.

notestaff · 2022-12-10T18:53:20Z

+1

Or maybe immutable named tuples? E.g.

output:
  tuple table: path('table.txt'), human_readable: path('match.txt')

Then all current tuple-reliant functionality (e.g. groupKey) works as before, but one can use names instead of indices when manipulating process results, e.g.

input_ch
  | MY_PROC
  | map { it.table }

instead of

  | map { it[0] }

stale bot added the stale label Oct 21, 2021

stale bot closed this as completed Dec 20, 2021

bentsherman added lang/dsl2 pinned and removed stale labels Jul 18, 2022

bentsherman reopened this Jul 18, 2022

bentsherman changed the title ~~Allow map output from a process~~ Add map input/output type for processes Jul 18, 2022

bentsherman added lang/processes and removed lang/dsl2 labels Aug 8, 2022

notestaff mentioned this issue Dec 22, 2022

Optional input arguments, default value and named arguments. #3507

Open

bentsherman mentioned this issue Sep 15, 2023

Map type for process inputs/outputs #4313

Closed
