
[FEATURE] Have non blocking channel out of bwamem #362

Closed
maxulysse opened this issue Mar 22, 2021 · 3 comments

@maxulysse
Member

Copied the discussion from #358 for context.

Basically, what would be good is to have a non-blocking channel out of the mapping step, so that downstream steps can run while mapping is still ongoing.

from @rogerzhanglijie

How do we resolve the groupTuple blocking problem on line 78? I can't comment on that line, so I'm commenting here.
If the mapping step has multiple runs, groupTuple will hold back all BAM files until every sample has alignment results. However, we can't set the size parameter in groupTuple, because we don't know how many runs each sample has; there may be 2, 3, 4, and so on.

I don't actually see a problem here.
The reads_input.groupTuple(by: [0,1]) ensures that we have only one fastq pair per channel emission.
We group the different BAMs from the same sample with this:

```nextflow
MERGE_BAM(bam_bwa_multiple)
bam_mapped = bam_bwa_single.mix(MERGE_BAM.out.bam)
```

But maybe I misunderstood your comment.
Can you elaborate more if it was the case?

What I mean is this use of groupTuple: it is grouping the aligned output BAMs, not the input fastq files. Because groupTuple here has no size parameter set, the bam_bwa_to_sort channel will not emit any element until all the bwa-mem alignments have finished. groupTuple is a blocking operator when the size parameter is not set.
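The blocking behaviour described above can be reproduced with a minimal sketch (the channel contents here are hypothetical placeholders):

```nextflow
// Without `size`, groupTuple cannot know when a group is complete,
// so it waits for the upstream channel to close before emitting anything:
Channel
    .of(['sampleA', 'run1.bam'],
        ['sampleB', 'run1.bam'],
        ['sampleA', 'run2.bam'])
    .groupTuple()
    .view()
// No group is emitted until all three items have arrived upstream.
```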

Ok, I'll do some more tests on that.
I copied part of this code from DSL1 and tried to update it, so it's definitely possible that I introduced issues.
Thanks for noticing that.

Looking forward to your reply as soon as possible. I have run into the same problem in my own code and don't know how to solve it, because the group sizes change dynamically between samples that have multiple runs of data.

Ok, I re-read and tested things, and now I understand what you want.
Sorry, I was still in a no coffee mode this morning.

@maxulysse maxulysse added the enhancement New feature or request label Mar 22, 2021
@maxulysse maxulysse self-assigned this Mar 22, 2021
@maxulysse maxulysse mentioned this issue Mar 22, 2021
@rogerzhanglijie

Can we split the input reads into single-run and multiple-run before aligning? If we group the BAM files after alignment, groupTuple will block and wait for all samples (both single-run and multiple-run) to finish the alignment step.
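The split suggested above could be sketched with Nextflow's branch operator. This assumes, hypothetically, that a run count is already known per sample before alignment:

```nextflow
// Hypothetical sketch: route samples before alignment, assuming a
// `numLanes` count is available in each meta map.
reads_input
    .branch { meta, fastqs ->
        single:   meta.numLanes == 1
        multiple: meta.numLanes > 1
    }
    .set { reads_split }

// reads_split.single   -> aligned BAMs can go straight downstream
// reads_split.multiple -> aligned BAMs go through MERGE_BAM first
```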

@maxulysse
Member Author

I'm planning to rewrite the part that parses the input TSV file(s) and builds the meta map.
I thought this morning that it could be an idea to record the number of lanes (i.e. fastq pairs) per sample in the meta map.
That way, we'd already know the group size without having to do any channel manipulation.
Still thinking of other ways.
I'll consider your ideas too.
Thanks a lot for spotting this issue and explaining it to me.
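Combining a lane count in the meta map with non-blocking grouping could look something like this, using Nextflow's groupKey helper (a sketch only; the numLanes field is a hypothetical name for whatever the TSV parser records):

```nextflow
// Sketch: assuming each meta map carries a hypothetical `numLanes`
// field filled in while parsing the input TSV.
bam_bwa
    .map { meta, bam -> [ groupKey(meta.id, meta.numLanes), meta, bam ] }
    .groupTuple()
    .view()
// groupTuple now releases each sample's group as soon as `numLanes`
// BAMs have arrived, instead of waiting for the whole channel to close.
```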

@FriederikeHanssen FriederikeHanssen self-assigned this Jul 15, 2021
@FriederikeHanssen
Contributor

Addressed in PR #388 by using groupTuple with a previously known size. The workflow now advances as soon as all parts for one patient are mapped.
