Add cram support + read splitting with seqkit for speedup #388
Will close #63
@FriederikeHanssen your branch is out-of-date with the base branch due to #390
Oops, thanks for the heads-up
@maxulysse I have cleaned up the code a bit. There are still quite a few things that need to be fixed or discussed in separate PRs. I'll add a collection here for context and turn it into cards on the project board: Things that need to still be fixed from PR #388
The CI tests are not passing anymore, since the files are now CRAMs rather than BAMs. I don't really know what is going on with the nf-core lint test; the error message is a bit cryptic to me. I'll undraft the PR so you can take a look. Then we can see what I should fix here before the merge and what can wait for a later PR.
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'

params.options = [:]
options = initOptions(params.options)

process INDEX_TARGET_BED {
    tag "$target_bed"
    label 'process_medium'
    publishDir "${params.outdir}",
        mode: params.publish_dir_mode,
        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) }

    conda (params.enable_conda ? "bioconda::htslib=1.12" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        //TODO: No singularity container at the moment, use docker container for the moment
        container "quay.io/biocontainers/htslib:1.12--h9093b5e_1"
    } else {
        container "quay.io/biocontainers/htslib:1.12--hd3b49d5_0"
    }

    input:
    path target_bed

    output:
    tuple path("${target_bed}.gz"), path("${target_bed}.gz.tbi")

    script:
    """
    bgzip --threads ${task.cpus} -c ${target_bed} > ${target_bed}.gz
    tabix ${target_bed}.gz
    """
}
I added that in nf-core/modules
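For anyone who wants to check the two outputs of INDEX_TARGET_BED outside Nextflow, the script block can be reproduced by hand roughly as follows. This is only a sketch: `index_target_bed` is a hypothetical helper name, the thread count is passed in by hand instead of coming from `task.cpus`, and `-p bed` is added here to state the format explicitly (the process itself calls `tabix` without it). It assumes htslib is installed and the BED file is coordinate-sorted, which tabix requires.

```shell
# Sketch only: mirrors the two commands in the process script.
# index_target_bed is a hypothetical helper, not part of the pipeline.
index_target_bed() {
    bed="$1"            # path to the target BED file
    threads="${2:-1}"   # stand-in for task.cpus; defaults to 1

    # Compress to BGZF; -c writes to stdout, so the input file is kept.
    bgzip --threads "$threads" -c "$bed" > "$bed.gz" || return 1

    # Build the tabix index; -p bed states the format explicitly.
    tabix -p bed "$bed.gz" || return 1

    # Succeed only if both declared outputs exist and are non-empty.
    [ -s "$bed.gz" ] && [ -s "$bed.gz.tbi" ]
}
```

Calling `index_target_bed targets.bed 4` should leave `targets.bed.gz` and `targets.bed.gz.tbi` next to the input, matching the `output:` tuple of the process.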
try {
    includeConfig 'conf/base.config'
} catch (Exception e) {
    System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/base.config")
Not sure about that
OK, yeah, I had problems where the error message for failing to load configs was very confusing. But maybe this is also something to deal with upstream and not in sarek.
No worries, we'll fix that
try {
    includeConfig 'conf/modules.config'
} catch (Exception e) {
    System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/modules.config")
same here
Love it
BAMQC is still a bit of an open problem, especially after base recalibration....
Important change: BamQC + Samtools Stats is now run only ONCE, BEFORE BaseRecalibration. If duplicates are marked, it runs after duplicate marking; if they are not, it runs after mapping. This reduces the runtime, since MarkDuplicates can merge the split reads internally without a runtime penalty.
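To make the QC placement concrete, here is a minimal sketch of the decision. The function name, the `skip_markduplicates` argument and the stage labels are assumptions for illustration only; in the pipeline this is a channel choice in the workflow, not a shell function.

```shell
# Hypothetical helper: print the stage whose output feeds
# BamQC + Samtools Stats (run once, before BaseRecalibration).
# $1 is "true" when duplicate marking is skipped (assumed flag name).
qc_input_stage() {
    if [ "$1" = "true" ]; then
        # No duplicate marking: QC runs on the mapped reads.
        echo "mapping"
    else
        # Duplicates are marked: QC runs on the MarkDuplicates output.
        echo "markduplicates"
    fi
}
```

For example, `qc_input_stage false` selects the MarkDuplicates output and `qc_input_stage true` selects the mapping output.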
The Spark implementation currently only works with Singularity. The Docker image has issues and may have to be rebuilt :(
PR checklist
scrape_software_versions.py
nf-core lint .
nextflow run . -profile test,docker
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).