Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ControlFREEC work for WES and WGS data in both paired and tumor-only modes #302

Merged
merged 28 commits into from
Dec 18, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
3909d8f
Remove mappability option for ControlFREEC
jfnavarro Nov 7, 2020
90a9587
Remove mappability option for ControlFREEC
jfnavarro Nov 7, 2020
d838905
Remove mappability for ControlFREEC
jfnavarro Nov 7, 2020
864c7e8
Remove mappability option for ControlFREEC
jfnavarro Nov 7, 2020
6911bf3
Make ControlFREEC compatible with WES
jfnavarro Nov 12, 2020
04fb29a
Better support for exome data in ControlFREEC
jfnavarro Nov 15, 2020
bc17ae9
Bump ControlFREEC version and fix a bug with the output file names of…
jfnavarro Nov 17, 2020
5091752
Fix a couple of typos
jfnavarro Nov 18, 2020
22b18c7
Fixing minor bugs in the parameters for ControlFREEC
jfnavarro Nov 18, 2020
fcaca32
Update docs with new ControlFREEC configuration
jfnavarro Nov 18, 2020
31aaefe
Fixed a small typo
jfnavarro Nov 18, 2020
62fd810
Remove unused setting in ControlFREEC
jfnavarro Nov 18, 2020
c1eef46
More fixes to make ControlFREEC work
jfnavarro Nov 18, 2020
47de309
Fix a bug in the ControlFREECViz process taking wrong channel
jfnavarro Nov 19, 2020
b0a0c76
Ensure the right channel in ControlFREECViz
jfnavarro Nov 19, 2020
dd2d403
Fix a bug in the ControlFREECViz process
jfnavarro Nov 19, 2020
9d01fd3
Fix a bug in ControlFReecViz
jfnavarro Nov 20, 2020
acd0ea5
Restored mappability for Control-FREEC. Changed subclone option for W…
jfnavarro Dec 10, 2020
c8ae43d
Restoring igenomes mappability for the mouse reference
jfnavarro Dec 10, 2020
4fac08c
Update usage.md
jfnavarro Dec 10, 2020
dbd8537
Update nextflow_schema.json
jfnavarro Dec 10, 2020
45f3502
Add ControlFREEC process for tumor-only samples
jfnavarro Dec 10, 2020
687caa9
Merge branch 'dev' of github.com:jfnavarro/sarek into dev
jfnavarro Dec 10, 2020
0774484
Change name of mappability variable in ControlFREEC
jfnavarro Dec 10, 2020
fdca828
Fix an issue with the duplication of the channel in ControlFREECSingle
jfnavarro Dec 10, 2020
0cc2d82
Fix an error with comments in ControlFREEC
jfnavarro Dec 10, 2020
4e3202a
Disabling subclone computation in ControlFREEC as it is still beta an…
jfnavarro Dec 11, 2020
ce5cc36
ControlFreecViz works regardless of the output format in the CNVs fil…
jfnavarro Dec 18, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -978,7 +978,7 @@ Requires that [`--ascat_ploidy`](#--ascat_ploidy) is set.

Use this parameter to overwrite default behavior from `Control-FREEC` regarding `coefficientOfVariation`

Default: `0.015`
Default: `0.05`

### --cf_ploidy

Expand All @@ -989,6 +989,7 @@ Default: `2`
### --cf_window

Use this parameter to overwrite default behavior from `Control-FREEC` regarding `window size`
It is recommended to use a window size of 0 for exome data.

Default: Disabled

Expand Down Expand Up @@ -1423,7 +1424,7 @@ If you prefer, you can specify the full path to your reference genome when you r

### --mappability

If you prefer, you can specify the full path to your reference genome when you run the pipeline:
If you prefer, you can specify the full path to your Control-FREEC mappability when you run the pipeline:

```bash
--mappability <path/to/reference.gem>
Expand Down
2 changes: 1 addition & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ dependencies:
- bioconda::bwa=0.7.17
- bioconda::cancerit-allelecount=4.0.2
- bioconda::cnvkit=0.9.6
- bioconda::control-freec=11.5
- bioconda::control-freec=11.6
- bioconda::ensembl-vep=99.2
- bioconda::fastqc=0.11.9
- bioconda::fgbio=1.1.0
Expand Down
172 changes: 145 additions & 27 deletions main.nf
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ def helpMessage() {
-profile [str] Configuration profile to use. Can use multiple (comma separated)
Available: conda, docker, singularity, test, awsbatch, <institute> and more
--step [list] Specify starting step (only one)
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, Control-FREEC
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, ControlFREEC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, ControlFREEC
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, Control-FREEC

Actually as for tools, the string is stripped out of _ and - and lowercased, so Control-FREEC should actually work.
I would prefer that spelling as it's the one from the tool itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Default: ${params.step}
--genome [str] Name of iGenomes reference
Default: ${params.genome}
Expand Down Expand Up @@ -89,7 +89,7 @@ def helpMessage() {
Requires that --ascat_ploidy is set
--cf_coeff [str] Control-FREEC coefficientOfVariation
Default: ${params.cf_coeff}
--cf_ploidy [int] Control-FREEC ploidy
--cf_ploidy [str] Control-FREEC ploidy
Default: ${params.cf_ploidy}
--cf_window [int] Control-FREEC window size
Default: Disabled
Expand Down Expand Up @@ -219,7 +219,6 @@ if (!checkParameterList(annotate_tools,annoList)) exit 1, 'Unknown tool(s) to an

// Check parameters
if ((params.ascat_ploidy && !params.ascat_purity) || (!params.ascat_ploidy && params.ascat_purity)) exit 1, 'Please specify both --ascat_purity and --ascat_ploidy, or none of them'
if (params.cf_window && params.cf_coeff) exit 1, 'Please specify either --cf_window OR --cf_coeff, but not both of them'
if (params.umi && !(params.read_structure1 && params.read_structure2)) exit 1, 'Please specify both --read_structure1 and --read_structure2, when using --umi'

// Has the run name been specified by the user?
Expand Down Expand Up @@ -3223,6 +3222,8 @@ if (step == 'controlfreec') mpileupOut = inputSample
mpileupOut
.choice(mpileupOutTumor, mpileupOutNormal) {statusMap[it[0], it[1]] == 0 ? 1 : 0}

(mpileupOutSingle,mpileupOutTumor) = mpileupOutTumor.into(2)

mpileupOut = mpileupOutNormal.combine(mpileupOutTumor, by:0)

mpileupOut = mpileupOut.map {
Expand All @@ -3234,8 +3235,7 @@ mpileupOut = mpileupOut.map {
// STEP CONTROLFREEC.1 - CONTROLFREEC

process ControlFREEC {
label 'cpus_max'
//label 'memory_singleCPU_2_task'
label 'cpus_8'

tag "${idSampleTumor}_vs_${idSampleNormal}"

Expand All @@ -3250,37 +3250,49 @@ process ControlFREEC {
file(dbsnpIndex) from ch_dbsnp_tbi
file(fasta) from ch_fasta
file(fastaFai) from ch_fai
file(targetBED) from ch_target_bed

output:
set idPatient, idSampleNormal, idSampleTumor, file("${idSampleTumor}.pileup_CNVs"), file("${idSampleTumor}.pileup_ratio.txt"), file("${idSampleTumor}.pileup_normal_CNVs"), file("${idSampleTumor}.pileup_normal_ratio.txt"), file("${idSampleTumor}.pileup_BAF.txt"), file("${idSampleNormal}.pileup_BAF.txt") into controlFreecViz
set idPatient, idSampleNormal, idSampleTumor, file("${idSampleTumor}.pileup_CNVs"), file("${idSampleTumor}.pileup_ratio.txt"), file("${idSampleTumor}.pileup_BAF.txt") into controlFreecViz
set file("*.pileup*"), file("${idSampleTumor}_vs_${idSampleNormal}.config.txt") into controlFreecOut

when: 'controlfreec' in tools

script:
config = "${idSampleTumor}_vs_${idSampleNormal}.config.txt"
gender = genderMap[idPatient]
// if we are using coefficientOfVariation, we must delete the window parameter
// it is "window = 20000" in the default settings, without coefficientOfVariation set,
// but we do not like it. Note, it is not written in stone
coeff_or_window = params.cf_window ? "window = ${params.cf_window}" : "coefficientOfVariation = ${params.cf_coeff}"
// Window has higher priority than coefficientOfVariation if both given
window = params.cf_window ? "window = ${params.cf_window}" : ""
coeffvar = params.cf_coeff ? "coefficientOfVariation = ${params.cf_coeff}" : ""
use_bed = params.target_bed ? "captureRegions = ${targetBED}" : ""
// This parameter makes Control-FREEC unstable (still in Beta according to the developers)
// so we disable it by setting it to its default value (it is disabled by default)
//min_subclone = params.target_bed ? "30" : "20"
min_subclone = 100
readCountThreshold = params.target_bed ? "50" : "10"
breakPointThreshold = params.target_bed ? "1.2" : "0.8"
breakPointType = params.target_bed ? "4" : "2"
mappabilitystr = params.mappability ? "gemMappabilityFile = \${PWD}/${mappability}" : ""

"""
touch ${config}
echo "[general]" >> ${config}
echo "BedGraphOutput = TRUE" >> ${config}
echo "chrFiles = \${PWD}/${chrDir.fileName}" >> ${config}
echo "chrLenFile = \${PWD}/${chrLength.fileName}" >> ${config}
echo "gemMappabilityFile = \${PWD}/${mappability}" >> ${config}
echo "${coeff_or_window}" >> ${config}
echo "contaminationAdjustment = TRUE" >> ${config}
echo "forceGCcontentNormalization = 1" >> ${config}
echo "maxThreads = ${task.cpus}" >> ${config}
echo "minimalSubclonePresence = 20" >> ${config}
echo "minimalSubclonePresence = ${min_subclone}" >> ${config}
echo "ploidy = ${params.cf_ploidy}" >> ${config}
echo "sex = ${gender}" >> ${config}
echo "readCountThreshold = ${readCountThreshold}" >> ${config}
echo "breakPointThreshold = ${breakPointThreshold}" >> ${config}
echo "breakPointType = ${breakPointType}" >> ${config}
echo "${window}" >> ${config}
echo "${coeffvar}" >> ${config}
echo "${mappabilitystr}" >> ${config}
echo "" >> ${config}

echo "[control]" >> ${config}
echo "inputFormat = pileup" >> ${config}
echo "mateFile = \${PWD}/${mpileupNormal}" >> ${config}
Expand All @@ -3295,13 +3307,95 @@ process ControlFREEC {

echo "[BAF]" >> ${config}
echo "SNPfile = ${dbsnp.fileName}" >> ${config}
echo "" >> ${config}

echo "[target]" >> ${config}
echo "${use_bed}" >> ${config}

freec -conf ${config}
"""
}

controlFreecOut.dump(tag:'ControlFREEC')

process ControlFREECSingle {
label 'cpus_8'

tag "${idSampleTumor}"

publishDir "${params.outdir}/VariantCalling/${idSampleTumor}/Control-FREEC", mode: params.publish_dir_mode

input:
set idPatient, idSampleTumor, file(mpileupTumor) from mpileupOutSingle
file(chrDir) from ch_chr_dir
file(mappability) from ch_mappability
file(chrLength) from ch_chr_length
file(dbsnp) from ch_dbsnp
file(dbsnpIndex) from ch_dbsnp_tbi
file(fasta) from ch_fasta
file(fastaFai) from ch_fai
file(targetBED) from ch_target_bed

output:
set idPatient, idSampleTumor, file("${idSampleTumor}.pileup_CNVs"), file("${idSampleTumor}.pileup_ratio.txt"), file("${idSampleTumor}.pileup_BAF.txt") into controlFreecVizSingle
set file("*.pileup*"), file("${idSampleTumor}.config.txt") into controlFreecOutSingle

when: 'controlfreec' in tools

script:
config = "${idSampleTumor}.config.txt"
gender = genderMap[idPatient]
// Window has higher priority than coefficientOfVariation if both given
window = params.cf_window ? "window = ${params.cf_window}" : ""
coeffvar = params.cf_coeff ? "coefficientOfVariation = ${params.cf_coeff}" : ""
use_bed = params.target_bed ? "captureRegions = ${targetBED}" : ""
// This parameter makes Control-FREEC unstable (still in Beta according to the developers)
// so we disable it by setting it to its default value (it is disabled by default)
//min_subclone = params.target_bed ? "30" : "20"
min_subclone = 100
readCountThreshold = params.target_bed ? "50" : "10"
breakPointThreshold = params.target_bed ? "1.2" : "0.8"
breakPointType = params.target_bed ? "4" : "2"
mappabilitystr = params.mappability ? "gemMappabilityFile = \${PWD}/${mappability}" : ""

"""
touch ${config}
echo "[general]" >> ${config}
echo "BedGraphOutput = TRUE" >> ${config}
echo "chrFiles = \${PWD}/${chrDir.fileName}" >> ${config}
echo "chrLenFile = \${PWD}/${chrLength.fileName}" >> ${config}
echo "forceGCcontentNormalization = 1" >> ${config}
echo "maxThreads = ${task.cpus}" >> ${config}
echo "minimalSubclonePresence = ${min_subclone}" >> ${config}
echo "ploidy = ${params.cf_ploidy}" >> ${config}
echo "sex = ${gender}" >> ${config}
echo "readCountThreshold = ${readCountThreshold}" >> ${config}
echo "breakPointThreshold = ${breakPointThreshold}" >> ${config}
echo "breakPointType = ${breakPointType}" >> ${config}
echo "${window}" >> ${config}
echo "${coeffvar}" >> ${config}
echo "${mappabilitystr}" >> ${config}
echo "" >> ${config}

echo "[sample]" >> ${config}
echo "inputFormat = pileup" >> ${config}
echo "mateFile = \${PWD}/${mpileupTumor}" >> ${config}
echo "mateOrientation = FR" >> ${config}
echo "" >> ${config}

echo "[BAF]" >> ${config}
echo "SNPfile = ${dbsnp.fileName}" >> ${config}
echo "" >> ${config}

echo "[target]" >> ${config}
echo "${use_bed}" >> ${config}

freec -conf ${config}
"""
}

controlFreecOutSingle.dump(tag:'ControlFREECSingle')

// STEP CONTROLFREEC.3 - VISUALIZATION

process ControlFreecViz {
Expand All @@ -3312,40 +3406,64 @@ process ControlFreecViz {
publishDir "${params.outdir}/VariantCalling/${idSampleTumor}_vs_${idSampleNormal}/Control-FREEC", mode: params.publish_dir_mode

input:
set idPatient, idSampleNormal, idSampleTumor, file(cnvTumor), file(ratioTumor), file(cnvNormal), file(ratioNormal), file(bafTumor), file(bafNormal) from controlFreecViz
set idPatient, idSampleNormal, idSampleTumor, file(cnvTumor), file(ratioTumor), file(bafTumor) from controlFreecViz

output:
set file("*.txt"), file("*.png"), file("*.bed") into controlFreecVizOut

when: 'controlfreec' in tools

script:
"""
echo "Shaping CNV files to make sure we can assess significance"
awk 'NF==9{print}' ${cnvTumor} > TUMOR.CNVs
awk 'NF==7{print}' ${cnvNormal} > NORMAL.CNVs
LINEWIDTH=`head -1 ${cnvTumor}| wc -w`; awk 'NF=='$LINEWIDTH'{print}' ${cnvTumor} > TUMOR.CNVs

echo "############### Calculating significance values for TUMOR CNVs #############"
cat /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/assess_significance.R | R --slave --args TUMOR.CNVs ${ratioTumor}

echo "############### Calculating significance values for NORMAL CNVs ############"
cat /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/assess_significance.R | R --slave --args NORMAL.CNVs ${ratioNormal}

echo "############### Creating graph for TUMOR ratios ###############"
cat /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/makeGraph.R | R --slave --args 2 ${ratioTumor} ${bafTumor}

echo "############### Creating graph for NORMAL ratios ##############"
cat /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/makeGraph.R | R --slave --args 2 ${ratioNormal} ${bafNormal}

echo "############### Creating BED files for TUMOR ##############"
perl /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/freec2bed.pl -f ${ratioTumor} > ${idSampleTumor}.bed

echo "############### Creating BED files for NORMAL #############"
perl /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/freec2bed.pl -f ${ratioNormal} > ${idSampleNormal}.bed
"""
}

controlFreecVizOut.dump(tag:'ControlFreecViz')

process ControlFreecVizSingle {
label 'memory_singleCPU_2_task'

tag "${idSampleTumor}"

publishDir "${params.outdir}/VariantCalling/${idSampleTumor}/Control-FREEC", mode: params.publish_dir_mode

input:
set idPatient, idSampleTumor, file(cnvTumor), file(ratioTumor), file(bafTumor) from controlFreecVizSingle

output:
set file("*.txt"), file("*.png"), file("*.bed") into controlFreecVizOutSingle

when: 'controlfreec' in tools

script:
"""
echo "Shaping CNV files to make sure we can assess significance"
LINEWIDTH=`head -1 ${cnvTumor}| wc -w`; awk 'NF=='$LINEWIDTH'{print}' ${cnvTumor} > TUMOR.CNVs

echo "############### Calculating significance values for TUMOR CNVs #############"
cat /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/assess_significance.R | R --slave --args TUMOR.CNVs ${ratioTumor}

echo "############### Creating graph for TUMOR ratios ###############"
cat /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/makeGraph.R | R --slave --args 2 ${ratioTumor} ${bafTumor}

echo "############### Creating BED files for TUMOR ##############"
perl /opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/freec2bed.pl -f ${ratioTumor} > ${idSampleTumor}.bed
"""
}

controlFreecVizOutSingle.dump(tag:'ControlFreecVizSingle')

// Remapping channels for QC and annotation

(vcfStrelkaIndels, vcfStrelkaSNVS) = vcfStrelka.into(2)
Expand Down
4 changes: 2 additions & 2 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -49,9 +49,9 @@ params {
// Variant Calling
ascat_ploidy = null // Use default value
ascat_purity = null // Use default value
cf_coeff = "0.015" // default value for Control-FREEC
cf_coeff = "0.05" // default value for Control-FREEC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno, what is the default value, and why exactly are we changing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are changing it to the default value as recommend in their manual (http://boevalab.inf.ethz.ch/FREEC/tutorial.html). Was there any specific reason to use a value different than the default one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, our colleagues played with that, and were happier with 0.05. Will ask why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see :) I personally believe that is nice to have as default the same as in Control-FREEC specially given that this parameter can be changed by the user. The optimal value is probably different for every case but of course I may be wrong here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I check the Control-FREEC source, this value is used only at one place: https://github.com/BoevaLab/FREEC/blob/master/src/GenomeCopyNumber.cpp#L60 and the sole purpose is to calculate the windowSize, if it is not provided in the config. For the clarity, it is actually genome_size/(read_number*cf_coeff^2) and my guess the readlength should be considered somewhere too, but whatever. Smaller windowSize should mean higher resolution but more noise. We can use the default, sure, and change the value in our own config. I can run some tests on our data, to be safe before release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, the recommended setting for WES is to use cv_window = 0. I added an entry about that in the docs. It may be a good idea to mention this for the cv_coef parameter.

cf_ploidy = "2" // you can use 2,3,4
cf_window = "" // by default we are not using this in Control-FREEC
cf_window = null // by default we are not using this in Control-FREEC
no_gvcf = null // g.vcf are produced by HaplotypeCaller
no_strelka_bp = null // Strelka will use Manta candidateSmallIndels if available
pon = false // No default PON (Panel of Normals) file for GATK Mutect2 / Sentieon TNscope
Expand Down
10 changes: 5 additions & 5 deletions nextflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -263,18 +263,18 @@
},
"cf_coeff": {
"type": "number",
"default": 0.015,
"default": 0.05,
"fa_icon": "fas fa-wrench",
"description": "Overwrite Control-FREEC coefficientOfVariation"
},
"cf_ploidy": {
"type": "integer",
"default": 2,
"type": "string",
"default": "2",
"fa_icon": "fas fa-wrench",
"description": "Overwrite Control-FREEC ploidy"
},
"cf_window": {
"type": "string",
"type": "number",
"fa_icon": "fas fa-wrench",
"description": "Overwrite Control-FREEC window size"
},
Expand Down Expand Up @@ -747,4 +747,4 @@
"$ref": "#/definitions/institutional_config_options"
}
]
}
}