Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ControlFREEC work for WES and WGS data in both paired and tumor-only modes #302

Merged
merged 28 commits into from
Dec 18, 2020

Conversation

jfnavarro
Copy link
Contributor

@jfnavarro jfnavarro commented Nov 7, 2020

nf-core/sarek pull request

Control-FREEC has been upgraded and the configuration of its process has been optimized and updated to work with WES data as well as WGS data. Few bugs have been fixed and improvements have been made in the code and docs. Tumor-only mode for Control-FREEC has been added too.

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: CONTRIBUTING.md

@jfnavarro jfnavarro requested a review from maxulysse as a code owner November 7, 2020 15:00
@maxulysse maxulysse mentioned this pull request Nov 17, 2020
8 tasks
@maxulysse
Copy link
Member

@jfnavarro with the help of @ewels we finally fixed #305 issue
So instead of removing the mappability option from Control-FREEC, would it be possible to make it as an optional parameter when the file is not provided?

@jfnavarro
Copy link
Contributor Author

@jfnavarro with the help of @ewels we finally fixed #305 issue
So instead of removing the mappability option from Control-FREEC, would it be possible to make it as an optional parameter when the file is not provided?

Yes, of course.

@@ -63,7 +63,6 @@ params {
intervals = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/intervals/GRCm38_calling_list.bed"
known_indels = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz"
known_indels_index = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi"
mappability = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Control-FREEC/GRCm38_68_mm10.gem"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should not be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uppps :) Fixed!

@@ -38,7 +38,7 @@ def helpMessage() {
-profile [str] Configuration profile to use. Can use multiple (comma separated)
Available: conda, docker, singularity, test, awsbatch, <institute> and more
--step [list] Specify starting step (only one)
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, Control-FREEC
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, ControlFREEC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, ControlFREEC
Available: mapping, prepare_recalibration, recalibrate, variant_calling, annotate, Control-FREEC

Actually as for tools, the string is stripped out of _ and - and lowercased, so Control-FREEC should actually work.
I would prefer that spelling as it's the one from the tool itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Copy link
Member

@maxulysse maxulysse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
@szilvajuhos any comments?

Copy link
Contributor

@szilvajuhos szilvajuhos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, though we have changed the coefficientOfVariation for a reason I guess. It needs testing anyway.

@@ -49,9 +49,9 @@ params {
// Variant Calling
ascat_ploidy = null // Use default value
ascat_purity = null // Use default value
cf_coeff = "0.015" // default value for Control-FREEC
cf_coeff = "0.05" // default value for Control-FREEC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno, what is the default value, and why exactly are we changing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are changing it to the default value as recommend in their manual (http://boevalab.inf.ethz.ch/FREEC/tutorial.html). Was there any specific reason to use a value different than the default one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, our colleagues played with that, and were happier with 0.05. Will ask why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see :) I personally believe that is nice to have as default the same as in Control-FREEC specially given that this parameter can be changed by the user. The optimal value is probably different for every case but of course I may be wrong here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I check the Control-FREEC source, this value is used only at one place: https://github.com/BoevaLab/FREEC/blob/master/src/GenomeCopyNumber.cpp#L60 and the sole purpose is to calculate the windowSize, if it is not provided in the config. For the clarity, it is actually genome_size/(read_number*cf_coeff^2) and my guess the readlength should be considered somewhere too, but whatever. Smaller windowSize should mean higher resolution but more noise. We can use the default, sure, and change the value in our own config. I can run some tests on our data, to be safe before release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, the recommended setting for WES is to use cv_window = 0. I added an entry about that in the docs. It may be a good idea to mention this for the cv_coef parameter.

@jfnavarro jfnavarro changed the title Remove mappability option from ControlFREEC Make ControlFREEC work for WES and WGS data in both paired and tumor-only modes Dec 10, 2020
@maxulysse
Copy link
Member

@szilvajuhos @jfnavarro any need for more modifications? or are we good to go for a merge?

@szilvajuhos
Copy link
Contributor

Hej, I just do not know, for T/N samples it is not working for me with 11.6. Whatever, can we change https://github.com/jfnavarro/sarek/blob/dev/main.nf#L3421 and https://github.com/jfnavarro/sarek/blob/dev/main.nf#L3454 to something like

LINEWIDTH=`head -1 ${cnvTumor}| wc -w`; awk 'NF=='$LINEWIDTH'{print}'  ${cnvTumor} > TUMOR.CNVs

so it should work for any fileformat hopefully.

Copy link
Contributor

@szilvajuhos szilvajuhos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

espléndido

@maxulysse maxulysse merged commit e94d595 into nf-core:dev Dec 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants