
Use GATK small_exac_common_3.hg38.vcf.gz as default germline_resource #959

Open
ameynert opened this issue Mar 3, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@ameynert
Contributor

ameynert commented Mar 3, 2023

Description of feature

See broadinstitute/gatk#7606 and #592.

In the GATK4 GetPileupSummaries code, the entire VCF passed to the -V option (populated by the sarek germline_resource parameter) is read into memory. The current default for human hg38 is gnomad_af_only_hg38, which is very large and causes Java heap out-of-memory errors. The request is to use the GATK file small_exac_common_3 for this purpose instead. It's a subset of common variants found in gnomAD (https://gatk.broadinstitute.org/hc/en-us/community/posts/360067310872-How-to-find-or-generate-common-germline-variant-sites-VCF-required-by-GetPileupSummaries).
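For illustration only: a smaller common-sites resource can in principle be derived by streaming a large AF-annotated VCF and keeping only high-frequency sites, so the full file never has to sit in memory. This is a hypothetical Python sketch, not how small_exac_common_3 was actually built. The function names and the 0.05 AF cutoff are assumptions.

```python
import gzip
from typing import Iterable, Iterator


def parse_info(info: str) -> dict:
    """Split a VCF INFO column (e.g. 'AC=3;AF=0.12;DB') into a dict; flags map to True."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def common_sites(lines: Iterable[str], min_af: float = 0.05) -> Iterator[str]:
    """Yield header lines unchanged and data lines whose AF is at least min_af.

    Records are processed one at a time, so memory use does not grow with
    the size of the input. Multi-allelic records (a comma in AF) and records
    without a usable AF are skipped. The 0.05 default is an assumed cutoff.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        af = parse_info(cols[7]).get("AF")
        if not isinstance(af, str) or "," in af:
            continue
        try:
            if float(af) >= min_af:
                yield line
        except ValueError:
            continue  # e.g. AF=. (missing value)


def filter_vcf(src: str, dst: str, min_af: float = 0.05) -> None:
    """Write the common-site subset of a plain or (b)gzipped VCF as uncompressed text."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(common_sites(fin, min_af))
```

The output is uncompressed, so it would still need to be bgzip-compressed and indexed (for example with tabix, or GATK's IndexFeatureFile) before being passed to a GATK tool.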

@ameynert ameynert added the enhancement New feature or request label Mar 3, 2023
@FriederikeHanssen
Contributor

Hey! Just to clarify, do you want to use different germline_resource files for Mutect2 and GetPileupSummaries respectively? Otherwise it would be as simple as adding the file to igenomes and updating the germline_resource path.
Just trying to understand how much work is needed here :D

@ameynert
Contributor Author

ameynert commented Mar 3, 2023

GetPileupSummaries: https://gatk.broadinstitute.org/hc/en-us/articles/9570416554907-GetPileupSummaries

The tool requires a common germline variant sites VCF, e.g. derived from the gnomAD resource, with population allele frequencies (AF) in the INFO field. This resource must contain only biallelic SNPs and can be an eight-column sites-only VCF.
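One way to check whether a candidate file meets these requirements is to scan its records. This is a minimal Python sketch that assumes the checks quoted above (biallelic, SNP, AF present in INFO) are the ones that matter. `pileup_resource_problems` is a hypothetical helper, not part of GATK.

```python
from typing import List

VALID_BASES = {"A", "C", "G", "T"}


def pileup_resource_problems(record: str) -> List[str]:
    """Return reasons a VCF data line looks unsuitable as a GetPileupSummaries -V resource."""
    cols = record.rstrip("\n").split("\t")
    if len(cols) < 8:
        return ["fewer than 8 VCF columns"]
    ref, alts, info = cols[3], cols[4].split(","), cols[7]
    problems = []
    # Biallelic: exactly one ALT allele.
    if len(alts) != 1:
        problems.append("not biallelic")
    # SNP: REF and every ALT are single nucleotides.
    if not (len(ref) == 1 and ref.upper() in VALID_BASES
            and all(len(a) == 1 and a.upper() in VALID_BASES for a in alts)):
        problems.append("not a SNP")
    # Population allele frequency must be present in INFO.
    if not any(item.startswith("AF=") for item in info.split(";")):
        problems.append("no AF in INFO")
    return problems
```

The column-count check only requires the eight fixed columns; extra FORMAT/sample columns are not flagged, since the docs say the resource *can* be sites-only rather than that it must be.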

Mutect2: https://gatk.broadinstitute.org/hc/en-us/articles/9570422171291-Mutect2

--germline-resource
Population vcf of germline sequencing containing allele fractions.
A resource, such as gnomAD, containing population allele frequencies of common and rare variants.

The descriptions above indicate that the two tools use the resource for different purposes, so they should be given different files.
