
dbmi-bgm/cgap-pipeline-validation


cgap-pipeline-validation

This repository stores scripts for validating the CGAP pipeline using UGRP data.

Scripts that generate the metadata used to upload families to the CGAP portal:

  • trio_output.py takes an input file and a family ID number and generates a set of trios covering every possible parent-offspring combination in the family (including both parent-offspring trios and grandparent-parent trios). The script outputs a set of trio Excel files, one for each offspring and their parents.

  • accession_pedigree_file_generation.py takes an input file, a family ID number, and a proband ID and generates an accessioning file and a pedigree file for the selected family. The accessioning file contains information for the family such as unique_analysis_id, family_id, analysis_id, and file locations, while the pedigree file describes the characteristics of each individual in the family, such as hpo_terms and mondo_terms.
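
The trio logic described for trio_output.py can be sketched as follows. This is a minimal illustration, assuming the family has already been loaded into a mapping from each individual to their father and mother; the `Pedigree` type and `enumerate_trios` function are hypothetical, and the actual input file format and implementation of trio_output.py may differ. Writing each trio to its own Excel file is left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pedigree layout: individual ID -> (father ID, mother ID).
# None (or an ID not in the mapping) means that parent is not in the family.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def enumerate_trios(pedigree: Pedigree) -> List[Tuple[str, str, str]]:
    """Return (offspring, father, mother) for every individual whose
    father and mother are both present in the pedigree.

    Parents who have their own parents listed are offspring too, so
    grandparent-parent trios come out alongside parent-offspring trios.
    """
    trios = []
    for individual, (father, mother) in sorted(pedigree.items()):
        if father in pedigree and mother in pedigree:
            trios.append((individual, father, mother))
    return trios


if __name__ == "__main__":
    # Hypothetical three-generation family.
    family = {
        "grandfather": (None, None),
        "grandmother": (None, None),
        "father": ("grandfather", "grandmother"),
        "mother": (None, None),
        "proband": ("father", "mother"),
        "sibling": ("father", "mother"),
    }
    for trio in enumerate_trios(family):
        print(trio)
```

Treating every individual with two known parents as a potential offspring is what makes the grandparent-parent trios fall out without a separate pass.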

Most of the statistics and metrics are generated by het_stat.sh and denovo_stat.sh, which are built on granite (https://github.com/dbmi-bgm/granite) and bcftools.

Scripts to parse and format the results:

  • granite_het_automation.py parses the results of het_stat.sh and formats them into an Excel file.
  • granite_denovo_automation.py parses the results of denovo_stat.sh and formats them into an Excel file.
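
As a rough illustration of the parse-and-format step, the sketch below flattens a nested JSON result into metric/value rows. The JSON layout in the example is invented and is not the documented output of het_stat.sh or denovo_stat.sh. The sketch also writes CSV with the standard library, whereas the real scripts produce Excel files (for example via pandas with openpyxl, which is an assumption about their implementation).

```python
import csv
import io
import json
from typing import Any, Iterator, Tuple


def flatten(obj: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, value) pairs for every leaf in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, obj


def json_to_csv(json_text: str) -> str:
    """Convert a nested JSON document into a two-column metric/value CSV."""
    rows = flatten(json.loads(json_text))
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["metric", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical structure; the real granite-based JSON output may differ.
    example = '{"sample_1": {"het": 120, "hom": 80}, "sample_2": {"het": 95, "hom": 101}}'
    print(json_to_csv(example))
```

A generic flattening like this works for any nesting depth; a report tailored to the real output would more likely pivot samples into columns.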

Example usage:

  # granite_het_automation.py arguments:
  #   1. output of het_stat.sh to process
  #   2. Excel output name
  python3 granite_het_automation.py \
    GAPFII5YI4XJ.het.indel.json \
    family_22_indel

  # granite_denovo_automation.py arguments:
  #   1. output of denovo_stat.sh to process
  #   2. CSV file matching each individual to the corresponding file name
  #   3. Excel output name
  #   4. y if the children file is being read, n for parents
  python3 granite_denovo_automation.py \
    GAPETMPGNUOI.novo.indel.json \
    family_22_children.csv \
    family_22_children_denovo \
    y
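
The four positional arguments of granite_denovo_automation.py could be handled along the lines of the sketch below. It is hypothetical: `parse_args`, `parse_mapping`, and `load_mapping` are illustrative names, and the assumed mapping CSV layout (two columns, individual then file name, no header) is not documented in this repository.

```python
import argparse
import csv
from typing import Dict, Iterable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI mirroring the positional arguments shown above."""
    parser = argparse.ArgumentParser(description="Format denovo_stat.sh results")
    parser.add_argument("json_file", help="output of denovo_stat.sh to process")
    parser.add_argument("mapping_csv", help="CSV matching individuals to file names")
    parser.add_argument("output_name", help="Excel output name")
    parser.add_argument("is_children", choices=["y", "n"],
                        help="y if the children file is being read, n for parents")
    args = parser.parse_args(argv)
    # Convert the y/n flag into a boolean for downstream logic.
    args.is_children = args.is_children == "y"
    return args


def parse_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Parse assumed two-column CSV lines (individual, file name) into a dict."""
    return {row[0]: row[1] for row in csv.reader(lines) if len(row) >= 2}


def load_mapping(path: str) -> Dict[str, str]:
    """Read the mapping CSV from disk."""
    with open(path, newline="") as handle:
        return parse_mapping(handle)


if __name__ == "__main__":
    # Example invocation with the arguments from the usage example above.
    args = parse_args(["GAPETMPGNUOI.novo.indel.json", "family_22_children.csv",
                       "family_22_children_denovo", "y"])
    print(args.is_children)  # True
```

Restricting the flag with `choices` makes argparse reject anything other than y or n before the conversion to a boolean runs.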
