- Download the decomposed HPRC pangenome from https://zenodo.org/records/6797328
wget https://zenodo.org/records/6797328/files/cactus_filtered_ids.vcf.gz
- Download and build TheGreatGenotyper, Beagle, bcftools, and AnnotSV.
- Split the pangenome into 10 slices for faster computation. The following script splits the variants of each chromosome into 10 chunks, then forms each slice from one chunk per chromosome.
./slice_pangenome.sh cactus_filtered_ids.vcf.gz 10 sliced_pangenome_10/
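The slicing idea described above can be sketched as follows. This is an illustration only, not the contents of slice_pangenome.sh: the function name `slice_vcf`, the `slice_<i>.vcf` output names, and the choice of contiguous chunks in file order are all assumptions, and the sketch reads an uncompressed VCF using only awk.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split an uncompressed VCF into N slices, where
# slice i holds the i-th chunk of every chromosome.
# This is NOT the real slice_pangenome.sh.
slice_vcf() {
    local vcf="$1" n="$2" outdir="$3"
    mkdir -p "$outdir"
    awk -v n="$n" -v out="$outdir" '
        # First pass: count variant records per chromosome.
        NR == FNR { if ($0 !~ /^#/) total[$1]++; next }
        # Second pass: copy every header line into each slice.
        /^#/ { for (i = 0; i < n; i++) print > (out "/slice_" i ".vcf"); next }
        # The k-th record (0-based) of a chromosome goes to chunk floor(k * n / total).
        {
            chunk = int(seen[$1] * n / total[$1])
            seen[$1]++
            print > (out "/slice_" chunk ".vcf")
        }
    ' "$vcf" "$vcf"
}
```

Because each slice takes a chunk from every chromosome, the slices end up roughly equal in size and can be processed independently. For a compressed input such as cactus_filtered_ids.vcf.gz, the file would first need to be decompressed (for example with zcat) before calling the hypothetical `slice_vcf input.vcf 10 outdir/`.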
- Edit config.yaml to set the input and output locations and the paths to the required programs.
Field | Description |
---|---|
INPUT_DIR | Input folder containing the sliced pangenome |
TEMP_FOLDER | Folder to store temporary files. |
INPUT_REFERENCE | Input genome reference |
INPUT_INDEX | Text file containing a list of CCDG indexes |
BEAGLE | Binary Path for Beagle |
BEAGLE_MAP | Beagle MAP file. Download here(maps) |
TheGreatGenotyper | Binary Path for The Great Genotyper |
AnnotSV | Binary Path for AnnotSV |
bcftools | Binary Path for bcftools |
OUTPUT_dir | Output Directory |
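As a starting point, a config file using the field names from the table might look like the sketch below. Every value is a hypothetical placeholder, and the exact form each field expects (for example, whether BEAGLE_MAP is a file or a directory) is an assumption; check the workflow itself before relying on it. The snippet writes to config.example.yaml so that it does not overwrite an existing config.yaml.

```shell
# Write a starter config; all paths below are hypothetical placeholders.
cat > config.example.yaml <<'EOF'
INPUT_DIR: sliced_pangenome_10/
TEMP_FOLDER: /path/to/tmp/
INPUT_REFERENCE: /path/to/reference.fa
INPUT_INDEX: /path/to/index_list.txt
BEAGLE: /path/to/beagle
BEAGLE_MAP: /path/to/beagle_map
TheGreatGenotyper: /path/to/TheGreatGenotyper
AnnotSV: /path/to/AnnotSV
bcftools: /path/to/bcftools
OUTPUT_dir: results/
EOF
```

After replacing the placeholders with real paths, copy the file to config.yaml.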
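- Run the workflow. The -n flag in the command below makes it a dry run that only lists the planned jobs, and -p prints the shell commands; drop -n to actually execute (recent Snakemake versions also require --cores).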
snakemake --configfile config.yaml -np
The final output can be downloaded from this link.