You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have been using DWGSIM version 0.1.14.
My goal is to simulate Insertion ranging from length 10 to 200, and being able to add specific sequences of insertion
First test simulation with VCF
In my first try, I used a VCF with -v option and an insertion length of 198, the simulation gave me the correct location for my insertion after aligning the reads with BWA and called the variant with variant callers.
I only put one insertion in my VCF.
Second test of Simulation with VCF
My next try was to used different insertions length with the same workflow tested with my first try. I unfortunately noticed that none of the new insertions were correctly simulated.
The genomic coordinates were shifted upward to the location originally present in the VCF, and the inserted bases got modified compared to my inputted ones.
the recaptured inserted sequences were not the one originally put in the VCF.
### Test simulation with BED
Based on previous issues described in this repo, I decided to give a shot to the BED file as well to simulate my insertions.
After trial and errors in making the BED file, I got the message: Error: insertion of length 36 exceeded the maximum supported length of 26
but when using the command dwgsim --help, there is a note that mentions that the longest insertion supported is 4294967295, I cite: Note: The longest supported insertion is 4294967295.
I tried with an insertion of 25 and it worked as expected (I shrank my original insertion to a length of 25 (26 = 1 + 25; i.e., the REF plus all the new 25 inserted bases)).
case 1
if the bed file has the following line, chr13 28034117 28034143 GATCATATTCATATTCTCTGAAATCA INSERT
G is the REF at position 28034118, here the first G, and the inserted bases I want in are ATCATATTCATATTCTCTGAAATCA, i.e. 25 bases. I get this output in the dwgsim's mutations.vcf file: chr13 28034117 . A AAATCATATTCATATTCTCTGAAATC 100 PASS AF=1.0;pl=3;mt=INSERT
case 2
If I put the following line in the bed file, chr13 28034118 28034144 GATCATATTCATATTCTCTGAAATCA INSERT
I get this output in the dwgsim's mutations.vcf file: chr13 28034119 . A AGATCATATTCATATTCTCTGAAATCA 100 PASS AF=1.0;pl=3;mt=INSERT
case 3
If I put the following line in the bed file chr13 28034118 **28034143** ATCATATTCATATTCTCTGAAATCA INSERT
I get this output in the dwgsim's mutations.vcf file:
chr13 28034119 . A AATCATATTCATATTCTCTGAAATCA 100 PASS AF=1.0;pl=3;mt=INSERT
Knowing that BED file is 0-based, the case 1 above should have been the correct case to use to get the insertion being inserted right after the G on position 28034118.
Impact of the SEED value on the AF and pl values in the dwgsim's mutations.vcf file
I have been using the -F to control the allele frequency specifically;
With using a BED file and -b option, I noticed that the values of AF were not the 0.5 all the time, but that value changes when using a different seed value; I woulf have expected not to be modified since I stiplated on the command line that I wanted AF=0.5 specifically using the -F option; Unless I am mistaken on the correlation between the two values of -F and AF in the mutations output file.
Questions:
Is there anything wrong with the way the INSERT line in the BED file is constructed?
Is there any option that could make dwgsim working with a BED file and with insertions longer than 26?
Is there any option that should be adjusted to make it work with a VCF using -v?
Is AF only fixed and not dependent on the seed value when using the -v option and not the -b ?
Is the longest insertion of 4 billion bases only supported when using a VCF? otherwise what is the max length of insertion supported when using a VCF?
Could you let me know what would be the best way (i.e. dwgsim command) to simulate exactly these insertions? Same exact coordinate and same exact sequence as the one in the provided BED or VCF. Thanks.
I am providing an example of one of my VCF with an insertion of 81 bases.
The Reference genome was only the complete sequence of the chromosome 13.
I’ll be honest that I have very little time for open source unpaid support given my work and family commitments. My apologies in advance, and feel free to reach out to www.fulcrumgenomics.com if you’d like paid support, though we are also booked out pretty far. I’ll keep this open so I can get to it if I am able to come up for air.
Hello Nils,
I have been using DWGSIM version 0.1.14.
My goal is to simulate Insertion ranging from length 10 to 200, and being able to add specific sequences of insertion
First test simulation with VCF
In my first try, I used a VCF with
-v
option and an insertion length of 198, the simulation gave me the correct location for my insertion after aligning the reads with BWA and called the variant with variant callers.I only put one insertion in my VCF.
Second test of Simulation with VCF
My next try was to used different insertions length with the same workflow tested with my first try. I unfortunately noticed that none of the new insertions were correctly simulated.
VCF
, and the inserted bases got modified compared to my inputted ones.VCF
.### Test simulation with BED
Based on previous issues described in this repo, I decided to give a shot to the BED file as well to simulate my insertions.
After trial and errors in making the
BED
file, I got the message:Error: insertion of length 36 exceeded the maximum supported length of 26
but when using the command
dwgsim --help
, there is a note that mentions that the longest insertion supported is 4294967295, I cite:Note: The longest supported insertion is 4294967295.
I tried with an insertion of 25 and it worked as expected (I shrank my original insertion to a length of 25 (26 = 1 + 25; i.e., the REF plus all the new 25 inserted bases)).
case 1
if the bed file has the following line,
chr13 28034117 28034143 GATCATATTCATATTCTCTGAAATCA INSERT
G is the REF at position 28034118, here the first G, and the inserted bases I want in are
ATCATATTCATATTCTCTGAAATCA
, i.e. 25 bases. I get this output in the dwgsim's mutations.vcf file:chr13 28034117 . A AAATCATATTCATATTCTCTGAAATC 100 PASS AF=1.0;pl=3;mt=INSERT
case 2
If I put the following line in the bed file,
chr13 28034118 28034144 GATCATATTCATATTCTCTGAAATCA INSERT
I get this output in the dwgsim's mutations.vcf file:
chr13 28034119 . A AGATCATATTCATATTCTCTGAAATCA 100 PASS AF=1.0;pl=3;mt=INSERT
case 3
If I put the following line in the bed file
chr13 28034118 **28034143** ATCATATTCATATTCTCTGAAATCA INSERT
I get this output in the dwgsim's mutations.vcf file:
chr13 28034119 . A AATCATATTCATATTCTCTGAAATCA 100 PASS AF=1.0;pl=3;mt=INSERT
Knowing that BED file is 0-based, the case 1 above should have been the correct case to use to get the insertion being inserted right after the G on position 28034118.
Impact of the
SEED
value on theAF
andpl
values in thedwgsim
'smutations.vcf
fileI have been using the
-F
to control the allele frequency specifically;With using a BED file and
-b
option, I noticed that the values ofAF
were not the 0.5 all the time, but that value changes when using a different seed value; I woulf have expected not to be modified since I stiplated on the command line that I wanted AF=0.5 specifically using the-F
option; Unless I am mistaken on the correlation between the two values of-F
andAF
in the mutations output file.Questions:
dwgsim
working with a BED file and with insertions longer than 26?-v
?AF
only fixed and not dependent on the seed value when using the-v
option and not the-b
?dwgsim
command) to simulate exactly these insertions? Same exact coordinate and same exact sequence as the one in the provided BED or VCF. Thanks.I am providing an example of one of my VCF with an insertion of 81 bases.
The Reference genome was only the complete sequence of the chromosome 13.
Example of dwgsim command
Let me know what further information you may need.
Thanks for your feedback,
Chris.
inputs_insertion.zip
The text was updated successfully, but these errors were encountered: