Revision history for Grinder
0.5.3 30-May-2013
Completed fix for bug #6: multiplexed reads with a length close to that of
their reference (reported by Ali May).
When generating multiple libraries, the default is now 100% permuted so
that communities are dissimilar (consistent with the default of 0% shared).
0.5.2 26-Apr-2013
Fixed bug causing reads to be too short when using MIDs and asking for a
read length close to that of their reference (bug #6, reported by Ali May).
0.5.1 19-Apr-2013
Fixed bug preventing the insertion of very low frequency sequencing
errors (bug #5).
Updated the average_genome_size script to use the percentages in the Grinder
rank file instead of fractions.
0.5.0 14-Jan-2013
Removed the =encoding statement, which was breaking Pod::PlainText
(reported by Lauren Bragg).
Precompile the <exclude_chars> regular expression.
0.4.9 20-Nov-2012
Significant speedup by using an improved version of the Bioperl modules
(reported by Ben Woodcroft).
Fixed bug in RF- and FR-oriented mates produced from the reverse-complement
of the reference sequence (reported by Mike Imelfort).
Mate orientation documented for IonTorrent (reported by Mike Imelfort).
The relative abundances reported by Grinder in the rank file are now
expressed as percentages instead of fractions for consistency.
Updated dependencies to support older versions of Perl (reported by
Stephen Turner).
Build the documentation on the author side, not the user side (reported by
Stephen Turner).
0.4.8 10-Oct-2012
Fixed bug when making amplicon reads using specified relative abundances
based on genomes with multiple amplicons (reported by Bertrand Bonnaud).
Usage message improvements (reported by Xiao Yang).
Delegated some operations to dedicated modules.
0.4.7 27-May-2012
Requiring Math::Random::MT version 1.14 should fix issues experienced by
Windows users (reported by David Koslicki).
0.4.6 27-May-2012
When generating kmer-based chimeras, save resources by only calculating
the kmers of the reference sequences that are going to be used
(improvement suggested by David Koslicki).
Fixed an "undefined value" error when using kmer-based chimeras
(reported by David Koslicki).
Fixed an error when using kmer-based chimeras but not using all the
reference sequences (reported by David Koslicki).
0.4.5 27-Jan-2012
Fixed bug when adding mutations linearly to a 1 bp read (reported by
Robert Schmieder).
Better handling of 0 bp reference sequences.
Fixed bug when looking for amplicons on the reverse complement of a
reference sequence.
Properly remove the shortest of two amplicons, even if they are on
different strands.
0.4.4 20-Jan-2012
Dependencies update: no need for Math::Random::MT::Perl anymore.
0.4.3 18-Jan-2012
Implemented multimeras, i.e. chimeras from more than two reference
sequences (suggested by anonymous reviewer). See <chimera_dist>.
Implemented chimeras where the breakpoints correspond to k-mers shared
by the reference sequences (suggested by anonymous reviewer). See
<chimera_kmer>.
0.4.2 15-Dec-2011
Fixed incorrectly calculated relative abundances when using length bias
(reported by Mike Imelfort and Mohamed Fauzi Haroon).
0.4.1 25-Nov-2011
The keyword 'strand' is no longer used in the description of reads.
Read coordinates are now reported as in the GenBank format:
"position=complement(1..20)" instead of "position=1-20 strand=-1".
Fixed bug reported by Dana Willner: when looking for full-length amplicon
matches based on PCR primers, matches are now sought in both the reference
sequences and their reverse-complements.
Better handling of discrepancies between the number of libraries specified
with the num_libraries option and in the abundance_file (reported by
Dana Willner).
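As an illustration of the 0.4.1 coordinate change, the Python sketch below
rewrites an old-style "position=A-B strand=S" fragment into the GenBank-like
form. old_to_genbank is a hypothetical helper, not part of Grinder, and it
assumes that forward-strand reads are written as "position=A..B" without
complement().

```python
import re

# Hypothetical helper, not part of Grinder: converts the pre-0.4.1 read
# coordinate notation into the GenBank-like notation used since 0.4.1.
OLD_POSITION = re.compile(r"position=(\d+)-(\d+) strand=(-?1)")


def old_to_genbank(description: str) -> str:
    """Rewrite 'position=A-B strand=S' as 'position=A..B' or
    'position=complement(A..B)'; leave other text untouched."""
    match = OLD_POSITION.search(description)
    if match is None:
        return description
    start, end, strand = match.groups()
    location = f"{start}..{end}"
    if strand == "-1":
        # Reverse-strand reads are wrapped in complement(), as in GenBank.
        location = f"complement({location})"
    return (description[:match.start()] + f"position={location}"
            + description[match.end():])


print(old_to_genbank("position=1-20 strand=-1"))  # position=complement(1..20)
```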
0.4.0 04-Nov-2011
Support for DNA, RNA and protein reference sequences to produce genomic,
metagenomic, transcriptomic, metatranscriptomic, proteomic and
metaproteomic datasets
New error model suitable to simulate Illumina reads: 4th-degree polynomial
Change in error model (mutation_distribution) parameter:
- general syntax is now model_name, model_parameters...
- the first parameter for the linear model is now the error rate at the
3' end of the reads, not the average error rate
Speed improvement for position-specific error models
Galaxy GUI fix so that the output is fastqsanger, not just fastq
The reference_file parameter is now a required argument, so that running
grinder without arguments displays the help (reported by Robert Schmieder)
Fixed a bug that caused a crash when using an indel model and a homopolymer
model simultaneously (reported by Robert Schmieder)
Information displayed on screen now reports whether the library is a
shotgun or amplicon library
0.3.9 18-Oct-2011
New option <mate_orientation> to select orientation of mate pairs
New default for mate orientation: forward-reverse instead of forward-forward
Handle empty reference sequence description more gracefully
Galaxy GUI compatible with workflows and new tool shed
0.3.8 04-Oct-2011
Graphical interface for the Galaxy project
Support for writing the output reads in FASTQ format (Sanger variant)
Support for nested and overlapping amplicons
Tests do not fail if the optional dependency Statistics::R is not installed
Tested that Grinder works fully on Windows
Generating 100 reads by default instead of a coverage of 0.1x
Fixed bug where read description was not created if unidirectional was set to -1
0.3.7 13-Sep-2011
Fixed bug in richter and margulies homopolymer error models
Fixed bug so that output rank file now collapses amplicons by species
The Grinder CLI script is now called 'grinder' (all lowercase)
Option mutation_ratio has changed so that it is possible to specify indels without substitutions
Location of amplicon relative to the reference sequence is now recorded
in the read description using the 'amplicon' field
Better reporting of chimeras in read descriptions using a comma-separated
list for the 'amplicon' and 'reference' fields
Redundant sequencing errors (multiple errors at the same position) are
now tracked in read descriptions
New dependency: using the Math::Random::MT Perl module for added speed
Improved build and test mechanics
Added tests for chimeras, indels, substitutions and homopolymers
More comprehensive tests for seeding and random number generation
0.3.6 03-Aug-2011
Support for reference sequences that contain several amplicons
Implemented a gene copy bias option for amplicon libraries
Primers can now match RNA sequences or ambiguous residues of the reference
sequence
Automatic picking of a community structure parameter value when none is provided
Fixed uniform insert and read length distribution
Fixed quality scores, which were generated but never written to disk
Write on screen when QUAL files are generated
Added links to example databases that users can use as Grinder input
Specified the URL where to report bugs
More unit tests: community structure, read and insert length distributions,
amplicons with specified genome abundance
0.3.5 21-Jul-2011
Implemented a profile mechanism to store user's preferred options
Added a script to reverse the orientation of right-hand mates
Fixed issues with reads with MIDs (in Bio::Seq::SimulatedRead)
Library number in the ID of the first sequence in libraries with an even
number was wrong when mate pairs were used
Number of the pair in mate pair IDs was wrong
Grinder development put under Git version control on SourceForge
More unit tests
Versioning fix
0.3.4 23-Jun-2011
New option to generate basic quality scores if desired (-qual_levels)
New option to not track the read info in the read description (-desc_track)
Objects returned by Grinder are now Bio::Seq::SimulatedRead Bioperl objects
Double-quotes in read description are now escaped, i.e. '"' becomes '\"'
Now using 'reference' instead of 'source' in read tracking description
Changes in the defaults:
uniform community structure instead of power law
uniform read distribution instead of normally distributed
0.3.3 03-Mar-2011
New option to sequence from the reverse strand: see <unidirectional>
(suggested by Barry Cayford).
Output FASTA files now named *reads* instead of *shotgun* because
libraries can be amplicon too.
Output file names now use numbers padded with zeroes so that, e.g., if
123 libraries are requested, they are numbered 001, 002 ... 123.
Output folder is now created automatically if it does not already exist.
The next_read() method now returns only one read, even for mate pairs.
Force the alphabet to DNA when reading the primer sequence file since
degenerate primers can look like protein sequences.
Fixed bug where Grinder sometimes created libraries even though there
were not enough sequences to do it safely (reported by Dana Willner).
When the number of reads to generate is smaller than the required
diversity, the actual diversity reported reflects this now.
Not reporting errors "Not enough sequences for chimera..." when there are
fewer than 2 reads and chimera_perc is 0.
Fixed bug in argument processing by Getopt::Euclid that affected
repeated calls to the new() method.
Fixed calculation of number of genomes shared. Clearly specified in the
documentation that the percent shared is relative to the diversity of
the least abundant library (reported by Dana Willner).
Fixed calculation of the total library diversity.
Many more Grinder test cases.
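The zero-padding rule introduced in 0.3.3 can be sketched in Python as
follows. library_labels is a hypothetical function that only computes the
numeric part; how Grinder embeds these numbers in the actual file names is
not shown here.

```python
def library_labels(num_libraries: int) -> list[str]:
    """Return zero-padded library numbers, e.g. '001'..'123' for 123 libraries."""
    # Pad every number to the width of the largest one so that names
    # sort lexically in the same order as numerically.
    width = len(str(num_libraries))
    return [str(i).zfill(width) for i in range(1, num_libraries + 1)]


labels = library_labels(123)
print(labels[0], labels[1], labels[-1])  # 001 002 123
```

Padding to the width of the largest number is what keeps a plain directory
listing in library order (001 before 010 before 100).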
0.3.2 11-Feb-2011
New feature to specify characters to delete (N, -, ...) (suggested by Mike Imelfort)
New method to retrieve the seed number used for the computation: $factory->get_random_seed
When excluding specific characters, an amplicon read is attempted only once now
More robust parsing of abundance file
It is now a fatal error if sequences requested in an abundance file are
not found in the genome file
Small optimizations
0.3.1 08-Feb-2011
Support for making multiple libraries with different richness (diversity) values
Fixed bug for communities with specified relative abundances (reported by Mike Imelfort)
Better error messages for sequences that have a specified abundance
0.3.0 12-Jan-2011
Command-line arguments have changed; all have a short and long version
Grinder API to allow running Grinder inside Perl programs
Support for amplicon sequencing
For amplicon simulation, a forward and optional reverse primer (in IUPAC) can be specified
Amplicons can be given multiplex identifiers (MIDs)
Support for generating chimeras
Homopolymer error simulation
More error models for point mutations (uniform and linear)
Read error tracking in the sequence description
New default is to produce reads with no errors
New FASTA read description that specifies its source, position, strand, description and errors
Option to take shotgun reads from reverse complement
Support for specifying the structure of several communities manually
Speed improvements
0.2.0 22-Sep-2010
New options available when generating multiple shotgun libraries. Alpha
and beta diversity can be specified:
* richness
* percentage of genomes shared between libraries
* percentage of the top genomes with a different abundance rank
Revised way that mate pair reads are named. Example:
>1000/1 seq3|31-60
>1000/2 seq3|41-70
Added utility to calculate average genome length from Grinder rank file
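The 0.2.0 naming example is consistent with a 40 bp insert spanning
positions 31-70 of seq3, sequenced with 30 bp reads from both ends. The
Python sketch below reconstructs those headers under that reading;
mate_headers and its parameters are hypothetical, and the sketch does not
model the mate orientation options added later (0.3.9).

```python
def mate_headers(pair_id: int, ref_id: str, insert_start: int,
                 insert_len: int, read_len: int) -> tuple[str, str]:
    """Build FASTA headers for the two mates of one insert.

    Assumption: coordinates are 1-based and inclusive, mate 1 covers the
    first read_len bp of the insert and mate 2 the last read_len bp.
    """
    insert_end = insert_start + insert_len - 1
    mate1 = (insert_start, insert_start + read_len - 1)
    mate2 = (insert_end - read_len + 1, insert_end)
    return (f">{pair_id}/1 {ref_id}|{mate1[0]}-{mate1[1]}",
            f">{pair_id}/2 {ref_id}|{mate2[0]}-{mate2[1]}")


# Reproduces the 0.2.0 example: 40 bp insert at positions 31-70, 30 bp reads.
for header in mate_headers(1000, "seq3", 31, 40, 30):
    print(header)
```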
0.1.9 24-Jun-2010
Thanks to Ramsi Temanni for his suggestions and feedback regarding forbidden characters.
Support for characters forbidden in the shotgun reads
Small bugfix regarding default values for arguments that take a list of values
0.1.8 22-Apr-2010
Thanks to Albert Villela for his suggestions and feedback regarding paired reads.
Changes in command-line options to accommodate new features
Support for inputting a file specifying the abundance of the different genomes
Support for mate pairs / paired end reads
Support for uniform or normal distribution of read lengths and mate pair
insert lengths
Fixed bug causing an error when the number of reads in the input file
cannot be divided by the number of independent libraries required
Changed output sequence ID to a more consistent scheme
0.1.7 15-Feb-2010
Not keeping the sequences in memory anymore to preserve resources
Really using the Math::Random::MT::Perl seeding facility
0.1.6 07-Dec-2009
Now using the Math::Random::MT::Perl seeding facility
0.1.5 24-Feb-2009
Grinder now has a proper installer (Perl module style)
0.1.4
Added basic report on libraries produced
Fixed bug in number of sequences created when using independent libraries
0.1.3
Ability to generate several random shotgun libraries at once that do not
contain any genome in common
0.1.2
Correction in the code to generate mutations
Changed the defaults to use a powerlaw model and the size-dependent option
The main module function now returns a hashref of rank-abundances
0.1.1
Introduction of the simulation of sequencing errors (substitutions and indels)
Modified the way the random number generation is handled
The main module function now returns an arrayref of Bio::Seq objects
0.1.0
Initial release