Merge pull request #512 from drpatelh/dsl2
Use STAR to build initial index for RSEM - Fixes #511
drpatelh authored Dec 1, 2020
2 parents 9b54e51 + 641dfbc commit 5288244
Showing 9 changed files with 62 additions and 24 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

* Updated pipeline template to nf-core/tools `1.12`
* [[#500](https://github.com/nf-core/rnaseq/issues/500), [#509](https://github.com/nf-core/rnaseq/issues/509)] - Error with AWS batch params
+ * [[#511](https://github.com/nf-core/rnaseq/issues/511)] - rsem/star index fails with large genome

## [[2.0](https://github.com/nf-core/rnaseq/releases/tag/2.0)] - 2020-11-12

2 changes: 2 additions & 0 deletions README.md
@@ -82,6 +82,8 @@ On release, automated continuous integration tests run the pipeline on a [full-s
-profile <docker/singularity/podman/conda/institute>
```

+ > **NB:** The commands to obtain public data and to run the main arm of the pipeline are completely independent. This is intentional because it allows you to download all of the raw data in an initial pipeline run (`results/public_data/`) and then to curate the auto-created samplesheet based on the available sample metadata before you run the pipeline again properly.

See [usage docs](https://nf-co.re/rnaseq/usage) for all of the available options when running the pipeline.

## Documentation
10 changes: 5 additions & 5 deletions bin/deseq2_qc.r
@@ -54,11 +54,11 @@ if (is.null(opt$count_file)){
################################################
################################################

- count.table <- read.delim(file=opt$count_file,header=TRUE)
- rownames(count.table) <- count.table[,opt$id_col]
- count.table <- count.table[,opt$count_col:ncol(count.table),drop=FALSE]
- colnames(count.table) <- gsub(opt$sample_suffix,"",colnames(count.table))
- colnames(count.table) <- as.character(lapply(colnames(count.table), function (x) tail(strsplit(x,'.',fixed=TRUE)[[1]],1)))
+ count.table <- read.delim(file=opt$count_file,header=TRUE)
+ rownames(count.table) <- count.table[,opt$id_col]
+ count.table <- count.table[,opt$count_col:ncol(count.table),drop=FALSE]
+ colnames(count.table) <- gsub(opt$sample_suffix,"",colnames(count.table))
+ colnames(count.table) <- gsub(pattern='\\.$', replacement='', colnames(count.table))

################################################
################################################
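The last changed line swaps how sample names are cleaned: the removed R expression kept only the text after the final `.` in each column name, while the added one strips a single trailing `.`. The Python sketch below approximates both R expressions to show where they differ; it is an illustration only, not pipeline code, and the sample names are made up.

```python
import re


def clean_old(name: str) -> str:
    """Approximates the removed R line: keep the text after the last '.'."""
    parts = name.split(".")
    # R's strsplit(x, '.', fixed=TRUE) drops one trailing empty field,
    # whereas Python's str.split keeps it, so mimic R here
    if len(parts) > 1 and parts[-1] == "":
        parts = parts[:-1]
    return parts[-1]


def clean_new(name: str) -> str:
    """Approximates the added R line: gsub('\\.$', '', x)."""
    return re.sub(r"\.$", "", name)


for sample in ["WT_REP1", "WT_REP1.", "WT.REP1."]:
    print(f"{sample!r}: old={clean_old(sample)!r} new={clean_new(sample)!r}")
```

Both rules agree on names without internal dots, but for `WT.REP1.` the old rule returns `REP1` while the new one returns `WT.REP1`, so names containing a dot are no longer truncated.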
2 changes: 1 addition & 1 deletion conf/modules.config
@@ -146,7 +146,7 @@ params {
publish_dir = "${params.aligner}/bigwig"
}
'deseq2_qc' {
- args = "--id_col 1 --sample_suffix '' --outdir ./ --outprefix deseq2"
+ args = "--id_col 1 --sample_suffix '' --outprefix deseq2"
publish_files = ['RData':'', 'pca.vals.txt':'', 'plots.pdf':'', 'sample.dists.txt':'', 'size_factors':'', 'log':'']
publish_dir = "${params.aligner}/deseq2_qc"
}
11 changes: 10 additions & 1 deletion conf/test.config
@@ -8,7 +8,7 @@
*/

params {
- config_profile_name = 'Test profile'
+ config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Limit resources so that this can run CI
@@ -34,4 +34,13 @@ params {
// Other parameters
pseudo_aligner = 'salmon'
umitools_bc_pattern = 'NNNN'

+ // When using RSEM, remove warning from STAR whilst building tiny indices
+ modules {
+ 'rsem_preparereference' {
+ args2 = "--genomeSAindexNbases 7"
+ }
+ }
}
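The STAR manual suggests scaling `--genomeSAindexNbases` down for small genomes, with a typical value of min(14, log2(GenomeLength)/2 - 1); the comment in the test profile indicates that the default otherwise triggers a STAR warning on tiny indices. The Python sketch below evaluates that rule of thumb to show how a value such as 7 arises. Rounding down to an integer is an assumption (the manual does not specify rounding), and the genome lengths are illustrative, not the size of the test dataset.

```python
import math


def star_sa_index_nbases(genome_length: int) -> int:
    """STAR manual rule of thumb: min(14, log2(GenomeLength)/2 - 1), rounded down here."""
    if genome_length < 1:
        raise ValueError("genome_length must be positive")
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))


for length in (65_536, 1_000_000, 3_100_000_000):
    print(length, star_sa_index_nbases(length))
```

A 65,536 bp (2^16) reference gives 7, while a human-sized genome reaches the cap of 14.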


8 changes: 4 additions & 4 deletions conf/test_full.config
@@ -8,14 +8,14 @@
*/

params {
- config_profile_name = 'Full test profile'
+ config_profile_name = 'Full test profile'
config_profile_description = 'Full test dataset to check pipeline function'

// Input data for full size test
- input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/samplesheet_full.csv'
+ input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/samplesheet_full.csv'

- // Genome references
- genome = 'GRCh37'
+ // Genome reference
+ genome = 'GRCh37'

// Other parameters
pseudo_aligner = 'salmon'
2 changes: 1 addition & 1 deletion conf/test_sra.config
@@ -8,7 +8,7 @@
*/

params {
- config_profile_name = 'Public data download test profile'
+ config_profile_name = 'Public data download test profile'
config_profile_description = 'Minimal test dataset to check pipeline function when downloading data via the ENA'

// Limit resources so that this can run CI
46 changes: 36 additions & 10 deletions modules/nf-core/software/rsem/preparereference/main.nf
@@ -24,15 +24,41 @@ process RSEM_PREPAREREFERENCE {

script:
def software = getSoftwareName(task.process)
- """
- mkdir rsem
- rsem-prepare-reference \\
- --gtf $gtf \\
- --num-threads $task.cpus \\
- $options.args \\
- $fasta \\
- rsem/genome
+ def args = options.args.tokenize()
+ if (args.contains('--star')) {
+ args.removeIf { it.contains('--star') }
+ def memory = task.memory ? "--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : ''
+ """
+ mkdir rsem
+ STAR \\
+ --runMode genomeGenerate \\
+ --genomeDir rsem/ \\
+ --genomeFastaFiles $fasta \\
+ --sjdbGTFfile $gtf \\
+ --runThreadN $task.cpus \\
+ $memory \\
+ $options.args2
+ rsem-prepare-reference \\
+ --gtf $gtf \\
+ --num-threads $task.cpus \\
+ ${args.join(' ')} \\
+ $fasta \\
+ rsem/genome
- rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g" > ${software}.version.txt
- """
+ rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g" > ${software}.version.txt
+ """
+ } else {
+ """
+ mkdir rsem
+ rsem-prepare-reference \\
+ --gtf $gtf \\
+ --num-threads $task.cpus \\
+ $options.args \\
+ $fasta \\
+ rsem/genome
+ rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g" > ${software}.version.txt
+ """
+ }
}
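The new `script:` block only changes behaviour when `--star` appears among the module's `args`. In that case it builds the STAR index directly with `STAR --runMode genomeGenerate`, passing `args2` through and setting `--limitGenomeGenerateRAM` to the task memory minus 100,000,000 bytes, and then calls `rsem-prepare-reference` without the `--star` token. The Python sketch below mirrors how the Groovy assembles these command lines; the function name and its arguments are hypothetical, it only builds strings rather than running anything, and it omits the version-reporting line. Like the Groovy, it detects `--star` by exact token match but removes every token that merely contains `--star`.

```python
from typing import List, Optional


def rsem_prepare_commands(
    args: str,
    args2: str,
    cpus: int,
    memory_bytes: Optional[int],
    fasta: str,
    gtf: str,
) -> List[str]:
    """Hypothetical helper mirroring the RSEM_PREPAREREFERENCE script logic."""
    tokens = args.split()  # like Groovy String.tokenize(): split on whitespace
    commands = ["mkdir rsem"]
    if "--star" in tokens:  # List.contains: exact element match
        # removeIf { it.contains('--star') }: substring match on each token
        tokens = [t for t in tokens if "--star" not in t]
        star = [
            "STAR", "--runMode", "genomeGenerate",
            "--genomeDir", "rsem/",
            "--genomeFastaFiles", fasta,
            "--sjdbGTFfile", gtf,
            "--runThreadN", str(cpus),
        ]
        if memory_bytes:  # task.memory may be unset, in which case no RAM limit is passed
            # leave 100,000,000 bytes of headroom below the task allocation
            star += ["--limitGenomeGenerateRAM", str(memory_bytes - 100_000_000)]
        star += args2.split()  # options.args2, e.g. --genomeSAindexNbases 7
        commands.append(" ".join(star))
    rsem = [
        "rsem-prepare-reference",
        "--gtf", gtf,
        "--num-threads", str(cpus),
        *tokens,
        fasta,
        "rsem/genome",
    ]
    commands.append(" ".join(rsem))
    return commands


for cmd in rsem_prepare_commands("--star", "--genomeSAindexNbases 7", 2, 6_000_000_000, "genome.fa", "genes.gtf"):
    print(cmd)
```

With `args = '--star'` the sketch yields a `STAR` genome-generation command followed by a plain `rsem-prepare-reference` call; without `--star`, only the single `rsem-prepare-reference` call of the previous version is produced.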
4 changes: 2 additions & 2 deletions rnaseq.nf
@@ -299,9 +299,9 @@ workflow RNASEQ {
ch_input
)
.map {
- meta, bam ->
+ meta, fastq ->
meta.id = meta.id.split('_')[0..-2].join('_')
- [ meta, bam ] }
+ [ meta, fastq ] }
.groupTuple(by: [0])
.map { it -> [ it[0], it[1].flatten() ] }
.set { ch_cat_fastq }
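The renamed closure variable reflects that this channel carries FASTQ files rather than BAMs. The chain strips the last `_`-delimited field from `meta.id` (for instance a technical-replicate suffix such as `_T1`, assumed here for illustration), groups records whose `meta` maps are then identical, and flattens the collected FASTQ lists. A rough Python analogue follows; the function name and file names are hypothetical, and unlike a Nextflow channel it returns groups in first-seen order.

```python
from typing import Dict, List, Tuple

Meta = Dict[str, object]


def merge_fastq_by_sample(records: List[Tuple[Meta, List[str]]]) -> List[Tuple[Meta, List[str]]]:
    """Rough analogue of `.map { ... }.groupTuple(by: [0]).map { flatten }`."""
    grouped: Dict[tuple, List[str]] = {}
    for meta, fastq in records:
        meta = dict(meta)  # copy; the Groovy closure mutates meta in place
        # meta.id.split('_')[0..-2].join('_'): drop the last '_' field
        meta["id"] = "_".join(str(meta["id"]).split("_")[:-1])
        # groupTuple(by: [0]) groups on the whole meta map, not just the id
        key = tuple(sorted(meta.items()))
        grouped.setdefault(key, []).extend(fastq)  # flatten per-run FASTQ lists
    return [(dict(key), files) for key, files in grouped.items()]


runs = [
    ({"id": "WT_REP1_T1", "single_end": False}, ["a_1.fq.gz", "a_2.fq.gz"]),
    ({"id": "WT_REP1_T2", "single_end": False}, ["b_1.fq.gz", "b_2.fq.gz"]),
    ({"id": "KO_REP1_T1", "single_end": False}, ["c_1.fq.gz", "c_2.fq.gz"]),
]
for meta, files in merge_fastq_by_sample(runs):
    print(meta["id"], files)
```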
