Pre-process transcripts fasta when using --gencode #864
Comments
Hi @andreas-wilm! Thanks for reporting. I have never personally used a GENCODE reference. Do you have an idea of what we should change in the pipeline to deal with this error? We should only be using the Lines 89 to 90 in 89bf536
Did you try posting this on the salmon GitHub issue tracker (https://github.com/COMBINE-lab/salmon/issues/)? @rob-p might have an answer.
Perhaps what @rob-p suggests in the issue linked by @andreas-wilm could be tried?
Hi @drpatelh and @tamuanand, thanks for looking into this. Removing everything after the pipe from the gencode.v41.transcripts.fa file (with a simple Please note: I did this after the run failed, on the staged files and restarted with
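The header clean-up described above (dropping everything from the first pipe onward in each FASTA header) could be sketched in Python roughly as follows. This is not the command used in this thread. The function name `strip_gencode_headers` and the example header are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Iterator


def strip_gencode_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each header truncated at the first '|'.

    Sequence lines are passed through unchanged (minus the trailing newline).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep only the part before the first pipe, e.g. the transcript ID.
            yield line.split("|", 1)[0]
        else:
            yield line


# Hypothetical GENCODE-style record, made up for illustration only.
example = [
    ">ENST0000000001.1|ENSG0000000001.1|-|-|GENE1-201|GENE1|12|protein_coding|",
    "ACGTACGTACGT",
]
fixed = list(strip_gencode_headers(example))
print(fixed[0])  # >ENST0000000001.1
```

Only lines starting with `>` are modified, so a pipe character that happened to appear elsewhere would be left alone.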
Hey, I can confirm the above done by @andreas-wilm works when doing the
As a side note, which was important for me, I was also able to import the
The issue I am having now, which is probably for https://github.com/Bioconductor/GenomicFeatures and/or https://github.com/mikelove/tximeta, is after inputting
Regarding the issue of the OP, though, below is what I did which worked (Now, I guess, what to change, and where, in
The following worked (confirming @andreas-wilm above, #864 (comment)):
Print working directory:
List files in current working directory:
What do the transcript names look like in gencode.vM30.transcripts.fa.gz:
What would they look like without everything after the first
Above looks okay. Let's do it and save the output to a file:
Let's double-check that what we did was what we wanted:
Above looks good. (This might not be necessary, but...) Let's
And list the directory again:
Print working directory:
And use the full directory paths in nf-core/rnaseq (making sure to use the one that was just "fixed": gencode.vM30.transcripts_fixed.fa.gz)
So the flag directories in the nf-core/rnaseq should look like (or using these within the
This worked! I no longer get the error that @andreas-wilm reported in the OP here.
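For anyone who prefers a script over shell one-liners, the fix-and-verify steps of this walk-through could be sketched in Python as below. This is a minimal sketch, not the commands that were actually run. The function names `fix_gencode_fasta` and `check_fixed_fasta` are hypothetical, and the input is assumed to be a gzip-compressed FASTA such as gencode.vM30.transcripts.fa.gz.

```python
import gzip


def fix_gencode_fasta(src: str, dst: str) -> int:
    """Copy a gzipped FASTA from src to dst, truncating headers at the first '|'.

    Returns the number of header lines written.
    """
    n_headers = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            if line.startswith(">"):
                line = line.rstrip("\n").split("|", 1)[0] + "\n"
                n_headers += 1
            fout.write(line)
    return n_headers


def check_fixed_fasta(path: str) -> bool:
    """Return True if no header contains '|' and all header names are unique."""
    seen = set()
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                name = line[1:].strip()
                if "|" in name or name in seen:
                    return False
                seen.add(name)
    return True


# Example call (paths are placeholders for the files named in this thread):
# fix_gencode_fasta("gencode.vM30.transcripts.fa.gz",
#                   "gencode.vM30.transcripts_fixed.fa.gz")
# check_fixed_fasta("gencode.vM30.transcripts_fixed.fa.gz")
```

The uniqueness check is an extra safeguard: if truncating at the first pipe ever produced duplicate names, it would be flagged before the file is handed to the pipeline.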
Fixed in #875
Description of the bug
Hi all,
I've downloaded references from GENCODE (i.e. genome, transcript, annotation) and used nf-core/rnaseq with --aligner star_salmon --pseudo_aligner salmon --gencode. Unfortunately, the STAR quantification with salmon fails because of the GENCODE-typical pipe separation of transcript names. See below for the error. The --gencode flag is, and can only be, used for the salmon index step, if I understand correctly, but it doesn't help with salmon quant (see also here) on the STAR-aligned BAM file.
Command used and terminal output
This transcript and all others are part of the transcripts.fa, but in the GENCODE-typical format:
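To illustrate why such names fail to match, here is a simplified Python model that uses a hypothetical header (not the one from the original report). It assumes names are compared as exact strings, which is only an approximation of what salmon does internally. The helper `missing_from_fasta` is made up for this sketch.

```python
def missing_from_fasta(bam_names, fasta_names):
    """Return the BAM reference names that have no exact match among the FASTA names."""
    known = set(fasta_names)
    return [name for name in bam_names if name not in known]


# Hypothetical names: the BAM carries the bare transcript ID, while the
# FASTA name (header without '>') carries the full pipe-separated string.
bam_names = ["ENST0000000001.1"]
fasta_names = [
    "ENST0000000001.1|ENSG0000000001.1|-|-|GENE1-201|GENE1|12|protein_coding|"
]

print(missing_from_fasta(bam_names, fasta_names))  # ['ENST0000000001.1']

# After truncating at the first pipe, the names line up.
truncated = [name.split("|", 1)[0] for name in fasta_names]
print(missing_from_fasta(bam_names, truncated))  # []
```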