Documentation: mention prokaryotic analysis settings #765
Comments
Thanks @d4straub ! Yes, I agree we should improve and extend the documentation to include some description of how to tweak the parameters given a selection of common annotations. Maybe something we can tackle at the Hackathon? I haven't used prokaryotic annotations personally in ages but if you are able to get the latest version of the pipeline running with these parameters and have some recommendations then that would be great. In recent versions of the pipeline featureCounts is just used to generate the biotype QC as mentioned in the docs and isn't used for any formal quantification. In most cases where you aren't using a standard annotation it is almost easier to use |
Hello @drpatelh, I mainly work with prokaryotes and struggled with similar problems as Daniel, but in some cases I was able to solve them. Yesterday I ran the newest version of rnaseq and it worked rather nicely.
Because I haven't used the newer (3.x) versions of the pipeline, most of my experience might be outdated. Why not go ahead and write down your experience with the current version? If someone has more information to add over time, then this can always be amended. Once I use a recent pipeline version I'll have a look, but it might take some time until I have my next bacterial RNA-Seq project going.
Of course I can do this, but where should I actually write it? I'm familiar with running the pipeline and with its documentation, but not with the code or where to add my suggestions. Can you help me with that?
I thought somewhere around https://nf-co.re/rnaseq/usage#running-the-pipeline might be a good place, or perhaps a new paragraph below https://nf-co.re/rnaseq/usage#full-samplesheet?
Thanks, that helps.
Yep, a new section in the usage docs would be fab. Maybe we can add a section after this one specifically for |
Added paragraph about the usage of rnaseq with prokaryotic data based on Issue #765
Description of feature
First of all, we are mostly still running version 1.4.2 at our facility (I know, ancient; will be changed soon).
The default setting for params.fc_count_type in 1.4.2 is exon. This is perfect for eukaryotes, I assume, but not for most prokaryotic genomes, because those are typically not spliced and their annotations (at least NCBI GFF files, in my experience) usually use CDS or transcript instead. This means that running the pipeline with standard settings on prokaryotes will only output very few features (those few exon entries), in a recent case 43 of ~2300 transcripts.
The solution could be to add a small section in the documentation and/or do a sanity check that compares the number of counted features to the total line count in the GTF.
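The sanity check suggested above could look roughly like the following. This is only a minimal sketch and not part of the pipeline: the function names `count_feature_types` and `feature_fraction` are made up for illustration, and the example annotation lines are invented. It tallies column 3 (the feature type) of a GTF/GFF file so that a very low share of exon lines becomes visible before running the pipeline.

```python
from collections import Counter


def count_feature_types(lines):
    """Count the values in column 3 (feature type) of GTF/GFF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Valid GTF/GFF records have nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


def feature_fraction(counts, feature_type):
    """Share of all annotation records that have the given feature type."""
    total = sum(counts.values())
    return counts[feature_type] / total if total else 0.0


# Invented example records (hypothetical, GFF3-style attributes).
example = [
    "##gff-version 3\n",
    "chr1\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene0\n",
    "chr1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tID=cds0;Parent=gene0\n",
    "chr1\tRefSeq\tgene\t1000\t1500\t.\t-\t.\tID=gene1\n",
    "chr1\tRefSeq\tCDS\t1000\t1500\t.\t-\t0\tID=cds1;Parent=gene1\n",
    "chr1\tRefSeq\texon\t2000\t2100\t.\t+\t.\tID=exon0\n",
]

counts = count_feature_types(example)
print(counts.most_common())
print(f"exon fraction: {feature_fraction(counts, 'exon'):.2f}")
```

For a real annotation, the same functions can be fed an open file handle, e.g. `count_feature_types(open("annotation.gtf"))`, where the file name is a placeholder. If exon records are rare compared to CDS or transcript records, the default feature type is probably not a good fit.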
Having said that, I am not sure how newer versions of the pipeline would handle this case. I assume that --featurecounts_feature_type has a similar function.
Additionally, the attribute used for --featurecounts_group_type usually does not appear in prokaryotic annotations, and this crashes the pipeline (at least in 1.4.2). It would be nice if this could be handled more gracefully (again, this may already be improved in newer versions).
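Whether a grouping attribute is present can also be checked up front. The sketch below is an illustration under stated assumptions, not pipeline code: the helper names `attribute_keys` and `fraction_with_attribute` are hypothetical, `gene_biotype` is used only as an example attribute name, and the annotation lines are invented. It reports the share of records of one feature type that carry a given attribute in column 9.

```python
import re


def attribute_keys(attr_field):
    """Return the attribute names found in column 9 of a GTF/GFF record.

    Handles GTF-style `key "value";` and GFF3-style `key=value;` pairs.
    Simplification: values that themselves contain ';' are not supported.
    """
    keys = set()
    for part in attr_field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # The key is everything before the first '=' or whitespace.
        keys.add(re.split(r"[=\s]", part, maxsplit=1)[0])
    return keys


def fraction_with_attribute(lines, feature_type, attribute):
    """Share of records of `feature_type` that carry `attribute`."""
    matching = 0
    with_attr = 0
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        matching += 1
        if attribute in attribute_keys(fields[8]):
            with_attr += 1
    return with_attr / matching if matching else 0.0


# Invented GTF-style records; only the second one has a gene_biotype.
example_gtf = [
    'chr1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tgene_id "g0"; transcript_id "t0";\n',
    'chr1\tRefSeq\tCDS\t1000\t1500\t.\t-\t0\tgene_id "g1"; transcript_id "t1"; '
    'gene_biotype "protein_coding";\n',
]

print(fraction_with_attribute(example_gtf, "CDS", "gene_biotype"))
print(fraction_with_attribute(example_gtf, "CDS", "gene_id"))
```

A fraction well below 1.0 for the attribute you intend to group by suggests choosing a different attribute that is present on every record.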