Home
Marco Reverenna edited this page Feb 14, 2024 · 3 revisions
This section is dedicated to sharing doubts, questions and considerations about the project.
Question:
- Which variables should we consider to filter our data?
- Should we consider the "pipeline version"? If so, which ones: 4.1 onwards, excluding 1.0, 2.0 and 3.0? Why?
- Should we consider a "minimum number of samples"? How many samples should a study have? Why?
- Should we prefer one specific "technology" over another? Why? Which technology fits better with the tools used in Albert's pipeline?
Feedback:
- "pipeline version", "n_samples", "technology"
- pipeline versions 4.1 and 5.0 contain more information
- exclude all studies that contain only a single sample; consider >= 6 samples (maybe still not enough for a good study)
- ask Albert
- Applied filters for studies: minimum of 5 samples and the latest pipeline versions (4.0, 4.1, and 5.0).
- With these filters, the total number of studies is 17 (6 metagenomics, 1 metatranscriptomics and 10 assembly), and the only pipeline versions represented are 4.0 and 5.1
- total number of ERR IDs is 491
- total number of ERZ IDs is 433 (ERZ IDs come only from assembly):
- Is there any way to get FASTQ files from this kind of ID? (ERZ --> ERS --> SAMEA --> FASTA)
- From assembly studies we can only get FASTA files, which are not allowed as input for the nf-metagenomics pipeline
- Should we take the research center and the technology into account, to normalise the different studies as much as possible?
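
The study filter described in the feedback above (a minimum of 5 samples and pipeline versions 4.0, 4.1 and 5.0) could be sketched roughly as follows. This is only an illustration: the records, the study IDs and the field names (`study_id`, `pipeline_version`, `n_samples`, `experiment_type`) are hypothetical placeholders, not the real metadata table or its column names.

```python
from collections import Counter

# Hypothetical study records; field names and values are assumptions
# made for illustration, not the project's actual metadata.
studies = [
    {"study_id": "study_A", "pipeline_version": "5.0", "n_samples": 12, "experiment_type": "metagenomic"},
    {"study_id": "study_B", "pipeline_version": "4.1", "n_samples": 1, "experiment_type": "metagenomic"},
    {"study_id": "study_C", "pipeline_version": "3.0", "n_samples": 40, "experiment_type": "metatranscriptomic"},
    {"study_id": "study_D", "pipeline_version": "4.0", "n_samples": 5, "experiment_type": "assembly"},
    {"study_id": "study_E", "pipeline_version": "2.0", "n_samples": 3, "experiment_type": "assembly"},
]

# Thresholds taken from the notes above: >= 5 samples, latest pipeline versions.
ALLOWED_VERSIONS = {"4.0", "4.1", "5.0"}
MIN_SAMPLES = 5


def filter_studies(records, min_samples=MIN_SAMPLES, versions=ALLOWED_VERSIONS):
    """Keep studies with enough samples that were analysed with an allowed pipeline version."""
    return [
        r for r in records
        if r["n_samples"] >= min_samples and r["pipeline_version"] in versions
    ]


kept = filter_studies(studies)

# Count the remaining studies per experiment type, as in the summary above.
counts = Counter(r["experiment_type"] for r in kept)

for study in kept:
    print(study["study_id"], study["pipeline_version"], study["n_samples"])
print(dict(counts))
```

Keeping the thresholds as function parameters makes it easy to compare the stricter ">= 6 samples" suggestion with the applied ">= 5 samples" filter by calling `filter_studies(studies, min_samples=6)`.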