ww-poreCoV extension #275

hoelzer · 2024-07-31T06:36:09Z

I suggest using the poreCov pipeline as the backend for SARS-CoV-2 wastewater lineage deconvolution from nanopore long reads. You already added Freyja (#274, #270), which is great as the current community standard.

However, we are also interested in detecting something new, i.e. "cryptic lineages" or novel mutation profiles.

To do this, I would like to test/implement two recent approaches:

CONCOMPRA

https://github.com/willem-stock/CONCOMPRA
consensus approach for community profiling with nanopore amplicon sequencing data, focused on 16S rDNA
I already tested this on example nanopore data from mixed patient samples (simulating wastewater) and it looked very promising
should be easy to add as a single new process
the challenge is that the tool only works with one primer pair (see "Usage of more than one primer pair", willem-stock/CONCOMPRA#1)

Floria

https://github.com/bluenote-1577/floria | https://doi.org/10.1093/bioinformatics/btae252
Strain-level haplotyping for metagenomes with short or long reads.
I am curious how well this works on a) amplicon data and b) SARS-CoV-2
input is a VCF plus mapped reads (BAM), so I think it should also be easy to add

This way, we would get known lineage abundances from Freyja plus potential new lineages from one or both of the other tools.
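To make the "known lineages plus potential new ones" idea a bit more concrete, here is a minimal sketch of how haplotypes from a de novo tool could be compared against known lineage mutation profiles. Everything in it is an assumption for illustration: the function names, the toy mutation labels, the Jaccard similarity as the comparison measure, and the 0.9 threshold. None of it reflects how Freyja, CONCOMPRA, or Floria actually report their results.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two mutation sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def flag_candidates(
    haplotypes: Dict[str, Set[str]],
    lineage_profiles: Dict[str, Set[str]],
    threshold: float = 0.9,  # hypothetical cutoff, would need tuning on real data
) -> Dict[str, Tuple[Optional[str], float]]:
    """Return haplotypes whose best match to any known lineage is below threshold."""
    flagged = {}
    for name, mutations in haplotypes.items():
        best_lineage, best_score = None, 0.0
        for lineage, profile in lineage_profiles.items():
            score = jaccard(mutations, profile)
            if score > best_score:
                best_lineage, best_score = lineage, score
        if best_score < threshold:
            # Keep the closest known lineage for context when reviewing by hand.
            flagged[name] = (best_lineage, best_score)
    return flagged


# Toy input: labels are placeholders, not real SARS-CoV-2 mutations.
profiles = {"L1": {"a", "b", "c"}, "L2": {"a", "d"}}
haps = {"h1": {"a", "b", "c"}, "h2": {"a", "b", "x", "y"}}
print(flag_candidates(haps, profiles))
```

In this toy case `h1` matches `L1` exactly and is not reported, while `h2` only partially resembles `L1` and would be flagged for manual inspection.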

Finally, we could also write a little ww-poreCoV extension paper ;)


hoelzer · 2024-08-14T15:41:32Z

Another interesting method might be VirPool: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05100-3

They already illustrated the advantage that longer amplicons bring with their method in particular.

is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read. A crucial property of our model is its ability to capture long-range dependencies within reads, which is particularly relevant when coupled with use of long amplicons and nanopore sequencing.

They also tested the method on real data by sequencing a mixture of eight clinical samples using long amplicons (2kb).

(Side note: Victor brought this to our attention and shared the information, thanks!)

The question is, as usual: how up to date is the tool?

https://github.com/fmfi-compbio/virpool

They also provide scripts to create your own profiles: https://github.com/fmfi-compbio/virpool?tab=readme-ov-file#creating-a-custom-variant-profile

Maybe covsonar could also create these profiles easily. Or Ashkan's sc2mfc tool.

replikation · 2024-08-14T16:27:16Z

Yep, it's completely fine to extend poreCov to wastewater surveillance now. We just need to make sure the normal "user experience" does not become convoluted.

MarieLataretu · 2024-08-15T09:09:08Z

ad Floria:

You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky.
I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.
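As a rough way to spot such cases before handing a VCF to a haplotyper, overlapping deletions can be listed with a few lines of plain Python. This is only a sketch under assumptions: it handles simple left-anchored deletions (ALT is a prefix of REF), ignores genotype and frequency fields, and the contig name and bases in the example are made up. It would not catch the case described above where medaka calls neither deletion, since it only sees what is in the VCF.

```python
from typing import List, Tuple

Deletion = Tuple[str, int, int]  # (contig, first deleted base, last deleted base), 1-based


def parse_deletions(vcf_text: str) -> List[Deletion]:
    """Collect simple deletions from VCF text (header lines are skipped)."""
    deletions = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        for allele in alt.split(","):
            # Simple deletion: ALT is shorter than REF and is a prefix of it.
            if len(allele) < len(ref) and ref.startswith(allele):
                start = pos + len(allele)  # first deleted reference base
                end = pos + len(ref) - 1   # last deleted reference base
                deletions.append((chrom, start, end))
    return deletions


def overlapping_deletions(deletions: List[Deletion]) -> List[Tuple[Deletion, Deletion]]:
    """Return pairs of deletions on the same contig whose deleted ranges intersect."""
    ordered = sorted(deletions)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Sorted by contig and start, so nothing later can overlap either.
            if second[0] != first[0] or second[1] > first[2]:
                break
            pairs.append((first, second))
    return pairs


# Made-up records: two deletions that overlap and one that stands alone.
example_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t100\t.\tACGTA\tA\t.\tPASS\t.",
    "ref\t102\t.\tGTAC\tG\t.\tPASS\t.",
    "ref\t200\t.\tAT\tA\t.\tPASS\t.",
])
print(overlapping_deletions(parse_deletions(example_vcf)))
```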

edit: corrected tool name; Florida would be also a fun name

hoelzer · 2024-08-15T09:23:12Z

ad Florida:

You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky. I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.

Yes, I would like to use the output that poreCov produces anyway, to change as little as possible.

However, good point. I would live with such issues for now. Deconvoluting lineages from wastewater is the wild west anyway :) but of course it is important to keep such situations in mind.

Ps: Floria ;)

replikation · 2024-08-15T10:35:02Z

Not sure if it is possible to implement a subcommand or a second "main.nf" solely for the wastewater part? (I am thinking of samtools-style subcommands, for instance.)

MarieLataretu · 2024-08-15T12:00:52Z

I think subcommands are rather unusual. As far as I know, this is mainly handled by a parameter.
E.g. viralrecon has a platform parameter for either Illumina or nanopore:
https://github.com/nf-core/viralrecon/blob/3731dd3a32a67a2648ea22c2bd980c224abdaee2/main.nf#L62-L76

Also, it could cause trouble when executing directly from GitHub via nextflow run replikation/poreCov ... (or it would need some extra configuration).

hoelzer · 2024-08-21T14:49:10Z

Started working on this in a branch, ww-porecov. The first step was a container for CONCOMPRA, which was already a bit of a pain, but it finally works:

rkimf1/concompra:v0.0.1--f6c273d

hoelzer · 2024-10-24T09:28:07Z

The authors of CONCOMPRA now provide a Docker image: willemstock/concompra:version0.0.2

However, we need to see if that works with Nextflow.

hoelzer · 2024-11-14T09:47:08Z

This might be even more interesting to add as a process to ww-poreCov:

https://www.medrxiv.org/content/10.1101/2024.08.27.24312690v1
Unsupervised detection of SARS-CoV-2 mutations and lineages in Norwegian wastewater samples using long-read sequencing
https://github.com/garcia-nacho/HERCULES

One big question is probably: how do they handle the reference used for lineage assignments at the end? And: can they detect something new/cryptic?

hoelzer · 2024-12-16T13:29:05Z

Looking into HERCULES

they have a prebuilt Docker container, but we might want our own to work with Nextflow (it's maybe a bit cumbersome to use otherwise)
they provide an option for a custom reference, based on aligned spike genes:

To update the set of references that HERCULES uses, you need a fasta file with all the Spike-genes of the reference sequences aligned. The names of the sequences inside the fasta file must have the following structure IDXX_LineageY where IDXX is the unique identifier for the sequence and the LineageY is the lineage assigned to the sequence, it can be pangolin lineage or any other nomenclature system (e.g. nextclade clade ID, WHO nomenclature, etc). The fasta file must be stored in a folder that must be mounted inside HERCULES with the following flag. -v Path/to/Folder:/Reference

This might be a good option to build an up-to-date reference based on "representative spike sequences" per lineage (https://github.com/rki-mf1/sc2-mutation-frequency-calculator)
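Building such a reference mostly comes down to renaming FASTA headers into the IDXX_LineageY structure quoted above. A minimal sketch, assuming we already have an aligned spike FASTA and a mapping from sequence ID to lineage (both hypothetical inputs here). One extra assumption of mine, not stated in the HERCULES README: I strip underscores and other special characters from the ID so that the first underscore in the header cleanly separates ID and lineage.

```python
import re
from typing import Dict, Iterable, Iterator


def rename_headers(fasta_lines: Iterable[str], lineage_by_id: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines with headers rewritten as >ID_Lineage."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            seq_id = line[1:].split()[0]      # ID is the first token of the header
            lineage = lineage_by_id[seq_id]   # KeyError if a sequence has no lineage
            # Assumption: keep only letters, digits and dots in the ID part.
            clean_id = re.sub(r"[^A-Za-z0-9.]", "", seq_id)
            yield f">{clean_id}_{lineage}"
        else:
            yield line


# Placeholder sequences and IDs, just to show the header transformation.
example = [">seq_1 some description", "ATG", ">seq2", "AAA"]
lineages = {"seq_1": "BA.2.86", "seq2": "XBB.1.5"}
for out_line in rename_headers(example, lineages):
    print(out_line)
```

A missing lineage raises a `KeyError` on purpose, so that an incomplete mapping is noticed before the reference is mounted into the container.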

hoelzer · 2024-12-16T14:59:57Z

re-tagged the provided image to mhoelzer/hercules:0.0.0 bc the provided name (ghcr.io/garcia-nacho/hercules) crashes some process in poreCov

hoelzer · 2024-12-16T15:37:51Z

Unfortunately, the container is a pain to use in Nextflow. Here is the Dockerfile and how the tool is then executed (a bash script):

https://github.com/garcia-nacho/HERCULES/blob/master/Dockerfile#L76C19-L76C57

We probably first need to rebuild the container. I tried to simply docker build the original Dockerfile, but this fails early (on my MacBook).

hoelzer · 2025-01-10T12:55:05Z

garcia-nacho/HERCULES#2 (comment)

What I would recommend you is to run HERCULES several times, one per amplicon, and to compare/integrate the results. The larger the amplicons the better (i.e., Midnight and ViralAmp protocols)

Hm, okay... a similar issue as with CONCOMPRA.
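If we go the one-run-per-amplicon route, the reads first need to be split by amplicon. A minimal sketch of that binning step, assuming we already have alignment start/end coordinates per read and the amplicon intervals from the primer scheme (half-open coordinates). The amplicon names, coordinates, and the 0.8 minimum-overlap fraction are invented for illustration, and this is not taken from HERCULES or CONCOMPRA.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference


def assign_amplicon(
    read_start: int,
    read_end: int,
    amplicons: Dict[str, Interval],
    min_fraction: float = 0.8,  # hypothetical cutoff
) -> Optional[str]:
    """Return the amplicon covering most of the read, or None if ambiguous."""
    best_name, best_overlap = None, 0
    for name, (amp_start, amp_end) in amplicons.items():
        overlap = min(read_end, amp_end) - max(read_start, amp_start)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    read_length = read_end - read_start
    if best_name is None or read_length <= 0:
        return None
    # Reads spanning two amplicons (e.g. chimeras) fall below the cutoff.
    if best_overlap / read_length < min_fraction:
        return None
    return best_name


def group_reads(
    reads: Iterable[Tuple[str, int, int]],
    amplicons: Dict[str, Interval],
    min_fraction: float = 0.8,
) -> Dict[str, List[str]]:
    """Group read IDs by assigned amplicon; unassigned reads are dropped."""
    groups = defaultdict(list)
    for read_id, start, end in reads:
        name = assign_amplicon(start, end, amplicons, min_fraction)
        if name is not None:
            groups[name].append(read_id)
    return dict(groups)


scheme = {"amp1": (0, 1200), "amp2": (1100, 2300)}
reads = [("r1", 10, 1190), ("r2", 1120, 2290), ("r3", 600, 1700)]
print(group_reads(reads, scheme))
```

Each resulting group could then be written to its own FASTQ and passed to a separate run, with the results compared or integrated afterwards as the authors suggest.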

hoelzer · 2025-01-24T12:24:34Z

Is this maybe interesting for our purpose of detecting something new?

https://github.com/zhuangx15/ICAvar
https://www.medrxiv.org/content/10.1101/2024.04.18.24306052v1.full

hoelzer · 2025-02-13T15:48:43Z

Another idea:

devider; https://www.biorxiv.org/content/10.1101/2024.11.05.621838v1

hoelzer · 2025-02-19T06:29:07Z

Another package from colleagues in France (Hughes is in contact with them), in which they reimplemented wastewater tools and made them faster.

Also

... indeed the haplotype callers should do the task that you need. If you want to make full use of the long reads, you will need to start from the BAM and count the mutation cooccurrences. The tool from El Hacene DJAOUT does that: https://pypi.org/project/VaRaPS/1.0.0
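The co-occurrence counting mentioned in the quote can be sketched in a few lines. This is only an illustration of the general idea and not how VaRaPS implements it: I assume the per-read mutation sets have already been extracted from the BAM (that step is left out here), and the read and mutation names are placeholders.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def count_cooccurrences(read_mutations: Dict[str, Iterable[str]]) -> Counter:
    """Count how many reads carry each pair of mutations together."""
    pair_counts: Counter = Counter()
    for mutations in read_mutations.values():
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(set(mutations)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Placeholder input: read ID -> mutations observed on that read.
reads = {
    "r1": {"mutA", "mutB"},
    "r2": {"mutA", "mutB", "mutC"},
    "r3": {"mutC"},
}
for pair, n in sorted(count_cooccurrences(reads).items()):
    print(pair, n)
```

Pairs that show up together on many reads support a shared haplotype, which is exactly the information that is lost when only per-site frequencies are used.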

hoelzer · 2025-02-19T06:43:20Z

Another interesting method might be VirPool: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05100-3

They already illustrated the advantage that longer amplicons bring with their method in particular.

is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read. A crucial property of our model is its ability to capture long-range dependencies within reads, which is particularly relevant when coupled with use of long amplicons and nanopore sequencing.

They also tested the method on real data by sequencing a mixture of eight clinical samples using long amplicons (2kb).

(Side note: Victor brought this to our attention and shared the information, thanks!)

The question is, as usual: how up to date is the tool?

https://github.com/fmfi-compbio/virpool

They also provide scripts to create your own profiles: https://github.com/fmfi-compbio/virpool?tab=readme-ov-file#creating-a-custom-variant-profile

Maybe covsonar could also create these profiles easily. Or Ashkan's sc2mfc tool.

VirPool might be promising...

you have the choice between: VirPool (co-occurrence), Freyja (frequencies), and LCS (counts)

And the tool in the previous post has a faster reimplementation of VirPool.

hoelzer · 2025-02-20T10:14:41Z

Thomas is testing: https://github.com/bluenote-1577/devider

hoelzer · 2025-02-25T12:56:56Z

This also looks promising, VILOCA:

https://academic.oup.com/nargab/article/6/4/lqae152/7912062

hoelzer added the enhancement New feature or request label Jul 31, 2024

hoelzer assigned hoelzer and replikation Jul 31, 2024
