Early Detection of Novel SARS-CoV-2 Variants from Urban and Rural Wastewater through Genome Sequencing and Machine Learning
- MATLAB: https://www.mathworks.com/
- ICAsso*: https://research.ics.aalto.fi/ica/icasso/
- FastICA*: https://research.ics.aalto.fi/ica/fastica/
*The icasso122 and FastICA_25 packages are provided under the other_Dependence folder solely for testing the code. You MUST obtain proper usage permissions from the original authors.
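A minimal sketch of putting the bundled dependencies on the MATLAB path before running anything. It assumes that MATLAB's current folder is the repository root and that the two packages sit directly under other_Dependence with the folder names given above. `fastica` and `icassoEst` are the entry-point functions of the public FastICA and Icasso packages; the check only reports whether they are visible and does not stop on failure.

```matlab
% Assumption: MATLAB's current folder is the repository root.
depRoot = fullfile(pwd, 'other_Dependence');
pkgs = {'icasso122', 'FastICA_25'};   % folder names taken from the note above

for k = 1:numel(pkgs)
    pkgDir = fullfile(depRoot, pkgs{k});
    if isfolder(pkgDir)
        addpath(genpath(pkgDir));     % include any subfolders as well
    else
        fprintf('Folder not found: %s\n', pkgDir);
    end
end

% exist(..., 'file') returns 2 when the name resolves to a file on the path.
hasFastICA = (exist('fastica', 'file') == 2);
hasIcasso  = (exist('icassoEst', 'file') == 2);
fprintf('fastica on path: %d, icassoEst on path: %d\n', hasFastICA, hasIcasso);
```

If either flag prints 0, adjust `depRoot` to wherever the packages were unpacked.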
- Running ICA
- Dual regression
- Annotate dual-regressed signal to known COVID strains
- Identify potential novel mutations
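To illustrate the second step, here is a generic dual-regression sketch on synthetic data. It is not the repository's implementation: the matrix orientation (samples by mutations), the variable names, and the use of noisy ground-truth maps in place of real ICA output are all assumptions made for illustration.

```matlab
% Synthetic stand-in data (all sizes and names are illustrative assumptions).
rng(0);                                  % reproducible random numbers
nSamp = 30; nMut = 200; nComp = 3;
S_true = rand(nComp, nMut);              % component-by-mutation source maps
A_true = rand(nSamp, nComp);             % sample-by-component loadings
Y = A_true * S_true + 0.01 * randn(nSamp, nMut);   % observed data

% Stand-in for group-level ICA maps (in practice these come from the ICA step).
S_group = S_true + 0.01 * randn(nComp, nMut);

% Stage 1: regress the group maps onto the data to get per-sample loadings.
% Y / S_group is the least-squares solution of A_hat * S_group = Y.
A_hat = Y / S_group;

% Stage 2: regress the loadings back onto the data to get data-specific maps.
% A_hat \ Y is the least-squares solution of A_hat * S_hat = Y.
S_hat = A_hat \ Y;

% Relative reconstruction error of the two-stage fit.
relErr = norm(Y - A_hat * S_hat, 'fro') / norm(Y, 'fro');
fprintf('Relative reconstruction error: %.4f\n', relErr);
```

Both stages are ordinary least-squares fits, so MATLAB's `/` and `\` operators are enough here; no toolbox is required.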
script_example_code_test_all_steps.m provides an example of running all four steps with sample_dataset\covid_test_data.mat
With this test data, a full run typically takes less than 5 minutes on a standard desktop.
The expected outputs are under sample_dataset\expected_output\
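One possible way to check a run against the expected outputs is to compare numeric variables that share a name in two MAT-files. The sketch below uses two temporary stand-in files with hypothetical names, because the actual output file and variable names are not listed here; substitute your own paths. The tolerance is also an arbitrary assumption.

```matlab
% Hypothetical stand-in files; replace with your output and the expected output.
refFile = fullfile(tempdir, 'expected_output_demo.mat');
newFile = fullfile(tempdir, 'my_output_demo.mat');
x = magic(4);      save(refFile, 'x');   % pretend "expected" result
x = x + 1e-12;     save(newFile, 'x');   % pretend "new" result

ref = load(refFile);
new = load(newFile);
tol = 1e-8;                               % assumed absolute tolerance
allClose = true;

% Compare every numeric variable that exists in both files with equal size.
names = intersect(fieldnames(ref), fieldnames(new));
for k = 1:numel(names)
    a = ref.(names{k});
    b = new.(names{k});
    if isnumeric(a) && isnumeric(b) && isequal(size(a), size(b))
        maxDiff = max(abs(double(a(:)) - double(b(:))));
        fprintf('%s: max abs difference = %g\n', names{k}, maxDiff);
        allClose = allClose && (maxDiff <= tol);
    end
end

delete(refFile); delete(newFile);         % remove the stand-in files
```

ICA components are only defined up to sign and order, so an element-wise comparison like this can flag differences that are not real discrepancies; matching components first may be needed.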
Results reported in Zhuang et al., 2024 were produced using sample_dataset\Yf_all_ivar_variant_sep21_nov23_50xcoverage_gt_80_vRefine_vNoDup_github.mat
Follow instructions under: other_Dependence\prepare_VoC_references_for_annotation
Our code was tested with MATLAB R2022b and R2023b. Other MATLAB versions may require minor debugging because of changes to MATLAB built-in functions.
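As a small convenience, a script can warn when it runs on a release other than the two tested ones. This is only a sketch; the warning identifier `covidICA:untestedRelease` is a made-up name for illustration.

```matlab
% version('-release') returns the release string without the leading 'R'.
rel = version('-release');
tested = {'2022b', '2023b'};              % releases named above

isTested = ismember(rel, tested);
if ~isTested
    % Hypothetical warning identifier, chosen for this example only.
    warning('covidICA:untestedRelease', ...
        'MATLAB R%s has not been tested; minor debugging may be needed.', rel);
end
```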