Official code for our paper "DistillW2N: A Lightweight One-Shot Whisper to Normal Voice Conversion Model Using Distillation of Self-Supervised Features" (DistillW2N), accepted at ICASSP 2025.
- Create a Python environment, e.g. with conda:
```bash
conda create --name distillw2n python=3.10.12 --yes
```
- Activate the new environment:
```bash
conda activate distillw2n
```
- Install torch and torchaudio:
```bash
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
```
- Install the required system packages:
```bash
sudo apt-get update && sudo apt-get install -y libsndfile1 ffmpeg
```
- Install the Python requirements:
```bash
pip install -r requirements.txt
```
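A quick way to confirm the environment is working is the minimal sanity check below; it is not repo-specific, just a verification that the cu121 wheels installed above import and see the GPU:

```python
# Sanity check: confirm torch/torchaudio import and that CUDA is visible
# (assumes the cu121 wheels installed in the steps above).
import torch
import torchaudio

print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```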
- Download the pretrained models with the links given in the txt file.
- For QuickVC and WESPER, please run:
```bash
python compare_infer.py
```
- For our models, please run:
```bash
python infer.py
```
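If you need to prepare your own recordings first, here is a minimal sketch using torchaudio. It assumes the checkpoints expect 16 kHz mono input, which is common for self-supervised speech features (check the inference scripts for the actual rate); `input.wav` and `input_16k.wav` are placeholder paths, not repo files:

```python
# Minimal input-preparation sketch: downmix to mono and resample to 16 kHz,
# the rate self-supervised speech encoders typically expect.
# "input.wav" / "input_16k.wav" are placeholder paths, not repo files.
import torchaudio

wav, sr = torchaudio.load("input.wav")  # (channels, samples)
wav = wav.mean(dim=0, keepdim=True)     # downmix to mono
if sr != 16000:
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
torchaudio.save("input_16k.wav", wav, 16000)
```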
- To train from scratch, please run:
```bash
python u2ss2u.py
```
You just need to download the datasets under `YOURPATH`.
- Dataset Download
  - For the libritts, ljspeech, and timit datasets, datahelper will automatically download them if they are not found at `YOURPATH` (see the sketch after this list).
  - For the wtimit dataset, you will need to request it via email. Follow the appropriate procedures to obtain access and download the dataset to `YOURPATH`.
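For reference, torchaudio ships a ready-made LibriTTS wrapper that downloads on demand; datahelper presumably automates something comparable. A sketch, where the subset name is only an example:

```python
# Illustration only: torchaudio's built-in wrapper fetches LibriTTS into
# the given root if it is missing; datahelper presumably performs a
# comparable automatic download for libritts, ljspeech, and timit.
import torchaudio

dataset = torchaudio.datasets.LIBRITTS(
    root="YOURPATH",        # replace with your dataset root
    url="train-clean-100",  # example subset
    download=True,          # download the archive if not found locally
)
waveform, sample_rate, *_ = dataset[0]
print(waveform.shape, sample_rate)
```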
- Dataset Preparation (Optional)
  - datapreper offers options for ppw (Pseudo-whisper) and vad (Voice Activity Detection) versions; you can apply these processing steps according to your project's requirements (a VAD sketch follows below).
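As a rough illustration of what the vad option does, the sketch below trims silence from both ends of an utterance with torchaudio's SoX-style voice activity detector; the actual datapreper implementation may differ, and `utterance.wav` is a placeholder path:

```python
# Hedged sketch of VAD-style preprocessing with torchaudio's SoX-like
# detector; datapreper's actual vad option may be implemented differently.
import torch
import torchaudio
import torchaudio.functional as F

wav, sr = torchaudio.load("utterance.wav")  # placeholder path
wav = F.vad(wav, sample_rate=sr)            # trim leading silence
wav = torch.flip(wav, dims=[-1])            # reverse the signal
wav = F.vad(wav, sample_rate=sr)            # trim (former) trailing silence
wav = torch.flip(wav, dims=[-1])            # restore original order
torchaudio.save("utterance_vad.wav", wav, sr)
```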
This implementation builds on:
- SoundStream for the training pipeline.