TTS Research
seedhartha edited this page Jul 25, 2021
| Name | Speakers | Dataset |
|---|---|---|
| LJS | Single (female) | |
| LibriTTS | Around 100 | Clean |
| LibriTTS2K | Around 2000 | Various |
`filelists/libritts_speakerinfo.txt` contains detailed information on each speaker. The following speaker IDs are included:
40 78 83 87 118 125 196 200 250 254 374 405 446 460 587 669 696 730 831 887 1069 1088 1116 1246 1263
1502 1578 1841 1867 1963 1970 2092 2136 2182 2196 2289 2416 2436 2836 2843 2911 2952 3240 3242 3259
3436 3486 3526 3664 3857 3879 3982 3983 4018 4051 4088 4160 4195 4267 4297 4362 4397 4406 4640 4680
4788 5022 5104 5322 5339 5393 5652 5678 5703 5750 5808 6019 6064 6078 6081 6147 6181 6209 6272 6367
6385 6415 6437 6454 6476 6529 6818 6836 6848 7059 7067 7078 7178 7190 7226 7278 7302 7367 7402 7447
7505 7511 7794 7800 8051 8088 8098 8108 8123 8238 8312 8324 8419 8468 8609 8629 8770 8838
- Clone the Flowtron repo: `git clone https://github.com/NVIDIA/flowtron.git`
- CD into it: `cd flowtron`
- Initialize submodules: `git submodule update --init; cd tacotron2; git submodule update --init; cd ..`
- Install PyTorch: `pip3 install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html`
- Install Flowtron requirements: `pip install -r requirements.txt`
- Downgrade numba to 0.48: `pip install numba==0.48`
- Install tensorboard: `pip install tensorboard`
- Copy `scripts/infer_from_model.patch` into the Flowtron directory and apply it: `git apply infer_from_model.patch`
- Download pre-trained models and place them under `flowtron/models`:
  - WaveGlow V4
  - Flowtron LibriTTS2K
  - Flowtron LibriTTS (optional)
  - Flowtron LJS (optional)
- Edit `config.json`:
  - Set `data_config.training_files` and `data_config.validation_files` to point to the model filelists
  - Set `model_config.n_speakers` to the number of speakers in the model
- Run inference: `python inference.py -c config.json -w models/waveglow_256channels_universal_v4.pt -f MODEL_PATH -t "Hello, world!" -i SPEAKER_ID -s 0.5 -n 4000`
Notes:
- MODEL_PATH is a path to either a downloaded model (e.g., `models/flowtron_libritts2p3k.pt`) or a model you've trained (e.g., `outdir/model_10000`)
- SPEAKER_ID must reference a valid speaker within the model
- Argument `-s` sets speech variation: increase it to add expressiveness
- Argument `-n` sets the buffer size: increase it to infer longer texts
- The resulting WAV file will be saved under `flowtron/results`
Begin by picking the model and speaker that most resemble your character. You can do that by sequentially inferring voice samples using different models and speakers, and comparing the results.
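One way to run that comparison is to generate one inference command per model/speaker pair and execute them in sequence. A minimal sketch; the model paths and speaker IDs below are placeholders, not recommendations:

```python
from itertools import product

def build_inference_commands(models, speaker_ids, text="Hello, world!"):
    """Build one inference.py command line per (model, speaker) pair."""
    commands = []
    for model, speaker in product(models, speaker_ids):
        commands.append(
            "python inference.py -c config.json"
            " -w models/waveglow_256channels_universal_v4.pt"
            f' -f {model} -t "{text}" -i {speaker} -s 0.5 -n 4000'
        )
    return commands

# Example: compare two models across two speakers (four samples total)
for cmd in build_inference_commands(
    ["models/flowtron_libritts2p3k.pt", "models/flowtron_ljs.pt"],
    [0, 40],
):
    print(cmd)
```

Each printed line can be run directly from the Flowtron directory; the resulting WAVs land under `flowtron/results` for side-by-side listening.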
- Extract resources to a separate directory, and convert DLG and TLK files to JSON using `scripts/extract.py`
- Generate Flowtron filelists for a particular character using `scripts/flowtron/filelists.py`: `python filelists.py DLG_SPEAKER [DLG_VORESREF] [SPEAKER_ID]`, e.g. `python filelists.py bastila bast 0`
- Unwrap audio files based on the generated filelists using `scripts/flowtron/unwrap.py`: `python unwrap.py FILELIST_PATH`, e.g. `python unwrap.py bastila_train_filelist.txt`
- Batch-import the unwrapped MP3 files into Audacity, resample them to 22050 Hz, and export them as Signed 16-bit PCM WAV
- Place the generated filelists under `flowtron/filelists` and edit sound paths to point to the resampled audio files
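The path edits in the last step can be scripted. A minimal sketch, assuming the standard pipe-delimited Flowtron filelist layout (`audio_path|text|speaker_id`) and that transcript texts contain no `|` characters:

```python
from pathlib import PurePosixPath

def retarget_filelist(lines, wav_dir):
    """Point each filelist entry at the resampled WAV files.

    Keeps the file stem, swaps the directory, and changes the
    extension from .mp3 to .wav.
    """
    out = []
    for line in lines:
        # audio_path|text|speaker_id; assumes texts contain no '|'
        path, text, speaker = line.rstrip("\n").split("|")
        stem = PurePosixPath(path).stem
        out.append(f"{wav_dir}/{stem}.wav|{text}|{speaker}")
    return out

# Example with one hypothetical entry
print(retarget_filelist(["unwrapped/bast01.mp3|Hello there.|0"], "wavs"))
# → ['wavs/bast01.wav|Hello there.|0']
```

Read the filelist, pass its lines through this function, and write the result back before placing it under `flowtron/filelists`.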
Note: refer to Tooling for information on running Python scripts.
- Edit `flowtron/config.json`:
  - Set `train_config.output_directory` to the directory in which to save trained models
  - Set `train_config.batch_size` to a lower number depending on your GPU power
  - Set `data_config.training_files` and `data_config.validation_files` to point to the generated filelists
  - Set `model_config.n_speakers` to the number of speakers in the model
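Those edits can also be applied programmatically with the `json` module. A minimal sketch; the stand-in dict below contains only the keys being edited, whereas the real `config.json` has many more:

```python
import json

def update_flowtron_config(config, output_dir, batch_size,
                           training_files, validation_files, n_speakers):
    """Apply the config.json edits listed above to a loaded config dict."""
    config["train_config"]["output_directory"] = output_dir
    config["train_config"]["batch_size"] = batch_size
    config["data_config"]["training_files"] = training_files
    config["data_config"]["validation_files"] = validation_files
    config["model_config"]["n_speakers"] = n_speakers
    return config

# Stand-in config; in practice load it with json.load(open("config.json"))
config = {"train_config": {}, "data_config": {}, "model_config": {}}
update_flowtron_config(
    config, "outdir", 1,
    "filelists/bastila_train_filelist.txt",
    "filelists/bastila_val_filelist.txt", 1,
)
print(json.dumps(config, indent=2))
```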
- Fine-tune a model: `python train.py -c config.json -p data_config.use_attn_prior=0 train_config.warmstart_checkpoint_path=MODEL_PATH train_config.finetune_layers=["speaker_embedding.weight"]`

Note: when fine-tuning a model, validation loss tends to plateau at around 10,000 iterations.
It is a good idea to monitor your progress while training a model.
- Run `tensorboard --logdir=outdir/logs` from the command line
- Open http://localhost:6006 in a browser
- Pay attention to the validation loss chart under the Scalars tab: it should approach -1. A rule of thumb is to stop training when the validation loss plateaus.
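The plateau rule of thumb can be expressed as a simple heuristic. A hedged sketch; the window size and tolerance below are arbitrary illustrative choices, not values from the Flowtron project:

```python
def has_plateaued(val_losses, window=5, tol=1e-3):
    """Return True when the best validation loss has not improved by
    more than tol over the last `window` evaluations."""
    if len(val_losses) <= window:
        return False
    best_before = min(val_losses[:-window])
    best_recent = min(val_losses[-window:])
    return best_before - best_recent < tol

# Loss keeps dropping at first, then barely moves over the last window
print(has_plateaued(
    [-0.2, -0.5, -0.8, -0.9, -0.95,
     -0.9501, -0.9501, -0.9501, -0.9501, -0.9501]
))  # → True
```

In practice you would feed it the validation loss values read off the TensorBoard chart (or exported from the event logs) after each evaluation.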