TTS Research

Requirements

  • NVIDIA GPU
  • Python 3.8
  • Flowtron
  • PyTorch
  • Audacity
  • Git
  • reone-tools (KotOR only)

Flowtron Models

Name        Speakers         Dataset
LJS         Single (female)  LJ Speech
LibriTTS    Around 100       Clean
LibriTTS2K  Around 2000      Various

LibriTTS Speakers

filelists/libritts_speakerinfo.txt contains detailed information on each speaker; a lookup sketch follows the list of speaker IDs below.

40 78 83 87 118 125 196 200 250 254 374 405 446 460 587 669 696 730 831 887 1069 1088 1116 1246 1263
 1502 1578 1841 1867 1963 1970 2092 2136 2182 2196 2289 2416 2436 2836 2843 2911 2952 3240 3242 3259
 3436 3486 3526 3664 3857 3879 3982 3983 4018 4051 4088 4160 4195 4267 4297 4362 4397 4406 4640 4680
 4788 5022 5104 5322 5339 5393 5652 5678 5703 5750 5808 6019 6064 6078 6081 6147 6181 6209 6272 6367
 6385 6415 6437 6454 6476 6529 6818 6836 6848 7059 7067 7078 7178 7190 7226 7278 7302 7367 7402 7447
 7505 7511 7794 7800 8051 8088 8098 8108 8123 8238 8312 8324 8419 8468 8609 8629 8770 8838
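
If you need details on a particular ID, the info file can also be queried programmatically. A minimal sketch, assuming the file keeps the pipe-delimited layout of the upstream LibriTTS SPEAKERS.txt (ID | SEX | SUBSET | MINUTES | NAME); the speaker_info helper is ours, not part of Flowtron:

```python
# Hypothetical helper: find a speaker's row in libritts_speakerinfo.txt.
# Assumes the pipe-delimited layout of LibriTTS's SPEAKERS.txt
# (ID | SEX | SUBSET | MINUTES | NAME); adjust if your copy differs.
def speaker_info(speaker_id, path="filelists/libritts_speakerinfo.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith(";") or not line.strip():
                continue  # skip comment and blank lines
            fields = [field.strip() for field in line.split("|")]
            if fields[0] == str(speaker_id):
                return fields
    return None

print(speaker_info(887))
```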

Setup

  1. Clone Flowtron repo: git clone https://github.com/NVIDIA/flowtron.git
  2. CD into it: cd flowtron
  3. Initialize submodules: git submodule update --init; cd tacotron2; git submodule update --init
  4. Install PyTorch: pip3 install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
  5. Install Flowtron requirements: pip install -r requirements.txt
  6. Downgrade numba to 0.48: pip install numba==0.48
  7. Install tensorboard: pip install tensorboard
  8. Copy scripts/infer_from_model.patch into the Flowtron directory and apply it: git apply infer_from_model.patch
  9. Download the pre-trained Flowtron and WaveGlow models (e.g., flowtron_libritts2p3k.pt and waveglow_256channels_universal_v4.pt) and place them under flowtron/models
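
Before continuing, it is worth confirming that the CUDA build of PyTorch actually sees your GPU:

```python
# Sanity check: the CUDA-enabled PyTorch build should report the GPU.
import torch

print(torch.__version__)          # expect 1.8.1+cu102
print(torch.cuda.is_available())  # expect True
print(torch.cuda.get_device_name(0))
```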

Inference

  1. Edit config.json:
    1. Set data_config.training_files and data_config.validation_files to point to the model filelists
    2. Set model_config.n_speakers to the number of speakers in the model (a scripted version of these edits follows the notes below)
  2. python inference.py -c config.json -w models/waveglow_256channels_universal_v4.pt -f MODEL_PATH -t "Hello, world!" -i SPEAKER_ID -s 0.5 -n 4000

Notes:

  • MODEL_PATH is the path to either a downloaded model (e.g., models/flowtron_libritts2p3k.pt) or a model you've trained (e.g., outdir/model_10000)
  • SPEAKER_ID must reference a valid speaker within the model
  • Argument -s sets the sampling sigma, which controls speech variation; increase it for more expressive output
  • Argument -n sets the number of output frames; increase it to synthesize longer texts
  • The resulting WAV file is saved under flowtron/results
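
The config edits from step 1 can also be scripted, which helps when switching between models. A minimal sketch using only the keys named above; the filelist paths and speaker count are placeholders:

```python
# Sketch: apply the config.json edits from step 1 programmatically.
# The filelist paths and speaker count below are placeholders.
import json

with open("config.json") as f:
    config = json.load(f)

config["data_config"]["training_files"] = "filelists/libritts_train_filelist.txt"
config["data_config"]["validation_files"] = "filelists/libritts_val_filelist.txt"
config["model_config"]["n_speakers"] = 123  # match the model you load

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```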

Custom Model

Begin by picking the model and speaker that most closely resemble your character. You can do this by inferring voice samples with different models and speakers, then comparing the results, as in the sketch below.
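
A short script can audition several speakers of one model in a single run. A sketch, assuming the Setup and Inference steps above; the speaker IDs and sample text are arbitrary:

```python
# Audition a few LibriTTS speakers by invoking inference.py once per speaker.
# Speaker IDs are arbitrary picks from the list above; compare the WAVs
# written to flowtron/results afterwards.
import subprocess

for speaker_id in (40, 887, 1263):
    subprocess.run([
        "python", "inference.py",
        "-c", "config.json",
        "-w", "models/waveglow_256channels_universal_v4.pt",
        "-f", "models/flowtron_libritts2p3k.pt",
        "-t", "May the Force be with you.",
        "-i", str(speaker_id),
        "-s", "0.5",
    ], check=True)
```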

Data Preparation (KotOR)

  1. Extract resources to a separate directory and convert DLG and TLK files to JSON using scripts/extract.py
  2. Generate Flowtron filelists for a particular character using scripts/flowtron/filelists.py: python filelists.py DLG_SPEAKER [DLG_VORESREF] [SPEAKER_ID], e.g., python filelists.py bastila bast 0
  3. Unwrap the audio files listed in the generated filelists using scripts/flowtron/unwrap.py: python unwrap.py FILELIST_PATH, e.g., python unwrap.py bastila_train_filelist.txt
  4. Batch import the unwrapped MP3 files into Audacity, resample them to 22,050 Hz, and export them as Signed 16-bit PCM WAV (a scripted alternative follows this list)
  5. Place the generated filelists under flowtron/filelists and edit the sound paths to point to the resampled audio files (see the sketch after the note below)
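
If you prefer to skip the Audacity step, the resampling can be scripted. A sketch using librosa and soundfile (both pip-installable); the directory names are placeholders, and decoding MP3 requires a working backend such as ffmpeg:

```python
# Batch-resample unwrapped MP3s to 22,050 Hz signed 16-bit PCM WAV.
# SRC and DST are placeholder directory names.
import pathlib

import librosa
import soundfile as sf

SRC = pathlib.Path("unwrapped")   # MP3s produced by unwrap.py
DST = pathlib.Path("resampled")
DST.mkdir(exist_ok=True)

for mp3 in SRC.glob("*.mp3"):
    audio, _ = librosa.load(mp3, sr=22050, mono=True)  # decode + resample
    sf.write(DST / (mp3.stem + ".wav"), audio, 22050, subtype="PCM_16")
```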

Note: refer to Tooling for information on running Python scripts.
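
Flowtron filelists are pipe-separated lines of the form audio_path|text|speaker_id, so pointing them at the resampled audio can also be scripted. A sketch; the resampled directory name is a placeholder:

```python
# Rewrite the audio-path column of a filelist to point at the resampled WAVs.
# Each filelist line has the form: audio_path|text|speaker_id
import pathlib
import sys

filelist = pathlib.Path(sys.argv[1])  # e.g., bastila_train_filelist.txt
lines = []
for line in filelist.read_text(encoding="utf-8").splitlines():
    path, text, speaker = line.split("|")
    wav = pathlib.Path("resampled") / (pathlib.Path(path).stem + ".wav")
    lines.append(f"{wav}|{text}|{speaker}")
filelist.write_text("\n".join(lines) + "\n", encoding="utf-8")
```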

Training

  1. Edit flowtron/config.json:
    1. Set train_config.output_directory to the directory in which to save trained models
    2. Lower train_config.batch_size if your GPU runs out of memory
    3. Set data_config.training_files and data_config.validation_files to point to the generated filelists
    4. Set model_config.n_speakers to the number of speakers in the model
  2. Fine-tune a model: python train.py -c config.json -p data_config.use_attn_prior=0 train_config.warmstart_checkpoint_path=MODEL_PATH train_config.finetune_layers=["speaker_embedding.weight"]

Note: when fine-tuning a model, the validation loss tends to plateau at around 10,000 iterations.

Monitoring

It is a good idea to monitor your progress while training a model.

  1. Run tensorboard --logdir=outdir/logs from the command line
  2. Open http://localhost:6006 in a browser
  3. Pay attention to the validation loss chart under the Scalars tab; it should approach -1. A good rule of thumb is to stop training once the validation loss plateaus.
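
The same numbers can be pulled out of the event files directly, for example to script a stopping rule. A sketch using TensorBoard's EventAccumulator; the exact scalar tag Flowtron logs is not documented here, so the code lists the available tags first:

```python
# Read logged scalars straight from the TensorBoard event files.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("outdir/logs")
acc.Reload()
tags = acc.Tags()["scalars"]
print(tags)  # find the validation loss tag among these
for event in acc.Scalars(tags[0]):
    print(event.step, event.value)
```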