Samples of Multi-Speaker Text-to-speech Synthesis Using Deep Gaussian Processes
Description
DNN: DNN-based multi-speaker TTS using one-hot speaker codes (reproduction of [N. Hojo et al., 2018]).
DGP: Deep Gaussian Processes (DGP)-based TTS using one-hot speaker codes (Proposed).
DGPLVM: Deep Gaussian Process Latent Variable Model (DGPLVM)-based TTS (Proposed). Speaker representations are learned jointly with the acoustic model parameters.
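For readers unfamiliar with the one-hot speaker codes used by the DNN and DGP systems above, here is a minimal sketch of the encoding itself. The function name, the speaker indexing, and the speaker count in the usage note are hypothetical, chosen only for illustration; how the code is combined with the other model inputs is not described here and is not shown.

```python
from typing import List


def one_hot_speaker_code(speaker_index: int, num_speakers: int) -> List[float]:
    """Return a vector of length num_speakers with a single 1.0 at speaker_index.

    Hypothetical helper: the indexing scheme is an assumption, not taken
    from the systems described above.
    """
    if not 0 <= speaker_index < num_speakers:
        raise ValueError(
            f"speaker_index {speaker_index} is outside [0, {num_speakers})"
        )
    code = [0.0] * num_speakers
    code[speaker_index] = 1.0  # mark the active speaker
    return code
```

For example, `one_hot_speaker_code(2, 4)` yields a four-element vector whose third entry is 1.0 and whose other entries are 0.0. In contrast, the DGPLVM system learns its speaker representation rather than fixing it in this way.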
Each utterance is named using the following format: {speaker_id}_VOICEACTRESS100_{utterance_id}
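The naming format above can be split back into its two fields with a short parser. The following is a minimal sketch; the function name, the class name, and the sample name `spk01_VOICEACTRESS100_001` are hypothetical, since the actual speaker and utterance identifiers are not listed here.

```python
import re
from typing import NamedTuple


class UtteranceName(NamedTuple):
    speaker_id: str
    utterance_id: str


# Matches {speaker_id}_VOICEACTRESS100_{utterance_id}; both fields must be non-empty.
_NAME_PATTERN = re.compile(
    r"^(?P<speaker_id>.+)_VOICEACTRESS100_(?P<utterance_id>.+)$"
)


def parse_utterance_name(name: str) -> UtteranceName:
    """Split an utterance name into its speaker and utterance identifiers."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected utterance name: {name!r}")
    return UtteranceName(match["speaker_id"], match["utterance_id"])
```

Calling `parse_utterance_name("spk01_VOICEACTRESS100_001")` on the hypothetical name returns the pair `("spk01", "001")`; names that do not contain the `_VOICEACTRESS100_` marker raise `ValueError`.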