Technical Program

Paper Detail

Presentation #	4
Session:	Voice Conversion and TTS
Location:	Kallirhoe Hall
Session Time:	Friday, December 21, 10:00 - 12:00
Presentation Time:	Friday, December 21, 10:00 - 12:00
Presentation:	Poster
Topic:	Special session on Speech Synthesis
Paper Title:	NEURAL TTS VOICE CONVERSION
Authors:	Zvi Kons, Slava Shechtman, Alex Sorin, Ron Hoory, Carmel Rabinovitz, Edmilson Da Silva Morais, IBM Research, Israel
Abstract:	Recently, speaker adaptation of neural TTS models has received significant interest, and several studies focusing on this topic have been published. All of them explore the adaptation of an initial multi-speaker model trained on a corpus containing tens to hundreds of individual speaker voices. In this work we focus on the challenging task of TTS voice conversion, where an initial system is trained on single-speaker data and then needs to be adapted to a variety of external speaker voices. The TTS voice conversion setup represents a very important use case: transcribed multi-speaker datasets might be unavailable for many languages, while any TTS technology provider is expected to have at least one suitable single-speaker dataset per supported language. We present a neural TTS system comprising separate prosody generator and synthesizer DNN models. The system is trained on a high-quality proprietary male speaker dataset. We show that the system models can be converted to a variety of external ordinary male and female voices and to an extremely expressive artist’s voice, and we present crowd-based subjective evaluation results.