My SLT 2018 Schedule

Paper Detail

Presentation #	4
Session:	Voice Conversion and TTS
Session Time:	Friday, December 21, 10:00 - 12:00
Presentation Time:	Friday, December 21, 10:00 - 12:00
Presentation:	Poster
Topic:	Special session on Speech Synthesis
Paper Title:	NEURAL TTS VOICE CONVERSION
Authors:	Zvi Kons; IBM Research
	Slava Shechtman; IBM Research
	Alex Sorin; IBM Research
	Ron Hoory; IBM Research
	Carmel Rabinovitz; IBM Research
	Edmilson Da Silva Morais; IBM Research
Abstract:	Recently, speaker adaptation of neural TTS models has received significant interest, and several studies focusing on this topic have been published. All of them explore adaptation of an initial multi-speaker model trained on a corpus containing tens to hundreds of individual speaker voices. In this work we focus on the challenging task of TTS voice conversion, where an initial system is trained on single-speaker data and then needs to be adapted to a variety of external speaker voices. The TTS voice conversion setup represents a very important use case: transcribed multi-speaker datasets might be unavailable for many languages, while any TTS technology provider is expected to have at least one suitable single-speaker dataset per supported language. We present a neural TTS system comprising separate prosody generator and synthesizer DNN models. The system is trained on a high-quality proprietary male speaker dataset. We show that the system models can be converted to a variety of external male and female ordinary voices and to an extremely expressive artist’s voice, and we present crowd-based subjective evaluation results.