Login Paper Search My Schedule Paper Index Help

My SLT 2018 Schedule

Note: Your custom schedule will not be saved unless you create a new account or login to an existing account.
  1. Create a login based on your email (takes less than one minute)
  2. Perform 'Paper Search'
  3. Select papers that you desire to save in your personalized schedule
  4. Click on 'My Schedule' to see the current list of selected papers
  5. Click on 'Printable Version' to create a separate window suitable for printing (the header and menu will appear, but will not actually print)

Paper Detail

Presentation #4
Session:Voice Conversion and TTS
Session Time:Friday, December 21, 10:00 - 12:00
Presentation Time:Friday, December 21, 10:00 - 12:00
Presentation: Poster
Topic: Special session on Speech Synthesis:
Paper Title: NEURAL TTS VOICE CONVERSION
Authors: Zvi Kons; IBM Research 
 Slava Shechtman; IBM Research 
 Alex Sorin; IBM Research 
 Ron Hoory; IBM Research 
 Carmel Rabinovitz; IBM Research 
 Edmilson Da Silva Morais; IBM Research 
Abstract: Recently, speaker adaptation of neural TTS models received significant interest, and several studies focusing on this topic have been published. All of them explore an adaptation of an initial multi-speaker model trained on a corpus containing from tens to hundreds of individual speaker voices. In this work we focus on a challenging task of TTS voice conversion where an initial system is trained on a single-speaker data and then need to be adapted to a variety of external speaker voices. The TTS voice conversion setup represents a very important use case. Transcribed multi-speaker datasets might be unavailable for many languages while any TTS technology provider is expected to have at least one suitable single-speaker dataset per supported language. We present a neural TTS system comprising separate prosody generator and synthesizer DNN models. The system is trained on a high quality proprietary male speaker dataset. We show that the system models can be converted to a variety of external male and female ordinary voices and an extremely expressive artist’s voice and present crowd-base subjective evaluation results.