Presentation # | 8 |
Session: | Voice Conversion and TTS |
Session Time: | Friday, December 21, 10:00 - 12:00 |
Presentation Time: | Friday, December 21, 10:00 - 12:00 |
Presentation: | Poster |
Topic: | Speech recognition and synthesis |
Paper Title: | DATA SELECTION FOR IMPROVING NATURALNESS OF TTS VOICES TRAINED ON SMALL FOUND CORPUSES |
Authors: | Fang-Yu Kuo; ObEN, Inc. |
| Sandesh Aryal; ObEN, Inc. |
| Gilles Degottex; ObEN, Inc. |
| Sam Kang; ObEN, Inc. |
| Pierre Lanchantin; ObEN, Inc. |
| Iris Ouyang; ObEN, Inc. |
Abstract: |
This work investigates techniques that select training data from small, found corpuses to improve the naturalness of synthesized text-to-speech voices. The approach outlined in this paper examines different metrics to detect and reject segments of training data that can degrade the performance of the system. We conducted experiments on two small datasets extracted from Mandarin Chinese audiobooks that differ in recording conditions, narrator, and transcriptions. We show that an even smaller, yet carefully selected, set of data can yield a text-to-speech system that generates more natural speech than a system trained on the complete dataset. Three metrics proposed in the paper, all related to the narrator's articulation, give significant improvements in naturalness. |