SPE-80.1
IMPROVING EMOTIONAL SPEECH SYNTHESIS BY USING SUS-CONSTRAINED VAE AND TEXT ENCODER AGGREGATION
Fengyu Yang, Jian Luan, Yujun Wang, Xiaomi Corporation, China
Session:
Speech Synthesis: Expressiveness and Voice Cloning
Track:
Speech and Language Processing
Location:
Gather Area D
Presentation Time:
Fri, 13 May, 20:00 - 20:45 China Time (UTC +8)
Fri, 13 May, 12:00 - 12:45 UTC
Session Chair:
Zhen-Hua Ling, University of Science and Technology of China
Session SPE-80
SPE-80.1: IMPROVING EMOTIONAL SPEECH SYNTHESIS BY USING SUS-CONSTRAINED VAE AND TEXT ENCODER AGGREGATION
Fengyu Yang, Jian Luan, Yujun Wang, Xiaomi Corporation, China
SPE-80.2: DISTRIBUTION AUGMENTATION FOR LOW-RESOURCE EXPRESSIVE TEXT-TO-SPEECH
Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova, Amazon, United Kingdom of Great Britain and Northern Ireland
SPE-80.3: INTERACTIVE MULTI-LEVEL PROSODY CONTROL FOR EXPRESSIVE SPEECH SYNTHESIS
Tobias Cornille, Jessa Bekker, KU Leuven, Belgium; Fengna Wang, Acapela-Group, Belgium
SPE-80.4: IMPROVE FEW-SHOT VOICE CLONING USING MULTI-MODAL LEARNING
Haitong Zhang, Yue Lin, Netease Games AI Lab, China
SPE-80.5: CLONING ONE'S VOICE USING VERY LIMITED DATA IN THE WILD
Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang, ByteDance, China
SPE-80.6: UNET-TTS: IMPROVING UNSEEN SPEAKER AND STYLE TRANSFER IN ONE-SHOT VOICE CLONING
Rui Li, Dong Pu, Minnie Huang, Bill Huang, CloudMinds, China