SPE-11.6
SPEECHSPLIT2.0: UNSUPERVISED SPEECH DISENTANGLEMENT FOR VOICE CONVERSION WITHOUT TUNING AUTOENCODER BOTTLENECKS
Chak Ho Chan, Mark Hasegawa-Johnson, University of Illinois at Urbana-Champaign, United States of America; Kaizhi Qian, Yang Zhang, MIT-IBM Watson AI Lab, United States of America
Session:
Speech Synthesis: Style & Expressiveness
Track:
Speech and Language Processing
Location:
Gather Area D
Presentation Time:
Sun, 8 May, 22:00 - 22:45 China Time (UTC +8)
Sun, 8 May, 14:00 - 14:45 UTC
Session Chair:
Tomoki Toda, Nagoya University
Session SPE-11
SPE-11.1: REFEREE: TOWARDS REFERENCE-FREE CROSS-SPEAKER STYLE TRANSFER WITH LOW-QUALITY DATA FOR EXPRESSIVE SPEECH SYNTHESIS
Songxiang Liu, Shan Yang, Dan Su, Dong Yu, Tencent, China
SPE-11.2: PVAE-TTS: ADAPTIVE TEXT-TO-SPEECH VIA PROGRESSIVE STYLE ADAPTATION
Ji-Hyun Lee, Sang-Hoon Lee, Ji-Hoon Kim, Seong-Whan Lee, Korea University, Republic of Korea
SPE-11.3: EMOQ-TTS: EMOTION INTENSITY QUANTIZATION FOR FINE-GRAINED CONTROLLABLE EMOTIONAL TEXT-TO-SPEECH
Chae-Bin Im, Sang-Hoon Lee, Seung-Bin Kim, Seong-Whan Lee, Korea University, Republic of Korea
SPE-11.4: JOINT AND ADVERSARIAL TRAINING WITH ASR FOR EXPRESSIVE SPEECH SYNTHESIS
Kaili Zhang, Cheng Gong, Wenhuan Lu, Longbiao Wang, Jianguo Wei, Dawei Liu, Tianjin University, China
SPE-11.5: MSDTRON: A HIGH-CAPABILITY MULTI-SPEAKER SPEECH SYNTHESIS SYSTEM FOR DIVERSE DATA USING CHARACTERISTIC INFORMATION
Qinghua Wu, Quanbo Shen, Jian Luan, Yujun Wang, Xiaomi Company, China
SPE-11.6: SPEECHSPLIT2.0: UNSUPERVISED SPEECH DISENTANGLEMENT FOR VOICE CONVERSION WITHOUT TUNING AUTOENCODER BOTTLENECKS
Chak Ho Chan, Mark Hasegawa-Johnson, University of Illinois at Urbana-Champaign, United States of America; Kaizhi Qian, Yang Zhang, MIT-IBM Watson AI Lab, United States of America