SPE-43.4: TRANSFORMER-S2A: ROBUST AND EFFICIENT SPEECH-TO-ANIMATION
Liyang Chen, Zhiyong Wu, Shenzhen International Graduate School, China; Jun Ling, Shanghai Jiao Tong University, China; Runnan Li, Xu Tan, Sheng Zhao, Microsoft, China
Session: Speech Synthesis: Singing Voice and Others
Track: Speech and Language Processing
Location: Gather Area D
Presentation Time: Tue, 10 May, 22:00 - 22:45 China Time (UTC +8) / Tue, 10 May, 14:00 - 14:45 UTC
Session Chair: Juhan Nam, KAIST
Session SPE-43
SPE-43.1: HIFIDENOISE: HIGH-FIDELITY DENOISING TEXT TO SPEECH WITH ADVERSARIAL NETWORKS
Lichao Zhang, Yi Ren, Zhou Zhao, Zhejiang University, China; Liqun Deng, Huawei Noah’s Ark Lab, China
SPE-43.2: VISINGER: VARIATIONAL INFERENCE WITH ADVERSARIAL LEARNING FOR END-TO-END SINGING VOICE SYNTHESIS
Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Northwestern Polytechnical University, China; Pengcheng Zhu, Mengxiao Bi, Fuxi AI Lab, NetEase Inc., China
SPE-43.3: A MELODY-UNSUPERVISION MODEL FOR SINGING VOICE SYNTHESIS
Soonbeom Choi, Juhan Nam, Korea Advanced Institute of Science and Technology, Republic of Korea
SPE-43.4: TRANSFORMER-S2A: ROBUST AND EFFICIENT SPEECH-TO-ANIMATION
Liyang Chen, Zhiyong Wu, Shenzhen International Graduate School, China; Jun Ling, Shanghai Jiao Tong University, China; Runnan Li, Xu Tan, Sheng Zhao, Microsoft, China
SPE-43.5: VCVTS: MULTI-SPEAKER VIDEO-TO-SPEECH SYNTHESIS VIA CROSS-MODAL KNOWLEDGE TRANSFER FROM VOICE CONVERSION
Disong Wang, Xunying Liu, Helen Meng, The Chinese University of Hong Kong, Hong Kong; Shan Yang, Dan Su, Dong Yu, Tencent AI Lab, China