IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

SPE-70: Speech Synthesis: Multi-lingual and Multimodal Synthesis
Thu, 12 May, 21:00 - 21:45 China Time (UTC +8)
Thu, 12 May, 13:00 - 13:45 UTC
Location: Gather Area D
Session Chair: Erica Cooper, National Institute of Informatics, Japan
Track: Speech and Language Processing

SPE-70.1: MULTILINGUAL TEXT-TO-SPEECH TRAINING USING CROSS LANGUAGE VOICE CONVERSION AND SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS

Jilong Wu, Adam Polyak, Yaniv Taigman, Prabhav Agrawal, Qing He, Facebook, United States of America; Jason Fong, The University of Edinburgh, United Kingdom of Great Britain and Northern Ireland

SPE-70.2: TOWARDS LIFELONG LEARNING OF MULTILINGUAL TEXT-TO-SPEECH SYNTHESIS

Mu Yang, University of Texas at Dallas, United States of America; Shaojin Ding, Texas A&M University, United States of America; Tianlong Chen, Zhangyang Wang, University of Texas at Austin, United States of America; Tong Wang, University of Science and Technology of China, China

SPE-70.3: ZERO-SHOT CROSS-LINGUAL TRANSFER USING MULTI-STREAM ENCODER AND EFFICIENT SPEAKER REPRESENTATION

Yibin Zheng, Zewang Zhang, Xinhui Li, Wenchao Su, Li Lu, Tencent Inc, China

SPE-70.4: VISUALTTS: TTS WITH ACCURATE LIP-SPEECH SYNCHRONIZATION FOR AUTOMATIC VOICE OVER

Junchen Lu, Mingyang Zhang, Haizhou Li, National University of Singapore, Singapore; Berrak Sisman, Singapore University of Technology and Design, Singapore; Rui Liu, National University of Singapore and Singapore University of Technology and Design, Singapore

SPE-70.5: DURATION MODELING OF NEURAL TTS FOR AUTOMATIC DUBBING

Johanes Effendi, Yogesh Virkar, Roberto Barra-Chicote, Marcello Federico, Amazon AI, Japan

SPE-70.6: LEARNING TO PREDICT SPEECH IN SILENT VIDEOS VIA AUDIOVISUAL ANALOGY

Ravindra Yadav, Vinay Namboodiri, Rajesh Hegde, Indian Institute of Technology Kanpur, India; Ashish Sardana, NVIDIA, India