IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

SPE-70.5

DURATION MODELING OF NEURAL TTS FOR AUTOMATIC DUBBING

Johanes Effendi, Yogesh Virkar, Roberto Barra-Chicote, Marcello Federico, Amazon AI, Japan

Session:
Speech Synthesis: Multi-lingual and Multimodal Synthesis

Track:
Speech and Language Processing

Location:
Gather Area D

Presentation Time:
Thu, 12 May, 21:00 - 21:45 China Time (UTC +8)
Thu, 12 May, 13:00 - 13:45 UTC

Session Chair:
Erica Cooper, National Institute of Informatics, Japan
Session SPE-70
SPE-70.1: MULTILINGUAL TEXT-TO-SPEECH TRAINING USING CROSS LANGUAGE VOICE CONVERSION AND SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS
Jilong Wu, Adam Polyak, Yaniv Taigman, Prabhav Agrawal, Qing He, Facebook, United States of America; Jason Fong, The University of Edinburgh, United Kingdom of Great Britain and Northern Ireland
SPE-70.2: TOWARDS LIFELONG LEARNING OF MULTILINGUAL TEXT-TO-SPEECH SYNTHESIS
Mu Yang, University of Texas at Dallas, United States of America; Shaojin Ding, Texas A&M University, United States of America; Tianlong Chen, Zhangyang Wang, University of Texas at Austin, United States of America; Tong Wang, University of Science and Technology of China, China
SPE-70.3: ZERO-SHOT CROSS-LINGUAL TRANSFER USING MULTI-STREAM ENCODER AND EFFICIENT SPEAKER REPRESENTATION
Yibin Zheng, Zewang Zhang, Xinhui Li, Wenchao Su, Li Lu, Tencent Inc, China
SPE-70.4: VISUALTTS: TTS WITH ACCURATE LIP-SPEECH SYNCHRONIZATION FOR AUTOMATIC VOICE OVER
Junchen Lu, Mingyang Zhang, Haizhou Li, National University of Singapore, Singapore; Berrak Sisman, Singapore University of Technology and Design, Singapore; Rui Liu, National University of Singapore and Singapore University of Technology and Design, Singapore
SPE-70.5: DURATION MODELING OF NEURAL TTS FOR AUTOMATIC DUBBING
Johanes Effendi, Yogesh Virkar, Roberto Barra-Chicote, Marcello Federico, Amazon AI, Japan
SPE-70.6: LEARNING TO PREDICT SPEECH IN SILENT VIDEOS VIA AUDIOVISUAL ANALOGY
Ravindra Yadav, Vinay Namboodiri, Rajesh Hegde, Indian Institute of Technology Kanpur, India; Ashish Sardana, NVIDIA, India