IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022

Virtual (all paper presentations)

22-27 May 2022

Main Venue: Marina Bay Sands Expo & Convention Center, Singapore

27-28 October 2022

Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

SPE-70.3

Zero-shot Cross-lingual Transfer using multi-stream encoder and efficient speaker representation

Yibin Zheng, Zewang Zhang, Xinhui Li, Wenchao Su, Li Lu, Tencent Inc, China

Session:

Speech Synthesis: Multi-lingual and Multimodal Synthesis

Location:

Gather Area D

Presentation Time:

Thu, 12 May, 21:00 - 21:45 China Time (UTC +8)
Thu, 12 May, 13:00 - 13:45 UTC

Session Chair:

Erica Cooper, National Institute of Informatics, Japan

Session SPE-70

SPE-70.1: MULTILINGUAL TEXT-TO-SPEECH TRAINING USING CROSS LANGUAGE VOICE CONVERSION AND SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS

Jilong Wu, Adam Polyak, Yaniv Taigman, Prabhav Agrawal, Qing He, Facebook, United States of America; Jason Fong, The University of Edinburgh, United Kingdom of Great Britain and Northern Ireland

SPE-70.2: TOWARDS LIFELONG LEARNING OF MULTILINGUAL TEXT-TO-SPEECH SYNTHESIS

Mu Yang, University of Texas at Dallas, United States of America; Shaojin Ding, Texas A&M University, United States of America; Tianlong Chen, Zhangyang Wang, University of Texas at Austin, United States of America; Tong Wang, University of Science and Technology of China, China

SPE-70.3: Zero-shot Cross-lingual Transfer using multi-stream encoder and efficient speaker representation

Yibin Zheng, Zewang Zhang, Xinhui Li, Wenchao Su, Li Lu, Tencent Inc, China

SPE-70.4: VISUALTTS: TTS WITH ACCURATE LIP-SPEECH SYNCHRONIZATION FOR AUTOMATIC VOICE OVER

Junchen Lu, Mingyang Zhang, Haizhou Li, National University of Singapore, Singapore; Berrak Sisman, Singapore University of Technology and Design, Singapore; Rui Liu, National University of Singapore and Singapore University of Technology and Design, Singapore

SPE-70.5: DURATION MODELING OF NEURAL TTS FOR AUTOMATIC DUBBING

Johanes Effendi, Yogesh Virkar, Roberto Barra-Chicote, Marcello Federico, Amazon AI, Japan

SPE-70.6: LEARNING TO PREDICT SPEECH IN SILENT VIDEOS VIA AUDIOVISUAL ANALOGY

Ravindra Yadav, Vinay Namboodiri, Rajesh Hegde, Indian Institute of Technology Kanpur, India; Ashish Sardana, NVIDIA, India


Last updated 21 May 2022.
