IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

SPE-66.4

USING MULTIPLE REFERENCE AUDIOS AND STYLE EMBEDDING CONSTRAINTS FOR SPEECH SYNTHESIS

Cheng Gong, Longbiao Wang, Tianjin University, China; Zhenhua Ling, University of Science and Technology of China, China; Ju Zhang, Huiyan Technology (Tianjin) Co., Ltd, China; Jianwu Dang, Japan Advanced Institute of Science and Technology, Japan

Session:
Speech Synthesis: Style Control, Transfer, and Adaptation

Track:
Speech and Language Processing

Location:
Gather Area D

Presentation Time:
Thu, 12 May, 20:00 - 20:45 China Time (UTC +8)
Thu, 12 May, 12:00 - 12:45 UTC

Session Chair:
Jiangyan Yi, Institute of Automation, CAS
Session SPE-66
SPE-66.1: SPEAKER GENERATION
Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao, Google, United States of America
SPE-66.2: VOICE FILTER: FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION USING VOICE CONVERSION AS A POST-PROCESSING MODULE
Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba, Amazon, Poland; Chung-Ming Chien, National Taiwan University (NTU), Taiwan
SPE-66.3: FINE-GRAINED STYLE CONTROL IN TRANSFORMER-BASED TEXT-TO-SPEECH SYNTHESIS
Li-Wei Chen, Alexander Rudnicky, Carnegie Mellon University, United States of America
SPE-66.4: USING MULTIPLE REFERENCE AUDIOS AND STYLE EMBEDDING CONSTRAINTS FOR SPEECH SYNTHESIS
Cheng Gong, Longbiao Wang, Tianjin University, China; Zhenhua Ling, University of Science and Technology of China, China; Ju Zhang, Huiyan Technology (Tianjin) Co., Ltd, China; Jianwu Dang, Japan Advanced Institute of Science and Technology, Japan
SPE-66.5: ENHANCING SPEAKING STYLES IN CONVERSATIONAL TEXT-TO-SPEECH SYNTHESIS WITH GRAPH-BASED MULTI-MODAL CONTEXT MODELING
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Tsinghua University, China; Helen Meng, The Chinese University of Hong Kong, China; Chao Weng, Dan Su, Tencent, China
SPE-66.6: TOWARDS EXPRESSIVE SPEAKING STYLE MODELLING WITH HIERARCHICAL CONTEXT INFORMATION FOR MANDARIN SPEECH SYNTHESIS
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Tsinghua University, China; Shiyin Kang, Huya Inc., China; Helen Meng, The Chinese University of Hong Kong, Hong Kong