IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

SPE-L3: Speech Synthesis
Wed, 25 May, 13:00 - 15:30 China Time (UTC +8)
Wed, 25 May, 05:00 - 07:30 UTC
Location: Simpor Junior Ballroom 4811-3
Session Co-Chairs: Yanfeng Lu, Institute for Infocomm Research; Sean Matthew Shannon, Google Research
Track: Speech and Language Processing

SPE-L3.1: NEURAL HMMS ARE ALL YOU NEED (FOR HIGH-QUALITY ATTENTION-FREE TTS)

Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter, KTH Royal Institute of Technology, Sweden

SPE-L3.2: VARIANCEFLOW: HIGH-QUALITY AND CONTROLLABLE TEXT-TO-SPEECH USING VARIANCE INFORMATION VIA NORMALIZING FLOW

Yoonhyung Lee, Kyomin Jung, Seoul National University, Korea, Republic of; Jinhyeok Yang, NCSOFT, Korea, Republic of

SPE-L3.3: SPEAKER GENERATION

Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao, Google, United States of America

SPE-L3.4: VISUALTTS: TTS WITH ACCURATE LIP-SPEECH SYNCHRONIZATION FOR AUTOMATIC VOICE OVER

Junchen Lu, Mingyang Zhang, Haizhou Li, National University of Singapore, Singapore; Berrak Sisman, Singapore University of Technology and Design, Singapore; Rui Liu, National University of Singapore and Singapore University of Technology and Design, Singapore

SPE-L3.5: A MELODY-UNSUPERVISION MODEL FOR SINGING VOICE SYNTHESIS

Soonbeom Choi, Juhan Nam, Korea Advanced Institute of Science and Technology, Korea, Republic of

SPE-L3.6: INCREMENTAL TEXT-TO-SPEECH SYNTHESIS USING PSEUDO LOOKAHEAD WITH LARGE PRETRAINED LANGUAGE MODEL

Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari, The University of Tokyo, Japan

SPE-L3.7: IMPROVING PHONETIC REALIZATIONS IN TTS BY USING PHONEME-ALIGNED GRAPHEMES

Manish Sharma, Yizhi Hong, Emily Kaplan, Siamak Tazari, Rob Clark, Google, United Kingdom of Great Britain and Northern Ireland

SPE-L3.8: WAVEBENDER GAN: AN ARCHITECTURE FOR PHONETICALLY MEANINGFUL SPEECH MANIPULATION

Gustavo Teodoro Döhler Beck, Ulme Wennberg, Zofia Malisz, Gustav Eje Henter, KTH Royal Institute of Technology, Sweden

SPE-L3.9: PERCEPTUAL-SIMILARITY-AWARE DEEP SPEAKER REPRESENTATION LEARNING FOR MULTI-SPEAKER GENERATIVE MODELING

Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, The University of Tokyo, Japan

SPE-L3.10: ONE-CLASS LEARNING TOWARDS SYNTHETIC VOICE SPOOFING DETECTION

You Zhang, Fei Jiang, Zhiyao Duan, University of Rochester, United States of America