SPE-51.5
VarianceFlow: High-quality and Controllable Text-to-Speech Using Variance Information via Normalizing Flow
Yoonhyung Lee, Kyomin Jung, Seoul National University, Korea, Republic of; Jinhyeok Yang, NCSOFT, Korea, Republic of
Session:
Speech Synthesis: Novel Acoustic Models
Track:
Speech and Language Processing
Location:
Gather Area D
Presentation Time:
Wed, 11 May, 20:00 - 20:45 China Time (UTC +8)
Wed, 11 May, 12:00 - 12:45 UTC
Session Chair:
Heiga Zen, Google
Session SPE-51
SPE-51.1: NEURAL HMMS ARE ALL YOU NEED (FOR HIGH-QUALITY ATTENTION-FREE TTS)
Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter, KTH Royal Institute of Technology, Sweden
SPE-51.2: AUTOREGRESSIVE VARIATIONAL AUTOENCODER WITH A HIDDEN SEMI-MARKOV MODEL-BASED STRUCTURED ATTENTION FOR SPEECH SYNTHESIS
Takato Fujimoto, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda, Nagoya Institute of Technology, Japan
SPE-51.3: PAMA-TTS: PROGRESSION-AWARE MONOTONIC ATTENTION FOR STABLE SEQ2SEQ TTS WITH ACCURATE PHONEME DURATION CONTROL
Yunchao He, Jian Luan, Yujun Wang, Xiaomi Corporation, China
SPE-51.4: IMPROVING FASTSPEECH TTS WITH EFFICIENT SELF-ATTENTION AND COMPACT FEED-FORWARD NETWORK
Yujia Xiao, Xi Wang, Lei He, Frank K. Soong, Microsoft, China
SPE-51.5: VarianceFlow: High-quality and Controllable Text-to-Speech Using Variance Information via Normalizing Flow
Yoonhyung Lee, Kyomin Jung, Seoul National University, Korea, Republic of; Jinhyeok Yang, NCSOFT, Korea, Republic of
SPE-51.6: MIXER-TTS: NON-AUTOREGRESSIVE, FAST AND COMPACT TEXT-TO-SPEECH MODEL CONDITIONED ON LANGUAGE MODEL EMBEDDINGS
Oktai Tatanov, Stanislav Beliaev, Boris Ginsburg, NVIDIA, Russian Federation