2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID: SS-11.6
Paper Title: LIGHT-TTS: LIGHTWEIGHT MULTI-SPEAKER MULTI-LINGUAL TEXT-TO-SPEECH
Authors: Song Li, Beibei Ouyang, Lin Li, Qingyang Hong (Xiamen University, China)
Session: SS-11: On-device AI for Audio and Speech Applications
Location: Gather.Town
Session Time: Thursday, 10 June, 14:00 - 14:45
Presentation Time: Thursday, 10 June, 14:00 - 14:45
Presentation: Poster
Topic: Special Sessions: On-device AI for Audio and Speech Applications
Abstract: With the development of deep learning, end-to-end neural text-to-speech (TTS) systems have achieved significant improvements in the quality of synthesized speech. However, most of these systems are attention-based autoregressive models, which leads to slow synthesis and large parameter counts. In addition, speech in different languages is usually synthesized by separate models, which increases the complexity of the overall speech synthesis system. In this paper, we propose a new lightweight multi-speaker multi-lingual speech synthesis system, named LightTTS, which uses a single model to quickly synthesize Chinese, English, or code-switched speech for multiple speakers through non-autoregressive generation. Moreover, compared with FastSpeech using the same number of neural network layers and nodes, LightTTS generates mel-spectrograms 2.50x faster on CPU and reduces the parameter count by a factor of 12.83.
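The abstract contrasts autoregressive decoding, which emits one mel frame per step conditioned on the frames before it, with non-autoregressive generation. As a minimal sketch of the idea, assuming a FastSpeech-style length regulator (the mechanism used by the FastSpeech baseline named above; the abstract does not state that LightTTS uses exactly this component), each phoneme's hidden vector is repeated by its predicted duration. The total number of frames is then known up front, so a decoder can produce all of them in parallel. The function name `length_regulate` and the toy inputs are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(hidden: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Expand phoneme-level hidden states to frame level (illustrative sketch).

    hidden[i] is repeated durations[i] times, so the output length equals
    sum(durations) and is fixed before decoding starts. This mirrors a
    FastSpeech-style length regulator; it is an assumption, not LightTTS code.
    """
    if len(hidden) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    frames: List[Vector] = []
    for vec, d in zip(hidden, durations):
        # Copy each vector so output frames do not alias the input lists.
        frames.extend(list(vec) for _ in range(d))
    return frames


# Toy example: 3 phonemes with 2-dim hidden states and hypothetical durations.
phonemes = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 1, 3]
frames = length_regulate(phonemes, durations)
print(len(frames))  # prints 6: every frame exists at once for a parallel decoder
```

By contrast, an autoregressive decoder would need six sequential steps to produce the same six frames, which is the source of the speed gap the abstract describes.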