Presentation # | 11 |
Session: | Detection, Paralinguistics and Coding |
Location: | Kallirhoe Hall |
Session Time: | Wednesday, December 19, 13:30 - 15:30 |
Presentation Time: | Wednesday, December 19, 13:30 - 15:30 |
Presentation: |
Poster
|
Topic: |
Speech recognition and synthesis: |
Paper Title: |
WAVENET-BASED ZERO-DELAY LOSSLESS SPEECH CODING |
Authors: |
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda, Nagoya Institute of Technology, Japan |
Abstract: |
This paper presents a WaveNet-based zero-delay lossless speech coding technique for high-quality communications. The WaveNet generative model, which is a state-of-the-art model for neural-network-based speech waveform synthesis, is used in both the encoder and decoder. In the encoder, discrete speech signals are losslessly compressed using sample-by-sample entropy coding. The decoder fully reconstructs the original speech signals from the compressed signals without algorithmic delay. Experimental results show that the proposed coding technique can transmit speech audio waveforms with 50% their original bit rate and the WaveNet-based speech coder remains effective for unknown speakers. |