Technical Program

Paper Detail

Presentation #11
Session: Detection, Paralinguistics and Coding
Location: Kallirhoe Hall
Session Time: Wednesday, December 19, 13:30 - 15:30
Presentation Time: Wednesday, December 19, 13:30 - 15:30
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: WAVENET-BASED ZERO-DELAY LOSSLESS SPEECH CODING
Authors: Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda, Nagoya Institute of Technology, Japan
Abstract: This paper presents a WaveNet-based zero-delay lossless speech coding technique for high-quality communications. The WaveNet generative model, a state-of-the-art model for neural-network-based speech waveform synthesis, is used in both the encoder and the decoder. In the encoder, discrete speech signals are losslessly compressed using sample-by-sample entropy coding. The decoder fully reconstructs the original speech signals from the compressed signals without algorithmic delay. Experimental results show that the proposed coding technique can transmit speech waveforms at 50% of their original bit rate and that the WaveNet-based speech coder remains effective for unknown speakers.
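
The idea of sample-by-sample entropy coding with a predictive model shared by encoder and decoder can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy 8-level alphabet, and the hypothetical `AdaptiveModel` class (simple frequency counts conditioned on the previous sample) stands in for WaveNet. Exact arithmetic coding with Python's `Fraction` is used for clarity rather than efficiency. The sketch also emits its codeword only after the last sample, so it shows the shared causal model and the lossless round trip, not the zero-delay bit emission described in the abstract.

```python
import math
from fractions import Fraction

LEVELS = 8  # assumption: toy 3-bit alphabet, not the paper's sample format


class AdaptiveModel:
    """Toy stand-in for WaveNet: predicts the next sample from the previous one."""

    def __init__(self):
        # Laplace-smoothed counts, one row per previous-sample context.
        self.counts = [[1] * LEVELS for _ in range(LEVELS)]
        self.prev = 0

    def probs(self):
        row = self.counts[self.prev]
        total = sum(row)
        return [Fraction(c, total) for c in row]

    def update(self, sample):
        self.counts[self.prev][sample] += 1
        self.prev = sample


def encode(samples):
    """Return (k, n): the codeword is the n-bit binary fraction k / 2**n."""
    model = AdaptiveModel()
    low, width = Fraction(0), Fraction(1)
    for s in samples:
        p = model.probs()
        # Narrow the interval to the sub-interval assigned to sample s.
        low += width * sum(p[:s], Fraction(0))
        width *= p[s]
        model.update(s)  # the model only ever sees past samples
    n = 0
    while True:
        # Shortest binary fraction that falls inside [low, low + width).
        k = math.ceil(low * 2**n)
        if Fraction(k, 2**n) < low + width:
            return k, n
        n += 1


def decode(k, n, count):
    """Rebuild `count` samples by mirroring the encoder's model updates."""
    model = AdaptiveModel()
    value = Fraction(k, 2**n)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        p = model.probs()
        cum = Fraction(0)
        for s in range(LEVELS):
            lo_s = low + width * cum
            if lo_s <= value < lo_s + width * p[s]:
                break  # value lies in the sub-interval of symbol s
            cum += p[s]
        low, width = lo_s, width * p[s]
        out.append(s)
        model.update(s)
    return out


if __name__ == "__main__":
    samples = [3, 4, 5, 4] * 50  # made-up, highly predictable toy signal
    k, n = encode(samples)
    assert decode(k, n, len(samples)) == samples
    print(f"coded bits: {n}, raw bits: {3 * len(samples)}")
```

Because the decoder updates its copy of the model with each sample it has just decoded, both sides always hold the same predictive distribution, and no side information about the model needs to be transmitted. The more sharply the model predicts the next sample, the fewer bits the entropy coder spends on it.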