Technical Program

Paper Detail

Presentation #8
Session: ASR III (End-to-End)
Location: Kallirhoe Hall
Session Time: Friday, December 21, 10:00 - 12:00
Presentation Time: Friday, December 21, 10:00 - 12:00
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: COMBINING DE-NOISING AUTO-ENCODER AND RECURRENT NEURAL NETWORKS IN END-TO-END AUTOMATIC SPEECH RECOGNITION FOR NOISE ROBUSTNESS
Authors: Tzu-Hsuan Ting, Chia-Ping Chen, National Sun Yat-sen University, Taiwan
Abstract: In this paper, we propose an end-to-end noise-robust automatic speech recognition system built from deep-learning implementations of de-noising auto-encoders and recurrent neural networks. We use batch normalization and a novel design for the front-end de-noising auto-encoder, which performs a two-stage prediction of a single-frame clean feature vector from multi-frame noisy feature vectors. For back-end word recognition, we use an end-to-end system based on a bidirectional recurrent neural network with long short-term memory cells (LSTM-BiRNN), trained with the connectionist temporal classification (CTC) criterion. Its performance is compared to a baseline back-end based on hidden Markov models and Gaussian mixture models (HMM-GMM). Our experimental results show that the proposed front-end de-noising auto-encoder outperforms the best result we could find for the Aurora 2.0 clean-condition training task by 1.2% absolute (6.0% vs. 7.2%). In addition, the proposed end-to-end back-end architecture matches the performance of the traditional HMM-GMM back-end recognizer.
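To make the front-end idea from the abstract concrete — predicting a single-frame clean feature vector from spliced multi-frame noisy feature vectors, with batch normalization, in two stages — here is a minimal NumPy sketch. All specifics (39-dim features, a context of ±3 frames, hidden sizes, ReLU activations, and the exact two-stage split) are illustrative assumptions, not the authors' actual configuration, and the weights are random rather than trained.

```python
import numpy as np

def splice(frames, context=3):
    """Stack each frame with its +/- `context` neighbours (edges padded by
    repetition), forming the multi-frame noisy input the front-end consumes."""
    T, D = frames.shape
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].ravel() for t in range(T)])

def batch_norm(x, eps=1e-5):
    """Simplified batch normalization (no learned scale/shift parameters)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

class DenoisingFrontEnd:
    """Sketch of a two-stage de-noising auto-encoder front-end: stage 1 maps
    spliced noisy frames to a coarse hidden estimate; stage 2 refines it and
    projects down to a single-frame clean feature estimate."""

    def __init__(self, feat_dim=39, context=3, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = feat_dim * (2 * context + 1)  # spliced input dimension
        self.context = context
        # Random (untrained) weights, for shape illustration only.
        self.W1 = rng.standard_normal((in_dim, hidden)) * 0.01
        self.W2 = rng.standard_normal((hidden, hidden)) * 0.01
        self.W3 = rng.standard_normal((hidden, feat_dim)) * 0.01

    def forward(self, noisy):
        x = splice(noisy, self.context)                  # (T, in_dim)
        h1 = np.maximum(0.0, batch_norm(x @ self.W1))    # stage 1
        h2 = np.maximum(0.0, batch_norm(h1 @ self.W2))   # stage 2 refinement
        return h2 @ self.W3                              # (T, feat_dim)

# Usage: 100 frames of 39-dim noisy features -> 100 denoised frame estimates.
noisy = np.random.default_rng(1).standard_normal((100, 39))
clean_est = DenoisingFrontEnd().forward(noisy)
print(clean_est.shape)  # (100, 39)
```

In the paper's pipeline, the denoised features would then feed the LSTM-BiRNN back-end trained with CTC; that stage is omitted here since it requires a trained recognizer.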