Technical Program

Paper Detail

Presentation #	3
Session:	ASR IV
Location:	Kallirhoe Hall
Session Time:	Friday, December 21, 13:30 - 15:30
Presentation Time:	Friday, December 21, 13:30 - 15:30
Presentation:	Poster
Topic:	Speech recognition and synthesis
Paper Title:	Exploring layer trajectory LSTM with depth processing units and attention
Authors:	Jinyu Li, Liang Lu, Changliang Liu, Yifan Gong, Microsoft, United States
Abstract:	The traditional LSTM model and its variants normally work in a frame-by-frame and layer-by-layer fashion, handling temporal modeling and target classification at the same time. In this paper, we extend our recently proposed layer trajectory LSTM (ltLSTM) and present a generalized framework equipped with a depth processing block that scans the hidden states of each time-LSTM layer and uses the summarized layer trajectory information for final senone classification. We explore different modeling units in the depth processing block to achieve a good tradeoff between accuracy and runtime cost. Furthermore, we integrate an attention module into this framework to explore wide context information, which is especially beneficial for uni-directional LSTMs. Trained on 30 thousand hours of EN-US Microsoft internal data with the cross-entropy criterion, the proposed generalized ltLSTM performed significantly better than the standard multi-layer time-LSTM, with up to 12.8% relative word error rate (WER) reduction across different tasks. With attention modeling, the relative WER reduction can be up to 17.9%. We observed similar gains when the models were trained with a sequence discriminative training criterion.
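
The data flow described in the abstract (time-LSTM layers that model the temporal dimension, plus a depth processing block that scans the per-layer hidden states at each frame and feeds the summary to the senone classifier) might be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: it assumes a plain LSTM cell as the depth processing unit, shares one depth cell across all layers, uses random untrained weights, and omits the attention module. All function names, dimensions, and the weight initialization are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, in_dim, hid_dim):
    """Random (untrained) LSTM parameters; the four gates are stacked row-wise."""
    W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim))   # input weights
    U = rng.normal(0.0, 0.1, (4 * hid_dim, hid_dim))  # recurrent weights
    b = np.zeros(4 * hid_dim)
    return W, U, b


def lstm_step(params, x, h, c):
    """One standard LSTM cell step: returns the new hidden and cell states."""
    W, U, b = params
    i, f, o, g = np.split(W @ x + U @ h + b, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new


def lt_lstm_forward(frames, time_layers, depth_cell, W_out, hid_dim):
    """Hypothetical layer-trajectory forward pass (assumption, see lead-in)."""
    num_layers = len(time_layers)
    h = [np.zeros(hid_dim) for _ in range(num_layers)]
    c = [np.zeros(hid_dim) for _ in range(num_layers)]
    outputs = []
    for x in frames:
        # Time-LSTM stack: each layer recurs over time and feeds the layer above.
        layer_input = x
        for l, params in enumerate(time_layers):
            h[l], c[l] = lstm_step(params, layer_input, h[l], c[l])
            layer_input = h[l]
        # Depth processing block: scan the hidden states of all layers at this
        # frame, bottom to top, starting from a zero state.
        g, m = np.zeros(hid_dim), np.zeros(hid_dim)
        for l in range(num_layers):
            g, m = lstm_step(depth_cell, h[l], g, m)
        # The summarized layer trajectory drives the senone classifier.
        outputs.append(W_out @ g)
    return np.stack(outputs)


# Toy dimensions chosen only for illustration.
rng = np.random.default_rng(0)
feat_dim, hid_dim, num_layers, num_senones, num_frames = 8, 16, 3, 10, 5
time_layers = [init_lstm(rng, feat_dim if l == 0 else hid_dim, hid_dim)
               for l in range(num_layers)]
depth_cell = init_lstm(rng, hid_dim, hid_dim)
W_out = rng.normal(0.0, 0.1, (num_senones, hid_dim))
frames = rng.normal(size=(num_frames, feat_dim))
logits = lt_lstm_forward(frames, time_layers, depth_cell, W_out, hid_dim)
print(logits.shape)
```

In this sketch the time recurrence never sees the depth block's output, so the temporal modeling and the classification path stay decoupled. Swapping `lstm_step` in the depth scan for a cheaper unit would be one way to probe the accuracy versus runtime tradeoff the abstract mentions.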