| Presentation # | 3 |
| Session: | ASR IV |
| Session Time: | Friday, December 21, 13:30 - 15:30 |
| Presentation Time: | Friday, December 21, 13:30 - 15:30 |
| Presentation: | Poster |
| Topic: | Speech recognition and synthesis |
| Paper Title: | Exploring layer trajectory LSTM with depth processing units and attention |
| Authors: | Jinyu Li; Microsoft |
| | Liang Lu; Microsoft |
| | Changliang Liu; Microsoft |
| | Yifan Gong; Microsoft |
| Abstract: | Traditional LSTM models and their variants normally work in a frame-by-frame, layer-by-layer fashion, handling temporal modeling and target classification at the same time. In this paper, we extend our recently proposed layer trajectory LSTM (ltLSTM) and present a generalized framework, equipped with a depth processing block that scans the hidden states of each time-LSTM layer and uses the summarized layer trajectory information for final senone classification. We explore different modeling units in the depth processing block to strike a good tradeoff between accuracy and runtime cost. Furthermore, we integrate an attention module into this framework to explore wide context information, which is especially beneficial for uni-directional LSTMs. Trained on 30 thousand hours of EN-US Microsoft internal data with the cross-entropy criterion, the proposed generalized ltLSTM performed significantly better than the standard multi-layer time-LSTM, with up to 12.8% relative word error rate (WER) reduction across different tasks. With attention modeling, the relative WER reduction reaches up to 17.9%. We observed similar gains when the models were trained with a sequence discriminative training criterion. |
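Since the abstract describes the architecture only at a high level, the following is a minimal PyTorch sketch of the generalized ltLSTM idea: stacked time-LSTMs handle frame-by-frame temporal modeling, while a depth processing unit scans the per-layer hidden states at each frame, and the summarized trajectory vector drives senone classification. The class and variable names, dimensions, and the choice of an LSTMCell as the depth processing unit are illustrative assumptions, not the authors' exact implementation; the attention module over wide context is omitted for brevity.

```python
# Hedged sketch of a generalized layer-trajectory LSTM (ltLSTM).
# Assumptions: feature dim 80, hidden size 512, LSTMCell as the depth
# processing unit. None of these are confirmed by the paper itself.
import torch
import torch.nn as nn

class LayerTrajectoryLSTM(nn.Module):
    def __init__(self, feat_dim=80, hidden=512, layers=4, num_senones=9000):
        super().__init__()
        # Time-LSTMs: ordinary frame-by-frame recurrence, one per layer.
        self.time_lstms = nn.ModuleList(
            [nn.LSTM(feat_dim if i == 0 else hidden, hidden, batch_first=True)
             for i in range(layers)])
        # Depth processing unit: scans the hidden states of the time-LSTM
        # layers at each frame (here an LSTMCell over the depth axis).
        self.depth_cell = nn.LSTMCell(hidden, hidden)
        self.classifier = nn.Linear(hidden, num_senones)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        layer_outputs = []
        h = x
        for lstm in self.time_lstms:           # temporal modeling
            h, _ = lstm(h)
            layer_outputs.append(h)            # each: (batch, time, hidden)
        B, T, H = h.shape
        logits = []
        for t in range(T):                     # depth scan at each frame
            g = (torch.zeros(B, H, device=x.device),
                 torch.zeros(B, H, device=x.device))
            for h_l in layer_outputs:          # bottom-to-top over layers
                g = self.depth_cell(h_l[:, t], g)
            logits.append(self.classifier(g[0]))  # summarized trajectory
        return torch.stack(logits, dim=1)      # (batch, time, num_senones)

# Usage: 2 utterances, 50 frames, 80-dim features -> per-frame senone logits.
model = LayerTrajectoryLSTM()
out = model(torch.randn(2, 50, 80))            # shape: (2, 50, 9000)
```

The design point this sketch reflects is the one the abstract emphasizes: temporal recurrence (time-LSTMs) and cross-layer summarization (the depth processing unit) are decoupled, so the depth unit can be swapped for cheaper modeling units to trade accuracy against runtime cost.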