Presentation # | 9 |
Session: | ASR III (End-to-End) |
Location: | Kallirhoe Hall |
Session Time: | Friday, December 21, 10:00 - 12:00 |
Presentation Time: | Friday, December 21, 10:00 - 12:00 |
Presentation: | Poster |
Topic: | Speech recognition and synthesis |
Paper Title: | IMPROVED KNOWLEDGE DISTILLATION FROM BI-DIRECTIONAL TO UNI-DIRECTIONAL LSTM CTC FOR END-TO-END SPEECH RECOGNITION |
Authors: | Gakuto Kurata, Kartik Audhkhasi, IBM Research, Japan |
Abstract: | End-to-end automatic speech recognition (ASR) promises to simplify model training and deployment. Most end-to-end ASR systems use a bi-directional Long Short-Term Memory (BiLSTM) acoustic model because it can capture acoustic context from the entire utterance. However, BiLSTM models have high latency and cannot be used in streaming applications. One option is to leverage knowledge distillation to train a low-latency end-to-end uni-directional LSTM (UniLSTM) model from a BiLSTM model; however, this makes the strict assumption that the two models share frame-wise time alignments. We propose an improved knowledge distillation algorithm that relaxes this assumption and improves the accuracy of the UniLSTM model. We confirmed the advantage of the proposed method on a standard English conversational telephone speech recognition task. |
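The distillation setup the abstract contrasts can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual algorithm: `framewise_kd_loss` is the conventional objective that assumes the teacher (BiLSTM) and student (UniLSTM) posteriors are aligned frame by frame, and `relaxed_kd_loss` shows one simple way such an alignment assumption could be relaxed, by matching each student frame to its best-scoring teacher frame within a small window. The function names, the window-based relaxation, and the use of plain NumPy arrays for per-frame logits are all assumptions made for the example.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the label dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def framewise_kd_loss(teacher_logits, student_logits):
    """Conventional frame-wise distillation: mean over frames of
    KL(teacher || student), assuming both models share the same
    frame-wise time alignment. Shapes: (T, V) frames x labels."""
    p = softmax(teacher_logits)                 # teacher posteriors
    log_q = np.log(softmax(student_logits))     # student log-posteriors
    kl_per_frame = (p * (np.log(p) - log_q)).sum(axis=-1)
    return float(kl_per_frame.mean())

def relaxed_kd_loss(teacher_logits, student_logits, window=2):
    """Illustrative relaxation (not the paper's method): for each
    student frame t, take the smallest KL against teacher frames
    within +/- `window` of t, tolerating small alignment shifts."""
    T = teacher_logits.shape[0]
    p = softmax(teacher_logits)
    log_q = np.log(softmax(student_logits))
    losses = []
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        kls = [(p[s] * (np.log(p[s]) - log_q[t])).sum()
               for s in range(lo, hi)]
        losses.append(min(kls))
    return float(np.mean(losses))
```

Because the window always contains the diagonal frame, the relaxed loss is never larger than the strict frame-wise loss; with `window=0` the two coincide, so the strict objective is a special case of the relaxed one.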