SLT 2018 • Technical Program • 2018 IEEE Workshop on Spoken Language Technology (SLT) | 18-21 December 2018

Paper Detail

Session:

ASR IV

Session Time:

Friday, December 21, 13:30 - 15:30

Presentation Time:

Friday, December 21, 13:30 - 15:30

Presentation:

Poster

Topic:

Speech recognition and synthesis

Paper Title:

Multilingual sequence-to-sequence speech recognition: Architecture, transfer learning, and language modeling

Authors:

Jaejin Cho; Johns Hopkins University

Murali Karthick Baskar; Brno University of Technology

Ruizhi Li; Johns Hopkins University

Matthew Wiesner; Johns Hopkins University

Sri Harish Mallidi; Amazon

Nelson Yalta; Waseda University

Martin Karafiat; Brno University of Technology

Shinji Watanabe; Johns Hopkins University

Takaaki Hori; Mitsubishi Electric Research Laboratories

Abstract:

The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. Its main benefit is that models can be trained without a lexicon or alignments. However, it requires more data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as an initial model, and then apply several transfer learning approaches across 4 other BABEL languages. We also explore different architectures for the multilingual seq2seq model to improve its performance. Further analysis examines the importance of scheduled sampling for bringing the model distribution closer to the target data distribution. The paper also discusses the effect of combining a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that the multilingual transfer learning model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant gains and achieves recognition performance comparable to that of models trained with twice as much training data.
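
The abstract does not say how the RNNLM is combined with the seq2seq model during decoding. One common way to do this is a log-linear interpolation of the two models' scores at each decoding step, often called shallow fusion. The following is a minimal sketch under that assumption, not the authors' implementation. The function names, the lm_weight parameter, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math


def fuse_scores(seq2seq_logprobs, lm_logprobs, lm_weight=0.3):
    """Combine per-token log-probabilities from two models for one decoding step.

    Both arguments map candidate tokens to log-probabilities.
    lm_weight is a hypothetical interpolation weight, not a value from the paper.
    """
    return {
        token: score + lm_weight * lm_logprobs[token]
        for token, score in seq2seq_logprobs.items()
    }


def best_token(scores):
    """Return the candidate token with the highest score."""
    return max(scores, key=scores.get)


# Toy distributions over three candidate characters (made-up numbers).
seq2seq_step = {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)}
lm_step = {"a": math.log(0.1), "b": math.log(0.8), "c": math.log(0.1)}

# With the language model weighted in, the preferred token can change.
fused_step = fuse_scores(seq2seq_step, lm_step, lm_weight=0.5)
```

In this toy step the seq2seq model alone prefers "a", while the fused score prefers "b" because the language model strongly favors it. In a real decoder the same combination would be applied to every hypothesis extension inside beam search, and the weight would be tuned on held-out data.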