
SLT 2018


Paper Detail

Presentation #5
Session: ASR I
Session Time: Wednesday, December 19, 10:00 - 12:00
Presentation Time: Wednesday, December 19, 10:00 - 12:00
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: ADVANCING MULTI-ACCENTED LSTM-CTC SPEECH RECOGNITION USING A DOMAIN SPECIFIC STUDENT-TEACHER LEARNING PARADIGM
Authors: Shahram Ghorbani; University of Texas at Dallas
         Ahmet E. Bulut; University of Texas at Dallas
         John H.L. Hansen; University of Texas at Dallas
Abstract: Non-native speech degrades the performance of automatic speech recognition systems. Past strategies to address this challenge have considered model adaptation, accent classification with model selection, alternate pronunciation lexicons, etc. In this study, we consider a recurrent neural network (RNN) with a connectionist temporal classification (CTC) cost function, trained on multi-accent English data including US (native), Indian, and Hispanic accents. We exploit dark knowledge from a model trained on the multi-accent data to train student models under the guidance of both a teacher model and the CTC cost of the target transcription. Transferring knowledge from a single trained RNN-CTC model to a student model yields better performance than the stand-alone teacher model. Since the outputs of separately trained CTC models are not necessarily aligned, it is not possible to simply use an ensemble of CTC teacher models. To address this problem, we train accent-specific models under the guidance of a single multi-accent teacher, which yields multiple trained CTC models whose outputs are aligned. Furthermore, we train a student model under the supervision of the accent-specific teachers, resulting in a further improved model that achieves a 20.1% relative character error rate (CER) reduction compared to the baseline trained without any teacher.
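The abstract does not state the exact training objective, so the following is only a minimal sketch under stated assumptions: the student loss is taken to be a weighted sum of the CTC cost on the reference transcription and a frame-level cross-entropy against teacher posteriors ("dark knowledge"), and several accent-specific teachers are combined by averaging their per-frame posteriors, which is only meaningful because those teachers are aligned, as the abstract argues. The names `ctc_loss`, `average_teachers`, `distillation_loss`, `student_teacher_loss`, and the weight `lam` are hypothetical, and the blank symbol is assumed to sit at label index 0.

```python
import math
from typing import List, Sequence

Frame = List[float]  # one frame of label posteriors; index 0 is the CTC blank (assumption)


def safe_log(p: float) -> float:
    """log(p), mapping p == 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_loss(probs: Sequence[Frame], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` given per-frame posteriors (forward algorithm)."""
    ext = [blank]
    for label in target:  # interleave blanks: _ a _ b _
        ext += [label, blank]
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = safe_log(probs[0][blank])
    if S > 1:
        alpha[1] = safe_log(probs[0][ext[1]])
    for frame in probs[1:]:
        new = [-math.inf] * S
        for s in range(S):
            acc = alpha[s]  # stay on the same symbol
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])  # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])  # skip a blank between distinct labels
            if acc > -math.inf:
                new[s] = acc + safe_log(frame[ext[s]])
        alpha = new
    # Valid paths end on the last label or the trailing blank.
    total = alpha[-1] if S == 1 else log_add(alpha[-1], alpha[-2])
    return -total


def average_teachers(teachers: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Average aligned teachers' posteriors frame by frame (assumed ensembling rule)."""
    n = len(teachers)
    return [
        [sum(frame[k] for frame in frame_group) / n for k in range(len(frame_group[0]))]
        for frame_group in zip(*teachers)
    ]


def distillation_loss(student: Sequence[Frame], teacher: Sequence[Frame]) -> float:
    """Frame-level cross-entropy of the student against the teacher's soft targets."""
    return -sum(
        q * safe_log(p)
        for s_frame, t_frame in zip(student, teacher)
        for p, q in zip(s_frame, t_frame)
        if q > 0
    )


def student_teacher_loss(
    student: Sequence[Frame],
    teachers: Sequence[Sequence[Frame]],
    target: Sequence[int],
    lam: float = 0.5,
) -> float:
    """Hypothetical objective: (1 - lam) * CTC(target) + lam * distillation(teachers)."""
    soft_targets = average_teachers(teachers)
    return (1 - lam) * ctc_loss(student, target) + lam * distillation_loss(student, soft_targets)
```

With a single teacher, `average_teachers` reduces to that teacher's posteriors, which corresponds to the single-teacher setting the abstract describes; passing several aligned accent-specific teachers gives the multi-teacher setting. The weight `lam` is an illustrative hyperparameter, not a value reported in the paper.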