
My SLT 2018 Schedule


Paper Detail

Presentation #7
Session: ASR II
Session Time: Thursday, December 20, 13:30 - 15:30
Presentation Time: Thursday, December 20, 13:30 - 15:30
Presentation: Poster
Topic: Multimodal processing
Paper Title: LSTM LANGUAGE MODEL ADAPTATION WITH IMAGES AND TITLES FOR MULTIMEDIA AUTOMATIC SPEECH RECOGNITION
Authors: Yasufumi Moriya, Dublin City University; Gareth Jones, Dublin City University
Abstract: Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) task. The incorporation of visual features as additional contextual information to improve ASR for this data has recently drawn attention from researchers. Our investigation extends existing work by using images and video titles to adapt a recurrent neural network (RNN) language model with long short-term memory (LSTM). Our language model is tested on an existing corpus of instruction videos and on a new corpus consisting of lecture videos. A consistent perplexity reduction of 5-10 was observed on both datasets. When the non-adapted model was combined with the image-adaptation and video-title-adaptation models for n-best hypothesis re-ranking, word error rate (WER) was reduced by around 0.5% on both datasets. Analysis of the model's output word probabilities showed that both image adaptation and video title adaptation give the model greater confidence in choosing contextually correct words.
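The n-best re-ranking step described in the abstract can be sketched as a weighted combination of log-probabilities from the baseline and the context-adapted language models. The sketch below is a minimal illustration, not the authors' implementation: the hypotheses, scores, model names, and interpolation weights are all invented for demonstration.

```python
# Hypothetical sketch of n-best hypothesis re-ranking by interpolating
# scores from a baseline LM with context-adapted LMs (image / title).
# All hypotheses, log-probabilities, and weights below are illustrative
# assumptions, not values from the paper.

def rerank(nbest, weights):
    """Re-rank ASR hypotheses by a weighted sum of per-model log-probs.

    nbest:   list of (hypothesis, {model_name: log_prob}) pairs.
    weights: {model_name: interpolation_weight}.
    Returns the hypotheses sorted best-first by combined score.
    """
    def combined(entry):
        _, scores = entry
        return sum(weights[m] * lp for m, lp in scores.items())
    return [hyp for hyp, _ in sorted(nbest, key=combined, reverse=True)]

# Toy example: the image-adapted LM favours the contextually correct word,
# pulling the second-best baseline hypothesis to the top after combination.
nbest = [
    ("the cat sat", {"baseline": -12.0, "image": -15.0, "title": -14.0}),
    ("the mat sat", {"baseline": -11.5, "image": -18.0, "title": -17.0}),
]
weights = {"baseline": 0.5, "image": 0.3, "title": 0.2}
print(rerank(nbest, weights)[0])  # -> "the cat sat"
```

Here "the mat sat" scores best under the baseline LM alone, but the adapted models' scores flip the ranking, mirroring the abstract's observation that adaptation increases confidence in contextually correct words.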