Presentation # | 20 |
Session: | ASR IV |
Location: | Kallirhoe Hall |
Session Time: | Friday, December 21, 13:30 - 15:30 |
Presentation Time: | Friday, December 21, 13:30 - 15:30 |
Presentation: | Poster |
Topic: |
Speech recognition and synthesis: |
Paper Title: |
FAR-FIELD ASR USING LOW-RANK AND SPARSE SOFT TARGETS FROM PARALLEL DATA |
Authors: |
Pranay Dighe, Idiap Research Institute, EPFL, Switzerland; Afsaneh Asaei, Idiap Research Institute, Switzerland; Herve Bourlard, Idiap Research Institute, EPFL, Switzerland |
Abstract: |
Far-field automatic speech recognition (ASR) of conversational speech is often considered a very challenging task because of the poor quality of the alignments available for training DNN acoustic models. A common way to alleviate this problem is to use clean alignments obtained from close-talk speech recorded in parallel. In this work, we advance the parallel data approach by obtaining enhanced low-rank and sparse soft targets from a close-talk ASR system and using them to train more accurate far-field acoustic models. Specifically, we exploit eigenposteriors and Compressive Sensing dictionaries to learn low-dimensional senone subspaces in the DNN posterior space, and enhance close-talk DNN posteriors to obtain high-quality soft targets. Enhanced soft targets encode the structural and temporal inter-relationships among senone classes, which are easily accessible in the DNN posterior space of close-talk speech but not in its noisy far-field counterpart. We exploit enhanced soft targets to improve the mapping of far-field acoustics to close-talk senone classes. Experiments on the AMI corpus show that our approach improves DNN acoustic modeling, with a 4.4% absolute reduction in WER compared to a system that does not use parallel data. Finally, the approach is also validated on state-of-the-art recurrent and time-delay neural network architectures. |
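The low-rank part of the idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single global PCA over log-posteriors (the paper's actual subspace construction may differ), uses toy random data in place of real close-talk and far-field DNN outputs, and omits the sparse dictionary step. The function names `low_rank_soft_targets` and `soft_target_cross_entropy` are hypothetical.

```python
import numpy as np

def low_rank_soft_targets(posteriors, rank, eps=1e-10):
    """Project log-posteriors onto their top `rank` principal directions
    and renormalize each frame into a probability distribution.
    Assumption: one global PCA, used here only for illustration."""
    log_p = np.log(posteriors + eps)
    mean = log_p.mean(axis=0, keepdims=True)
    centered = log_p - mean
    # Rows of vt are principal directions in senone space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]                              # (rank, n_senones)
    recon = centered @ basis.T @ basis + mean      # low-rank reconstruction
    # Numerically stable softmax so each frame sums to 1 again.
    recon = recon - recon.max(axis=1, keepdims=True)
    out = np.exp(recon)
    return out / out.sum(axis=1, keepdims=True)

def soft_target_cross_entropy(targets, predictions, eps=1e-10):
    """Frame-averaged cross-entropy between soft targets and model outputs."""
    return float(-(targets * np.log(predictions + eps)).sum(axis=1).mean())

# Toy stand-ins (assumptions): 50 frames, 8 senone classes.
rng = np.random.default_rng(0)
close_talk = rng.dirichlet(np.ones(8) * 0.3, size=50)  # "close-talk" posteriors
far_field = rng.dirichlet(np.ones(8), size=50)         # "far-field" model outputs

soft = low_rank_soft_targets(close_talk, rank=3)
loss = soft_target_cross_entropy(soft, far_field)
```

In a training setup along these lines, `soft` would replace the one-hot alignment labels as the target of the far-field acoustic model, and `loss` would be the quantity minimized.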