Technical Program

Paper Detail

Presentation #	20
Session:	ASR IV
Location:	Kallirhoe Hall
Session Time:	Friday, December 21, 13:30 - 15:30
Presentation Time:	Friday, December 21, 13:30 - 15:30
Presentation:	Poster
Topic:	Speech recognition and synthesis:
Paper Title:	FAR-FIELD ASR USING LOW-RANK AND SPARSE SOFT TARGETS FROM PARALLEL DATA
Authors:	Pranay Dighe, Idiap Research Institute, EPFL, Switzerland; Afsaneh Asaei, Idiap Research Institute, Switzerland; Hervé Bourlard, Idiap Research Institute, EPFL, Switzerland
Abstract:	Far-field automatic speech recognition (ASR) of conversational speech is often considered very challenging because of the poor quality of the alignments available for training DNN acoustic models. A common way to alleviate this problem is to use clean alignments obtained from close-talk speech recorded in parallel. In this work, we advance the parallel data approach by obtaining enhanced low-rank and sparse soft targets from a close-talk ASR system and using them to train more accurate far-field acoustic models. Specifically, we exploit eigenposteriors and compressive sensing dictionaries to learn low-dimensional senone subspaces in the DNN posterior space, and we enhance close-talk DNN posteriors to obtain high-quality soft targets. Enhanced soft targets encode the structural and temporal inter-relationships among senone classes, which are easily accessible in the DNN posterior space of close-talk speech but not in its noisy far-field counterpart. We exploit the enhanced soft targets to improve the mapping of far-field acoustics to close-talk senone classes. Experiments are performed on the AMI corpus, where our approach yields a 4.4% absolute reduction in word error rate (WER) for DNN acoustic modeling compared with a system that does not use parallel data. Finally, the approach is also validated on state-of-the-art recurrent and time-delay neural network architectures.
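
The abstract does not give implementation details, so the following is only a minimal sketch of how low-rank soft targets might be derived from close-talk posteriors and then used as training targets. It assumes that "eigenposteriors" are principal components of log-posteriors computed separately for each senone class, that the class of a frame is available as an integer label, and that the far-field model is trained with cross-entropy against the enhanced posteriors. The function names `low_rank_soft_targets` and `soft_target_cross_entropy`, the `var_kept` parameter, and the use of plain NumPy are all hypothetical choices for illustration. The sparse (dictionary-based) variant is not shown.

```python
import numpy as np


def low_rank_soft_targets(posteriors, labels, var_kept=0.95, eps=1e-10):
    """Project each class's log-posteriors onto its top principal components.

    posteriors: (frames, classes) array of close-talk DNN posteriors.
    labels:     (frames,) integer class label per frame (assumed given).
    var_kept:   fraction of variance retained per class (assumed knob).
    """
    log_post = np.log(posteriors + eps)
    enhanced = np.empty_like(posteriors, dtype=float)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        X = log_post[idx]
        mean = X.mean(axis=0)
        Xc = X - mean
        # Rows of Vt are the principal directions of this class's frames.
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        energy = np.cumsum(s ** 2) / max(np.sum(s ** 2), eps)
        k = min(int(np.searchsorted(energy, var_kept)) + 1, len(s))
        # Low-rank reconstruction in the log domain.
        recon = (Xc @ Vt[:k].T) @ Vt[:k] + mean
        # Map back to the probability simplex with a stable softmax.
        recon = recon - recon.max(axis=1, keepdims=True)
        p = np.exp(recon)
        enhanced[idx] = p / p.sum(axis=1, keepdims=True)
    return enhanced


def soft_target_cross_entropy(logits, soft_targets):
    """Mean cross-entropy between model logits and soft target rows."""
    m = logits.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    log_softmax = logits - log_norm
    return float(-(soft_targets * log_softmax).sum(axis=1).mean())
```

In this sketch the enhanced posteriors from close-talk frames would serve as targets for the time-aligned far-field frames, with `soft_target_cross_entropy` standing in for the training loss of the far-field acoustic model.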