Technical Program

Paper Detail

Presentation #16
Session: ASR IV
Location: Kallirhoe Hall
Session Time: Friday, December 21, 13:30 - 15:30
Presentation Time: Friday, December 21, 13:30 - 15:30
Presentation: Poster
Topic: Speech recognition and synthesis:
Paper Title: AN EXPLORATION OF MIMIC ARCHITECTURES FOR RESIDUAL NETWORK BASED SPECTRAL MAPPING
Authors: Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier, The Ohio State University, United States
Abstract: Spectral mapping uses a deep neural network (DNN) to map directly from noisy speech to clean speech. Our previous study found that the performance of spectral mapping improves greatly when using helpful cues from an acoustic model trained on clean speech. The mapper network learns to mimic the input favored by the spectral classifier and cleans the features accordingly. In this study, we explore two innovations. First, we replace the DNN-based spectral mapper with a residual network that is more attuned to the goal of predicting clean speech. Second, we examine how integrating long-term context in the mimic criterion (via wide-residual biLSTM networks) affects the performance of spectral mapping compared to DNNs. Our goal is to derive a model that can be used as a preprocessor for any recognition system; the features derived from our model are passed through the standard Kaldi ASR pipeline and achieve a word error rate (WER) of 9.3%, which is the lowest recorded WER for the CHiME-2 dataset using only feature adaptation.
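The mimic idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the form below is an assumption: a fidelity term (mean squared error between denoised and clean features) plus a mimic term (mean squared error between the outputs of a frozen acoustic model on denoised and on clean features). The names `mse`, `residual_mapper`, `mimic_objective`, `toy_acoustic_model`, and the parameter `mimic_weight` are hypothetical, and the toy acoustic model is a fixed linear layer standing in for a network trained on clean speech.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def residual_mapper(noisy: Vector, correction: Callable[[Vector], Vector]) -> Vector:
    """Residual mapping: the output is the input plus a learned correction.

    `correction` stands in for the trainable layers (hypothetical here).
    """
    return [x + c for x, c in zip(noisy, correction(noisy))]


def toy_acoustic_model(frame: Vector) -> Vector:
    """Frozen stand-in for an acoustic model trained on clean speech.

    Assumption: a fixed 2x2 linear layer; a real model would be a deep network.
    """
    weights = [[1.0, -1.0], [0.5, 0.5]]
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def mimic_objective(
    denoised: Vector,
    clean: Vector,
    acoustic_model: Callable[[Vector], Vector],
    mimic_weight: float = 1.0,
) -> float:
    """Assumed training objective: fidelity loss plus weighted mimic loss."""
    fidelity = mse(denoised, clean)
    # The acoustic model is frozen; only the mapper would receive gradients.
    mimic = mse(acoustic_model(denoised), acoustic_model(clean))
    return fidelity + mimic_weight * mimic


if __name__ == "__main__":
    clean = [1.0, 2.0]
    noisy = [1.0, 4.0]
    # An untrained mapper whose correction is zero simply passes the input through.
    denoised = residual_mapper(noisy, lambda frame: [0.0] * len(frame))
    print(mimic_objective(denoised, clean, toy_acoustic_model))
```

In this sketch the mimic term penalizes the mapper whenever the frozen acoustic model responds differently to denoised and clean features, which is one way to read "mimic the input favored by the spectral classifier". The relative weighting of the two terms is a design choice the abstract does not specify.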