SLT 2018 • Technical Program • 2018 IEEE Workshop on Spoken Language Technology (SLT) | 18-21 December 2018

My SLT 2018 Schedule

Note: Your custom schedule will not be saved unless you create a new account or login to an existing account.

Create a login based on your email (takes less than one minute)
Perform 'Paper Search'
Select papers that you desire to save in your personalized schedule
Click on 'My Schedule' to see the current list of selected papers
Click on 'Printable Version' to create a separate window suitable for printing (the header and menu will appear, but will not actually print)

Paper Detail

Presentation #

Session:

ASR I

Session Time:

Wednesday, December 19, 10:00 - 12:00

Presentation Time:

Wednesday, December 19, 10:00 - 12:00

Presentation:

Poster

Topic:

Speech recognition and synthesis:

Paper Title:

Learning Noise-Invariant Representations for Robust Speech Recognition

Authors:

Davis Liang; Amazon AI

Zhiheng Huang; Amazon AI

Zachary Lipton; Carnegie Mellon University

Abstract:

Despite rapid advances in speech recognition, current models remain brittle to superficial perturbations to their inputs. Small amounts of noise can destroy the performance of an otherwise state-of-the-art model. To harden models against background noise, practitioners often perform data augmentation, adding artificially-noised examples to the training set, carrying over the original label. In this paper, we hypothesize that a clean example and its superficially perturbed counterparts shouldn't merely map to the same class--- they should map to the same representation. We propose invariant-representation-learning (IRL): At each training iteration, for each training example, we sample a noisy counterpart. We then apply a penalty term to coerce matched representations at each layer (above some chosen layer). Our key results, demonstrated on the Librispeech dataset are the following: (i) IRL significantly reduces character error rates (CER) on both 'clean' (3.3% vs 6.5%) and 'other' (11.0% vs 18.1%) test sets; (ii) on several out-of-domain noise settings (different from those seen during training) IRL's benefits are even more pronounced. Careful ablations confirm that our results are not simply due to shrinking activations at the chosen layers.