Paper ID | MLSP-46.1 |
Paper Title |
SAPAUGMENT: LEARNING A SAMPLE ADAPTIVE POLICY FOR DATA AUGMENTATION |
Authors |
Ting-Yao Hu, Carnegie Mellon University, United States; Ashish Shrivastava, Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel, Apple, United States |
Session | MLSP-46: Theory and Applications |
Location | Gather.Town |
Session Time: | Friday, 11 June, 13:00 - 13:45 |
Presentation Time: | Friday, 11 June, 13:00 - 13:45 |
Presentation |
Poster
|
Topic |
Machine Learning for Signal Processing: [MLR-MUSAP] Applications in music and audio processing |
IEEE Xplore Open Preview |
Click here to view in IEEE Xplore |
Virtual Presentation |
Click here to watch in the Virtual Conference |
Abstract |
Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides strong training signal to learn the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also corrupt the annotation and make it too hard to learn from. Furthermore, a well classified sample (with low training loss) should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. Furthermore, the proposed method combines multiple augmentation methods into a methodical policy learning framework and obviates hand-crafting augmentation parameters by trial-and-error. We apply our method on an automatic speech recognition (ASR) task and show substantial improvement, 21% relative reduction in word error rate on LibriSpeech dataset, over the state-of-the-art speech augmentation method. |