Paper ID | SPE-35.5 |
Paper Title |
A MODULATION-DOMAIN LOSS FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT |
Authors |
Tyler Vuong, Yangyang Xia, Richard Stern, Carnegie Mellon University, United States |
Session | SPE-35: Speech Enhancement 5: DNS Challenge Task |
Location | Gather.Town |
Session Time: | Thursday, 10 June, 14:00 - 14:45 |
Presentation Time: | Thursday, 10 June, 14:00 - 14:45 |
Presentation |
Poster
|
Topic |
Speech Processing: [SPE-ENHA] Speech Enhancement and Separation |
IEEE Xplore Open Preview |
Click here to view in IEEE Xplore |
Virtual Presentation |
Click here to watch in the Virtual Conference |
Abstract |
We describe a modulation-domain loss function for deep-learning-based speech enhancement systems. Learnable spectro-temporal receptive fields (STRFs) were adapted to optimize for a speaker identification task. The learned STRFs were then used to calculate a weighted mean-squared error (MSE) in the modulation domain for training a speech enhancement system. Experiments showed that adding the modulation-domain MSE to the MSE in the spectro-temporal domain substantially improved the objective prediction of speech quality and intelligibility for real-time speech enhancement systems without incurring additional computation during inference. |