2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Technical Program

Paper ID	SPE-35.5
Paper Title	A MODULATION-DOMAIN LOSS FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT
Authors	Tyler Vuong, Yangyang Xia, Richard Stern, Carnegie Mellon University, United States
Session	SPE-35: Speech Enhancement 5: DNS Challenge Task
Location	Gather.Town
Session Time:	Thursday, 10 June, 14:00 - 14:45
Presentation Time:	Thursday, 10 June, 14:00 - 14:45
Presentation	Poster
Topic	Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
IEEE Xplore Open Preview	Click here to view in IEEE Xplore
Virtual Presentation	Click here to watch in the Virtual Conference
Abstract	We describe a modulation-domain loss function for deep-learning-based speech enhancement systems. Learnable spectro-temporal receptive fields (STRFs) were adapted to optimize for a speaker identification task. The learned STRFs were then used to calculate a weighted mean-squared error (MSE) in the modulation domain for training a speech enhancement system. Experiments showed that adding the modulation-domain MSE to the MSE in the spectro-temporal domain substantially improved the objective prediction of speech quality and intelligibility for real-time speech enhancement systems without incurring additional computation during inference.