Paper ID | AUD-29.6 |
Paper Title |
Supervised direct-path relative transfer function learning for binaural sound source localization |
Authors |
Bing Yang, Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University; Westlake University & Westlake Institute for Advanced Study, China; Xiaofei Li, Westlake University & Westlake Institute for Advanced Study, China; Hong Liu, Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University, China |
Session | AUD-29: Acoustic Sensor Array Processing 3: Acoustic Sensor Arrays |
Location | Gather.Town |
Session Time: | Friday, 11 June, 11:30 - 12:15 |
Presentation Time: | Friday, 11 June, 11:30 - 12:15 |
Presentation |
Poster |
Topic |
Audio and Acoustic Signal Processing: [AUD-ASAP] Acoustic Sensor Array Processing |
Abstract |
The direct-path relative transfer function (DP-RTF) is the ratio between the direct-path acoustic transfer functions of two channels. Although the DP-RTF fully encodes the directional cues of a sound source and serves as a reliable localization feature, it is often estimated erroneously in the presence of noise and reverberation. This paper proposes a supervised DP-RTF learning method with deep neural networks for robust binaural sound source localization. To exploit the complementarity of single-channel spectrogram and dual-channel difference information, we first recover the direct-path magnitude spectrogram from the contaminated one using a monaural enhancement network, and then predict the DP-RTF from the dual-channel (enhanced) intensity and phase cues using a binaural enhancement network. In addition, a weighted-matching softmax training loss is designed to encourage the predicted DP-RTFs to be concentrated for the same direction and separated for different directions. Finally, the direction of arrival (DOA) of the source is estimated by matching the predicted DP-RTF with the ground-truth DP-RTFs of the candidate directions. Experimental results show the superiority of our method for DOA estimation in environments with various levels of noise and reverberation. |
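
The final matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a free-field, far-field two-microphone pair instead of a real binaural head (where the ground-truth DP-RTFs would come from head-related transfer functions), and it uses a normalized inner product as the matching score. The function names `free_field_dp_rtf` and `estimate_doa`, the microphone spacing, the sign convention, and the candidate grid are all hypothetical choices made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def free_field_dp_rtf(azimuth_deg, freqs_hz, mic_distance_m=0.18):
    """Ground-truth DP-RTF template under a free-field, far-field assumption.

    With no head or reverberation, the ratio of the two direct-path transfer
    functions reduces to a unit-magnitude phase term set by the time
    difference of arrival (TDOA). The sign convention here is an assumption.
    """
    tdoa = mic_distance_m * np.sin(np.deg2rad(azimuth_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * np.asarray(freqs_hz) * tdoa)


def estimate_doa(predicted_dp_rtf, candidate_azimuths_deg, freqs_hz):
    """Return the candidate azimuth whose template best matches the prediction.

    The score is the real part of the normalized inner product between the
    predicted DP-RTF and each candidate template (an assumed similarity).
    """
    templates = np.stack(
        [free_field_dp_rtf(a, freqs_hz) for a in candidate_azimuths_deg]
    )
    scores = np.real(templates @ np.conj(predicted_dp_rtf))
    scores /= np.linalg.norm(templates, axis=1) * np.linalg.norm(predicted_dp_rtf)
    return float(candidate_azimuths_deg[int(np.argmax(scores))])


# Usage: a stand-in for a network prediction, here a lightly perturbed template.
freqs = np.linspace(100.0, 4000.0, 64)
candidates = np.arange(-90, 91, 5)
rng = np.random.default_rng(0)
noise = 0.05 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
predicted = free_field_dp_rtf(30.0, freqs) + noise
doa = estimate_doa(predicted, candidates, freqs)
```

In the paper's setting the `predicted` vector would be the output of the binaural enhancement network rather than a perturbed template, and the candidate templates would be measured or simulated binaural DP-RTFs.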