2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Paper ID	AUD-29.6
Paper Title	Supervised direct-path relative transfer function learning for binaural sound source localization
Authors	Bing Yang, Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University; Westlake University & Westlake Institute for Advanced Study, China; Xiaofei Li, Westlake University & Westlake Institute for Advanced Study, China; Hong Liu, Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University, China
Session	AUD-29: Acoustic Sensor Array Processing 3: Acoustic Sensor Arrays
Location	Gather.Town
Session Time	Friday, 11 June, 11:30 - 12:15
Presentation Time	Friday, 11 June, 11:30 - 12:15
Presentation	Poster
Topic	Audio and Acoustic Signal Processing: [AUD-ASAP] Acoustic Sensor Array Processing
Abstract	Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two channels. Although DP-RTF fully encodes the directional cues of sound and serves as a reliable localization feature, it is often estimated erroneously in the presence of noise and reverberation. This paper proposes a supervised DP-RTF learning method based on deep neural networks for robust binaural sound source localization. To exploit the complementarity between single-channel spectrogram information and dual-channel difference information, we first recover the direct-path magnitude spectrogram from the contaminated one using a monaural enhancement network, and then predict the DP-RTF from the enhanced dual-channel intensity cues and the phase cues using a binaural enhancement network. In addition, a weighted-matching softmax training loss is designed to encourage the predicted DP-RTFs to be concentrated for the same direction and separated for different directions. Finally, the direction of arrival (DOA) of the source is estimated by matching the predicted DP-RTF with the ground-truth DP-RTFs of the candidate directions. Experimental results show the superiority of the proposed method for DOA estimation in environments with various levels of noise and reverberation.
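Two ingredients of the abstract are easy to make concrete: the DP-RTF itself (a per-frequency ratio of two direct-path transfer functions) and DOA estimation by matching a predicted DP-RTF against ground-truth templates of candidate directions. The Python sketch below illustrates both under deliberately simplified, hypothetical assumptions. The direct-path transfer functions come from a free-field two-ear model (pure delays, 0.18 m spacing) rather than measured HRTFs. The "prediction" is a template plus noise, not the output of the paper's networks. `softmax_matching_loss` is a plain, unweighted softmax over template similarities that only mimics the idea of pulling predictions toward their own direction; it is not the paper's weighted-matching loss, whose weighting is not described in the abstract. All names and parameters are illustrative.

```python
import numpy as np

# Hypothetical setup (not from the paper): ears 0.18 m apart, c = 343 m/s,
# 16 kHz sampling, 512-point STFT, azimuth candidates -90..90 deg in 5 deg steps.
FS = 16000
N_FFT = 512
EAR_DIST = 0.18
SOUND_SPEED = 343.0
FREQS = np.arange(1, N_FFT // 2 + 1) * FS / N_FFT  # positive bins, DC skipped
AZIMUTHS = np.arange(-90, 91, 5)


def free_field_atfs(azimuth_deg):
    """Toy direct-path ATFs: the right ear is delayed by tau relative to the left.

    A real binaural setup would use measured HRTFs; this pure-delay model is
    only an illustrative assumption.
    """
    tau = EAR_DIST / SOUND_SPEED * np.sin(np.deg2rad(azimuth_deg))
    h_left = np.exp(1j * np.pi * FREQS * tau)    # advanced by tau / 2
    h_right = np.exp(-1j * np.pi * FREQS * tau)  # delayed by tau / 2
    return h_left, h_right


def dp_rtf(h_left, h_right):
    """DP-RTF per frequency: ratio of the two channels' direct-path ATFs."""
    return h_right / h_left


def to_feature(rtf):
    """Stack real and imaginary parts into one real-valued vector."""
    return np.concatenate([rtf.real, rtf.imag])


# Ground-truth DP-RTF template for every candidate direction, shape (K, 2F).
TEMPLATES = np.stack([to_feature(dp_rtf(*free_field_atfs(a))) for a in AZIMUTHS])


def cosine_scores(pred, templates=TEMPLATES):
    """Cosine similarity between one predicted DP-RTF vector and each template."""
    num = templates @ pred
    den = np.linalg.norm(templates, axis=1) * np.linalg.norm(pred)
    return num / den


def estimate_doa(pred):
    """DOA = the candidate direction whose template best matches the prediction."""
    return AZIMUTHS[np.argmax(cosine_scores(pred))]


def softmax_matching_loss(pred, true_idx, scale=10.0):
    """Cross-entropy over scaled template similarities (unweighted sketch)."""
    logits = scale * cosine_scores(pred)
    peak = logits.max()
    log_norm = peak + np.log(np.sum(np.exp(logits - peak)))  # stable log-sum-exp
    return log_norm - logits[true_idx]


# Usage: a noisy stand-in for a network prediction of a source at 30 deg.
rng = np.random.default_rng(0)
true_idx = int(np.where(AZIMUTHS == 30)[0][0])
noisy_pred = TEMPLATES[true_idx] + 0.1 * rng.standard_normal(TEMPLATES.shape[1])
print(estimate_doa(noisy_pred))
```

Because every template row has the same norm, the cosine score here reduces to the mean of cos(2πfΔτ) over frequency, which peaks only at the true delay; with the low-frequency bins included, the spatial aliasing at high frequencies does not create a competing maximum in this toy model.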