Paper ID | AUD-16.2
Paper Title | ON THE PREDICTABILITY OF HRTFS FROM EAR SHAPES USING DEEP NETWORKS
Authors | Yaxuan Zhou, Hao Jiang, Vamsi Krishna Ithapu, Facebook Reality Labs, United States
Session | AUD-16: Modeling, Analysis and Synthesis of Acoustic Environments 2: Spatial Audio
Location | Gather.Town
Session Time | Wednesday, 09 June, 16:30 - 17:15
Presentation Time | Wednesday, 09 June, 16:30 - 17:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-SARR] Spatial Audio Recording and Reproduction
Abstract | Head-Related Transfer Function (HRTF) individualization is critical for immersive and realistic spatial audio rendering in augmented/virtual reality. Neither acoustic measurements nor numerical simulations based on 3D scans of the head and ears scale to practical applications. More efficient machine learning approaches have recently been explored to predict HRTFs from ear images or anthropometric features. However, it is not yet clear whether such models can serve as an alternative to direct measurements or high-fidelity simulations. Here, we aim to address this question. Using 3D ear shapes as input, we explore the bounds of HRTF predictability with deep neural networks. To that end, we propose and evaluate two models and identify the lowest achievable spectral distance error when predicting the true HRTF magnitude spectra.
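The abstract's "spectral distance error" between a predicted and a true HRTF magnitude spectrum is commonly quantified in HRTF studies as log-spectral distortion (LSD). The sketch below is a minimal NumPy illustration of that standard metric, assuming both spectra are sampled on the same frequency grid; the function name and the choice of LSD are illustrative assumptions, not the paper's confirmed evaluation code.

```python
import numpy as np

def log_spectral_distortion(h_true, h_pred, eps=1e-12):
    """Log-spectral distortion (dB) between two HRTF magnitude spectra.

    Hypothetical helper: the paper's exact spectral-distance definition is
    not specified in this abstract. LSD is the RMS, over frequency bins, of
    the dB-scale ratio between the true and predicted magnitudes.
    """
    h_true = np.abs(np.asarray(h_true, dtype=float))
    h_pred = np.abs(np.asarray(h_pred, dtype=float))
    diff_db = 20.0 * np.log10((h_true + eps) / (h_pred + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Toy usage: two synthetic magnitude spectra over 256 frequency bins.
rng = np.random.default_rng(0)
true_mag = np.abs(rng.normal(1.0, 0.2, size=256))
pred_mag = true_mag * (1.0 + rng.normal(0.0, 0.05, size=256))
print(f"LSD: {log_spectral_distortion(true_mag, pred_mag):.2f} dB")
```

In practice such a per-direction score would be averaged over all measured source directions (and both ears) to produce a single error number per subject, which is the kind of aggregate the abstract's "lowest achievable spectral distance error" refers to.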