Paper ID | AUD-14.4
Paper Title | NON-INTRUSIVE BINAURAL PREDICTION OF SPEECH INTELLIGIBILITY BASED ON PHONEME CLASSIFICATION
Authors | Jana Roßbach, Communication Acoustics and Cluster of Excellence Hearing4All, Carl von Ossietzky University Oldenburg, Germany; Saskia Röttges, Christopher F. Hauth, Thomas Brand, Medical Physics and Cluster of Excellence Hearing4All, Carl von Ossietzky University Oldenburg, Germany; Bernd T. Meyer, Communication Acoustics and Cluster of Excellence Hearing4All, Carl von Ossietzky University Oldenburg, Germany
Session | AUD-14: Quality and Intelligibility Measures
Location | Gather.Town
Session Time | Wednesday, 09 June, 15:30 - 16:15
Presentation Time | Wednesday, 09 June, 15:30 - 16:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-QIM] Quality and Intelligibility Measures
Abstract | In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. To this end, we combine a non-intrusive binaural frontend with a deep neural network (DNN) borrowed from a standard automatic speech recognition (ASR) system. The DNN estimates phoneme probabilities, whose degradation in the presence of noise and reverberation is quantified with an entropy-based measure. The model output is used to predict speech recognition thresholds, i.e., the signal-to-noise ratio at which 50% word recognition accuracy is reached. Predictions are compared to measured data from eight normal-hearing listeners in acoustic scenarios with varying positions of localized maskers, different rooms, and reverberation times. Although the model is non-intrusive, it produces a root mean squared error in the range of 0.6-2.1 dB, similar to results obtained with a reference model (0.3-1.8 dB) that uses oracle knowledge in both the frontend and backend stages.
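The abstract's core idea, quantifying degradation of DNN phoneme posteriors with an entropy-based measure, can be illustrated with a minimal sketch. The function below is an assumption for illustration (the paper's exact measure and normalization are not specified in the abstract): it computes the mean frame-wise Shannon entropy of a posteriorgram of shape (frames, phonemes), where confident classification yields low entropy and noise/reverberation-induced confusion yields high entropy.

```python
import numpy as np

def mean_phoneme_entropy(posteriors, eps=1e-12):
    """Mean frame-wise entropy (bits) of phoneme posteriors.

    posteriors: array of shape (frames, phonemes), rows ~ sum to 1.
    Higher values indicate less confident phoneme classification,
    i.e. stronger degradation of the speech representation.
    """
    p = np.clip(posteriors, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)  # renormalize each frame
    return float(-(p * np.log2(p)).sum(axis=1).mean())

# Illustration: confident posteriors vs. maximally uncertain ones
confident = np.array([[0.97, 0.01, 0.01, 0.01]] * 10)
uniform = np.full((10, 4), 0.25)  # uniform over 4 classes -> log2(4) = 2 bits
print(mean_phoneme_entropy(confident) < mean_phoneme_entropy(uniform))
```

In an intelligibility model of this kind, such an entropy score computed per SNR condition could then be mapped to the SNR at which 50% word recognition is reached (the SRT); the mapping itself is beyond this sketch.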