Paper ID | AUD-11.6 |
Paper Title | AN END-TO-END NON-INTRUSIVE MODEL FOR SUBJECTIVE AND OBJECTIVE REAL-WORLD SPEECH ASSESSMENT USING A MULTI-TASK FRAMEWORK |
Authors | Zhuohuang Zhang, Piyush Vyas, Xuan Dong, Donald S. Williamson, Indiana University, United States |
Session | AUD-11: Auditory Modeling and Hearing Instruments |
Location | Gather.Town |
Session Time | Wednesday, 09 June, 14:00 - 14:45 |
Presentation Time | Wednesday, 09 June, 14:00 - 14:45 |
Presentation | Poster |
Topic | Audio and Acoustic Signal Processing: [AUD-QIM] Quality and Intelligibility Measures |
Abstract | Speech assessment is crucial for many applications, but current intrusive methods cannot be used in real environments. Data-driven approaches have been proposed, but they either rely on simulated speech materials or estimate only objective scores. In this paper, we propose a novel multi-task, non-intrusive approach that simultaneously estimates both subjective and objective scores of real-world speech, where the joint estimation is intended to facilitate learning. This approach extends our prior work, which estimated subjective mean opinion scores, and now operates directly on the time-domain signal in an end-to-end fashion. The proposed system is compared against several state-of-the-art systems. Experimental results show that our multi-task, end-to-end framework achieves higher correlation and lower prediction errors across multiple evaluation measures. |
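The abstract reports results as correlation and prediction error over several estimated scores, without naming the exact measures. As a minimal sketch, assuming Pearson correlation and mean absolute error as the two measures (an assumption, not taken from the paper), per-task evaluation of a multi-task score predictor could look like the following. The task names and score values are hypothetical and serve only as illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(x, y):
    """Average absolute difference between predicted and reference scores."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def evaluate_tasks(predictions, targets):
    """Compute both measures separately for every task (score type)."""
    return {
        task: {
            "pcc": pearson(predictions[task], targets[task]),
            "mae": mean_absolute_error(predictions[task], targets[task]),
        }
        for task in targets
    }


# Hypothetical values: one subjective task (MOS) and one objective task.
targets = {
    "mos": [1.5, 2.0, 3.5, 4.0],
    "objective": [1.2, 1.9, 2.8, 3.6],
}
predictions = {
    "mos": [1.7, 2.2, 3.1, 4.1],
    "objective": [1.0, 2.1, 2.9, 3.3],
}

for task, scores in evaluate_tasks(predictions, targets).items():
    print(f"{task}: PCC={scores['pcc']:.3f}, MAE={scores['mae']:.3f}")
```

Scoring each task separately keeps the subjective and objective estimates comparable with single-task baselines, which each predict only one of the scores.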