Paper ID | SPE-56.2 |
Paper Title |
AN ATTENTION MODEL FOR HYPERNASALITY PREDICTION IN CHILDREN WITH CLEFT PALATE |
Authors |
Vikram C Mathad, Nancy Scherer, Arizona State University, United States; Kathy Chapman, University of Utah, United States; Julie Liss, Visar Berisha, Arizona State University, United States |
Session | SPE-56: Paralinguistics in Speech |
Location | Gather.Town |
Session Time: | Friday, 11 June, 14:00 - 14:45 |
Presentation Time: | Friday, 11 June, 14:00 - 14:45 |
Presentation |
Poster
|
Topic |
Speech Processing: [SPE-ANLS] Speech Analysis |
IEEE Xplore Open Preview |
Click here to view in IEEE Xplore |
Virtual Presentation |
Click here to watch in the Virtual Conference |
Abstract |
Hypernasality refers to the perception of abnormal nasal resonances in vowels and voiced consonants. Estimation of hypernasality severity from connected speech samples involves learning a mapping between the frame-level features and utterance-level clinical ratings of hypernasality. However, not all speech frames contribute equally to the perception of hypernasality. In this work, we propose an attention-based bidirectional long-short memory (BLSTM) model that directly maps the frame-level features to utterance-level ratings by focusing only on specific speech frames carrying hypernasal cues. The model’s performance is evaluated on the Americleft database containing speech samples of children with cleft palate and clinical ratings of hypernasality. We analyzed the attention weights over broad phonetic categories and found that the model yields results consistent with what is known in the speech science literature. Further, the correlation between the predicted and perceptual rating is found to be significant (r=0.684, p < 0.001) and better than conventional BLSTMs trained using frame-wise and last-frame approaches. |