Paper ID | HLT-16.5 | ||
Paper Title | Joint Alignment Learning-Attention based Model for Grapheme-to-Phoneme Conversion | ||
Authors | Yonghe Wang, Feilong Bao, Hui Zhang, Guanglai Gao, Inner Mongolia University, China | ||
Session | HLT-16: Applications in Natural Language | ||
Location | Gather.Town | ||
Session Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation | Poster | ||
Topic | Speech Processing: [SPE-GASR] General Topics in Speech Recognition | ||
Abstract | Sequence-to-sequence attention-based models for grapheme-to-phoneme (G2P) conversion have attracted significant interest. The attention-based encoder-decoder framework learns the mapping from input to output tokens by selectively focusing on relevant information, and has been shown to perform well. However, the attention mechanism can produce non-monotonic alignments, which degrade G2P conversion performance. In this paper, we present a novel approach that optimizes the G2P conversion model by directly aligning the grapheme and phoneme sequences, using alignment learning (AL) as the loss function. In addition, we propose a multi-task learning method that combines an alignment learning model with an attention model to predict proper alignments and thus improve the accuracy of G2P conversion. Evaluations on Mongolian and CMUDict tasks show that alignment learning as the loss function can effectively train a G2P conversion model. Furthermore, our multi-task method significantly outperforms both the alignment learning-based model and the attention-based model. |