2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID SPE-10.6
Paper Title HISTORY UTTERANCE EMBEDDING TRANSFORMER LM FOR SPEECH RECOGNITION
Authors Keqi Deng, Gaofeng Cheng, Haoran Miao, Pengyuan Zhang, Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences, China
Session SPE-10: Speech Recognition 4: Transformer Models 2
Location Gather.Town
Session Time: Tuesday, 08 June, 16:30 - 17:15
Presentation Time: Tuesday, 08 June, 16:30 - 17:15
Presentation Poster
Topic Speech Processing: [SPE-LVCR] Large Vocabulary Continuous Recognition/Search
Abstract History utterances contain rich contextual information; however, extracting that information effectively and using it to improve the language model (LM) remains challenging. In this paper, we propose the history utterance embedding Transformer LM (HTLM), which consists of an embedding generation network that extracts the contextual information contained in the history utterances and a main Transformer LM that performs the current prediction. In addition, we propose two-stage attention (TSA) to encode richer contextual information into the embedding of history utterances (h-emb) while still supporting parallel training on GPUs. Furthermore, we combine the extracted h-emb with the embedding of the current utterance (c-emb) through dot-product attention and a fusion method for HTLM's current prediction. Experiments on the HKUST dataset achieve a 23.4% character error rate (CER) on the test set. Compared with the baseline, the proposed method yields an absolute perplexity reduction of 12.86 and an absolute CER reduction of 0.8%.
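
As a rough illustration of the h-emb and c-emb combination mentioned in the abstract, the following sketch attends from each current-utterance position over a set of history embeddings using dot-product attention and then merges the result with the current embedding. It is not the authors' implementation: the abstract does not specify the fusion method, so the element-wise addition used here is an assumption, as are the 1/sqrt(d) scaling, the function names (`attend`, `fuse`), and the toy vectors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Dot-product attention for a single query vector.

    The 1/sqrt(d) scaling follows the standard Transformer and is an
    assumption here; the abstract only says "dot-product attention".
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

def fuse(c_emb, h_emb):
    """Hypothetical fusion: add the attended history context to each c-emb vector.

    Element-wise addition is a placeholder; the paper's fusion method is
    not described in the abstract.
    """
    fused = []
    for c in c_emb:
        context, _ = attend(c, h_emb, h_emb)
        fused.append([ci + xi for ci, xi in zip(c, context)])
    return fused

# Toy data: two history embeddings and two current-utterance positions.
h_emb = [[1.0, 0.0], [0.0, 1.0]]
c_emb = [[1.0, 0.0], [0.5, 0.5]]
fused = fuse(c_emb, h_emb)
```

In this toy setup, the first current position is more similar to the first history embedding and therefore weights it more heavily, while the second position is equally similar to both and receives an even mix.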