SPE-62.5
OPTIMIZING ALIGNMENT OF SPEECH AND LANGUAGE LATENT SPACES FOR END-TO-END SPEECH RECOGNITION AND UNDERSTANDING
Wei Wang, Yanmin Qian, Shanghai Jiao Tong University, China; Shuo Ren, Yao Qian, Shujie Liu, Yu Shi, Michael Zeng, Microsoft Corporation, China
Session:
Speech Recognition: Training Methods for e2e ASR
Track:
Speech and Language Processing
Location:
Gather Area C
Presentation Time:
Wed, 11 May, 23:00 - 23:45 China Time (UTC +8)
Wed, 11 May, 15:00 - 15:45 UTC
Session Chair:
Jasha Droppo, Amazon
Session SPE-62
SPE-62.1: CONSISTENT TRAINING AND DECODING FOR END-TO-END SPEECH RECOGNITION USING LATTICE-FREE MMI
Jinchuan Tian, Yuexian Zou, Peking University, China; Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong Yu, Tencent, China
SPE-62.2: BEING GREEDY DOES NOT HURT: SAMPLING STRATEGIES FOR END-TO-END SPEECH RECOGNITION
Jahn Heymann, Egor Lakomkin, Leif Raedel, Amazon Inc., Germany
SPE-62.3: INVESTIGATING SEQUENCE-LEVEL NORMALISATION FOR CTC-LIKE END-TO-END ASR
Zeyu Zhao, Peter Bell, Centre for Speech Technology Research, University of Edinburgh, United Kingdom
SPE-62.4: HIERARCHICAL CONDITIONAL END-TO-END ASR WITH CTC AND MULTI-GRANULAR SUBWORD UNITS
Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi, Waseda University, Japan
SPE-62.5: OPTIMIZING ALIGNMENT OF SPEECH AND LANGUAGE LATENT SPACES FOR END-TO-END SPEECH RECOGNITION AND UNDERSTANDING
Wei Wang, Yanmin Qian, Shanghai Jiao Tong University, China; Shuo Ren, Yao Qian, Shujie Liu, Yu Shi, Michael Zeng, Microsoft Corporation, China
SPE-62.6: MINIMUM WORD ERROR TRAINING FOR NON-AUTOREGRESSIVE TRANSFORMER-BASED CODE-SWITCHING ASR
Yizhou Peng, Jicheng Zhang, Hao Huang, Xinjiang University, China; Haihua Xu, Bytedance, Singapore; Eng Siong Chng, Nanyang Technological University, Singapore