IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

SPE-40.4

TIE YOUR EMBEDDINGS DOWN: CROSS-MODAL LATENT SPACES FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING

Bhuvan Agrawal, Markus Müller, Samridhi Choudhary, Martin Radfar, Athanasios Mouchtaris, Ross McGowan, Nathan Susanj, Siegfried Kunzmann, Amazon, United States of America

Session:
Language Understanding I: End-to-end Framework

Track:
Speech and Language Processing

Location:
Gather Area E

Presentation Time:
Tue, 10 May, 21:00 - 21:45 China Time (UTC +8)
Tue, 10 May, 13:00 - 13:45 UTC

Session Chair:
Junlan Feng, China Mobile Research
Session SPE-40
SPE-40.1: LEVERAGING BILINEAR ATTENTION TO IMPROVE SPOKEN LANGUAGE UNDERSTANDING
Dongsheng Chen, Zhiqi Huang, Yuexian Zou, Peking University, China
SPE-40.2: BUILDING ROBUST SPOKEN LANGUAGE UNDERSTANDING BY CROSS ATTENTION BETWEEN PHONEME SEQUENCE AND ASR HYPOTHESIS
Zexun Wang, Yuming Zhao, Mingchao Feng, Meng Chen, Xiaodong He, JD AI, China; Yuquan Le, Hunan University, China; Yi Zhu, University of Cambridge, United Kingdom of Great Britain and Northern Ireland
SPE-40.3: INTEGRATION OF PRE-TRAINED NETWORKS WITH CONTINUOUS TOKEN INTERFACE FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
Seunghyun Seo, Donghyun Kwak, Naver Corporation, Republic of Korea; Bowon Lee, Inha University, Republic of Korea
SPE-40.4: TIE YOUR EMBEDDINGS DOWN: CROSS-MODAL LATENT SPACES FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
Bhuvan Agrawal, Markus Müller, Samridhi Choudhary, Martin Radfar, Athanasios Mouchtaris, Ross McGowan, Nathan Susanj, Siegfried Kunzmann, Amazon, United States of America
SPE-40.5: IMPROVING END-TO-END MODELS FOR SET PREDICTION IN SPOKEN LANGUAGE UNDERSTANDING
Hong-Kwang Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon, IBM Research AI, United States of America
SPE-40.6: ESPNET-SLU: ADVANCING SPOKEN LANGUAGE UNDERSTANDING THROUGH ESPNET
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Yushi Ueda, Yifan Peng, Sujay Kumar, Karthik Ganesan, Brian Yan, Alan W Black, Shinji Watanabe, Carnegie Mellon University, United States of America; Pavel Denisov, Ngoc Thang Vu, University of Stuttgart, Germany; Yuekai Zhang, Zoom Video Communications, China