IEEE ICASSP 2022 || Singapore || 7-13 May 2022 Virtual; 22-27 May 2022 In-Person

WS-2: The new era of all-neural SLU: opportunities and challenges ahead

Mon, 23 May, 14:00 - 17:30 China Time (UTC +8)
Mon, 23 May, 06:00 - 09:30 UTC

Location: Roselle Junior Ballroom 4612-3

In-Person

Live-Stream

Workshop

Summary

Speech recognition technology is completing a dramatic change: the move to an all-neural architecture that replaces the conventional stack of independently trained neural and non-neural subsystems. The neural architecture improves accuracy over a wide range of use cases, challenges the boundary between speech recognition and language understanding by allowing for jointly trained models, and enables multi-task learning that simultaneously solves transcription, segmentation, confidence estimation, and potentially more tasks. The neural architecture also achieves superior memory and compute compression, enabling streaming low-latency speech recognition at the edge, where resources are constrained. When applied as end-to-end all-neural SLU (ASR + NLU), the tradeoff between compression and accuracy is even more favorable. The neural architecture enables truly multilingual systems that support within-sentence code switching. It also helps to reduce reliance on human labeling thanks to unsupervised pre-training, teacher/student semi-supervised training, and the ability to learn from user feedback signals and from other modalities.

While the neural architecture has shown great results and provides leeway for significant future improvements, it also presents new challenges. Personalization and adaptation are much easier to do in the conventional factored stack by adapting the finite-state language models, a property that is lost with end-to-end all-neural models. Making adaptation effective and practical for all-neural systems remains a challenge, one that requires focused innovation and investment in building new, sophisticated neural architecture solutions. Rare-word modeling is also a challenge for neural architectures, which learn acoustics and language jointly from audio/text pairs, whereas conventional architectures can use much larger text-only data sets for training the language models.

In this workshop, we will provide an overview of the all-neural architecture developed by the Alexa ASR group, dive deep into some of the challenges and future opportunities, and conduct a panel discussion and Q&A session on the impact and the future of the all-neural approach to speech recognition.

Workshop Co-chairs

Jennifer Shumway
Ariya Rastrow
Björn Hoffmeister
Chris Ho

Panel Members

Ariya Rastrow (Sr Principal Scientist)
Andreas Stolcke (Sr Principal Scientist)
Shalini Ghosh (Principal Scientist)
Björn Hoffmeister (Director of Science)

Main Presentations

The new era of All-Neural ASR: An Overview
Presenters: Björn Hoffmeister, Ariya Rastrow
Pre-Training and Multi-Modal Training
Presenter: Shalini Ghosh

Deep Dive Presentations

RescoreBERT: Discriminative speech recognition rescoring with BERT
Presenter: Yi Gu
Bi/Multilingual ASR and LID using RNN-T
Presenter: Harish Arsikere
Multi-turn RNN-T for streaming recognition of multi-party speech
Presenters: Anna Piunova and Ilya Sklyar
Being greedy does not hurt: Sampling strategies for end-to-end speech recognition
Presenter: Jahn Heymann
Lattice-attention in ASR rescoring
Presenters: Prabhat Pandey and Sergio Duarte Torres
Multi-task RNN-T with semantic decoder for streamable spoken language understanding
Presenter: Feng-Ju (Claire) Chang

IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022

Virtual (all paper presentations)

22-27 May 2022

Main Venue: Marina Bay Sands Expo & Convention Center, Singapore

27-28 October 2022

Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China
