IEEE ICASSP 2022 || Singapore || 7-13 May 2022 Virtual; 22-27 May 2022 In-Person

IEP-2: Odyssey to Human-Level Automatic Speech Recognition: Are We There Yet?

Sun, 8 May, 21:00 - 21:45 China Time (UTC +8)
Sun, 8 May, 13:00 - 13:45 UTC

Location: Gather Area P

Virtual

Gather.Town

Expert

Presented by: Kyu Jeong Han, ASAPP

It was Stanley Kubrick who first pictured mankind's aspiration to create artificial intelligences that can communicate with humans, in the film "2001: A Space Odyssey", but even before this 1968 motion picture, human beings had made incessant efforts to develop human-like intelligent systems. The pursuit of human-level automatic speech recognition (ASR) technology has, along the same lines, its own history, one that has stimulated a great deal of technological advances along the way. This Industry Expert talk reviews the recent history of the odyssey undertaken by the speech signal processing and machine learning communities to achieve or even exceed human parity in ASR systems, focusing on the breakthroughs made in the deep learning era in the context of Switchboard and LibriSpeech, the two most widely adopted standard benchmark datasets. In addition, we discuss how the industry applies the knowledge that the research communities gained from these breakthroughs in practical settings in the wild, where ASR services must meet diverse market demands in a scalable manner; one example is the transfer learning paradigm of leveraging neural network models pre-trained without supervision, such as Wav2Vec or HuBERT. We also suggest ways to address practical requirements, such as memory constraints on hand-held devices or latency in streaming use cases, in order to make ASR services dependable with human-level accuracy across various end-user scenarios.

The target audience of this Industry Expert talk is broad, ranging from graduate students starting their research careers in the speech field to senior industry practitioners who want to apply state-of-the-art speech recognition approaches to their own problem domains. The talk will provide the timely perspectives of an industry expert on the present landscape and future directions of scalable ASR modeling and deployment. The speaker was formerly a well-known research scientist at prestigious industry research labs, including IBM Watson, and is now a leading figure driving technology development for ASAPP's cutting-edge AI services in the customer experience (CX) domain. These perspectives should be well received by, and inspire, many ICASSP attendees.

Biography

Dr. Kyu J. Han is a Sr. Director of Speech Modeling and ML Data Labeling at ASAPP, leading an applied science team that works primarily on automatic speech recognition and voice analytics for ASAPP's machine learning services to its enterprise customers in the customer experience (CX) domain. He received his Ph.D. from the University of Southern California, USA, in 2009 under Prof. Shrikanth Narayanan and has since held research positions at IBM Watson, Ford Research, Capio (acquired by Twilio), JD AI Research and ASAPP. At IBM he participated in the IARPA Biometrics Exploitation Science and Technology (BEST) project and the DARPA Robust Automatic Transcription of Speech (RATS) program. At Capio he led a research team that achieved state-of-the-art performance in telephony speech recognition and successfully completed a government-funded project for noise-robust, on-prem ASR system integration across 13 different languages. Dr. Han is actively involved in speech community activities, serving as a reviewer for IEEE, ISCA and ACL journals and conferences. He was a member of the Speech and Language Processing Technical Committee (SLTC) of the IEEE Signal Processing Society from 2019 to 2021, where he served as the Chair of the Workshop Subcommittee. He also served as a member of the Organizing Committee of the IEEE SLT-2021. He was the Survey Talk speaker and a Doctoral Consortium panelist at Interspeech 2019, and the Tutorial speaker at Interspeech 2020. In addition, he won the ISCA Award for the Best Paper Published in Computer Speech & Language 2013-2017. He is a Senior Member of the IEEE.

IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022

Virtual (all paper presentations)

22-27 May 2022

Main Venue: Marina Bay Sands Expo & Convention Center, Singapore

27-28 October 2022

Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China
