IEEE ICASSP 2022 || Singapore || 7-13 May 2022 Virtual; 22-27 May 2022 In-Person

IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

ST-12: Applications for ASR Phrase Alternatives
Wed, 11 May, 23:00 - 23:45 China Time (UTC +8)
Wed, 11 May, 15:00 - 15:45 UTC
Location: Gather Area P
Virtual
Gather.Town
Show & Tell
Presented by: Arlo Faria, Mod9 Technologies; Adam Janin, Mod9 Technologies; Korbinian Riedhammer, Mod9 Technologies; Sidhi Adkoli, Mod9 Technologies

Automatic Speech Recognition (ASR) systems have been rapidly improving in accuracy, yet they will inevitably continue to make some transcription errors. For downstream applications that present ASR output directly to a user, it helps if the system can represent alternatives efficiently and intuitively. To this end, we have developed phrase alternatives: these are similar to traditional N-best lists or word-level alternatives, but have the significant advantages of being more compact and expressive. This Show and Tell demonstration will present three distinct applications of phrase alternatives. The first is a comparative evaluation on the Switchboard benchmark, in which the official NIST SCTK software is used to score the oracle Word Error Rate (WER) for various representations of alternatives, achieving nearly 0% WER. The second application integrates phrase alternatives into a Lucene / Elasticsearch text-based search indexing framework, enabling scalable high-recall audio search across a large collection of recordings. The third application is a novel transcript editing user interface, in which phrase alternatives enable an expert practitioner to apply manual corrections to ASR output while listening to audio playback at faster than real-time speed. Each of these applications uses the publicly available Mod9 ASR Engine, which loads any Kaldi-compatible models, and the software demonstrations can be presented interactively to both on-site and remote attendees.
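
To make the idea of an oracle WER over phrase alternatives concrete, here is a minimal Python sketch. It assumes a hypothetical layout in which a transcript is a sequence of segments and each segment holds one or more alternative phrases (lists of words, possibly empty). This layout, and the function names `extend_row`, `oracle_errors`, and `oracle_wer`, are illustrative assumptions; they are not the Mod9 ASR Engine's output format, and the scoring is a plain edit distance rather than the NIST SCTK procedure used in the demonstration.

```python
from typing import List, Sequence

Phrase = Sequence[str]


def extend_row(row: List[int], phrase: Phrase, ref: Sequence[str]) -> List[int]:
    """Advance one edit-distance row by consuming every word of `phrase`.

    row[j] is the minimum number of errors when the hypothesis chosen so
    far is aligned against the first j reference words.
    """
    for word in phrase:
        new_row = [row[0] + 1]  # hypothesis word with no reference word: insertion
        for j in range(1, len(row)):
            substitution = row[j - 1] + (0 if ref[j - 1] == word else 1)
            insertion = row[j] + 1
            deletion = new_row[j - 1] + 1
            new_row.append(min(substitution, insertion, deletion))
        row = new_row
    return row


def oracle_errors(segments: Sequence[Sequence[Phrase]], ref: Sequence[str]) -> int:
    """Minimum word errors over every way of picking one phrase per segment.

    Assumes each segment lists at least one alternative (hypothetical layout).
    """
    row = list(range(len(ref) + 1))  # empty hypothesis: j deletions
    for alternatives in segments:
        candidates = [extend_row(row, phrase, ref) for phrase in alternatives]
        # Keep the best alternative independently for each reference position.
        row = [min(column) for column in zip(*candidates)]
    return row[-1]


def oracle_wer(segments: Sequence[Sequence[Phrase]], ref: Sequence[str]) -> float:
    """Oracle WER: oracle errors divided by the number of reference words."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return oracle_errors(segments, ref) / len(ref)


# Illustrative data only: the first alternative in each segment plays the
# role of the recognizer's top hypothesis.
reference = "i scream for ice cream".split()
segments = [
    [["ice", "cream"], ["i", "scream"]],
    [["for"], ["four"]],
    [["i", "scream"], ["ice", "cream"]],
]
one_best = [[alternatives[0]] for alternatives in segments]

print("1-best WER:", oracle_wer(one_best, reference))
print("oracle WER:", oracle_wer(segments, reference))
```

Because each segment's choice is independent of the others, the dynamic program only needs to track one row of costs per segment, so the search over all combinations of alternatives costs no more than scoring each alternative once.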