Wed, 11 May, 15:00 - 15:45 UTC
Automatic Speech Recognition (ASR) systems have been rapidly improving in accuracy, yet they will inevitably continue to make some transcription errors. For downstream applications that present ASR output directly to a user, it helps if the system can represent alternative transcriptions in an efficient and intuitive manner. To this end, we have developed phrase alternatives: these are similar to traditional N-best lists or word-level alternatives, but have the significant advantages of being more compact and more expressive. This Show and Tell demonstration will present three distinct applications of phrase alternatives. The first is a comparative evaluation on the widely used Switchboard benchmark, in which the official NIST SCTK software is used to score the oracle Word Error Rate (WER) for various representations of alternatives, achieving nearly 0% oracle WER. The second application integrates phrase alternatives into a Lucene/Elasticsearch text-based search indexing framework, enabling scalable high-recall audio search across a large collection of recordings. The third application is a novel transcript editing user interface, in which phrase alternatives enable an expert practitioner to apply manual corrections to ASR output while listening to audio playback at faster-than-real-time speed. Each of these applications uses the publicly available Mod9 ASR Engine, which loads any Kaldi-compatible models, and the software demonstrations can be presented interactively for both on-site and remote attendees.
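To make the notion of oracle WER over phrase alternatives concrete, the following is a minimal Python sketch. It does not use NIST SCTK or the Mod9 ASR Engine, and the data layout is an assumption made only for illustration: a transcript is a list of segments, each segment is a list of alternative phrases, and each phrase is a list of words (possibly empty). The function names `advance_row` and `oracle_errors` are hypothetical. The oracle error count is the smallest word-level edit distance between the reference and any path that picks one phrase per segment.

```python
from typing import List

Phrase = List[str]
Segment = List[Phrase]  # hypothetical layout: alternative phrases for one span


def advance_row(row: List[int], word: str, ref: List[str]) -> List[int]:
    """One Levenshtein step: consume a single hypothesis word."""
    new = [row[0] + 1]  # hypothesis word with no reference word to align to
    for j in range(1, len(ref) + 1):
        match_or_sub = row[j - 1] + (0 if ref[j - 1] == word else 1)
        extra_hyp_word = row[j] + 1      # hypothesis word left unaligned
        missed_ref_word = new[j - 1] + 1  # reference word left unaligned
        new.append(min(match_or_sub, extra_hyp_word, missed_ref_word))
    return new


def oracle_errors(segments: List[Segment], ref: List[str]) -> int:
    """Minimum edit distance between ref and any path through the segments."""
    # row[j] = fewest errors aligning the best hypothesis prefix to ref[:j]
    row = list(range(len(ref) + 1))
    for segment in segments:
        candidates = []
        for phrase in segment:
            r = row
            for word in phrase:
                r = advance_row(r, word, ref)
            candidates.append(r)
        # Keep, for every reference position, the best alternative so far.
        row = [min(column) for column in zip(*candidates)]
    return row[-1]


if __name__ == "__main__":
    ref = "i want to recognize speech".split()
    segments = [
        [["i"]],
        [["want", "to"], ["wanna"]],
        [["wreck", "a", "nice", "beach"], ["recognize", "speech"]],
    ]
    # Assumption: the first phrase in each segment is the top-ranked one.
    one_best = [[segment[0]] for segment in segments]
    print("1-best WER:", oracle_errors(one_best, ref) / len(ref))
    print("oracle WER:", oracle_errors(segments, ref) / len(ref))
```

In this toy example the top-ranked path contains four word errors against a five-word reference, while the oracle path through the alternatives matches the reference exactly. A real evaluation would rely on SCTK's own alignment and text normalization, which this sketch does not attempt to reproduce.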