Technical Program

Paper Detail

Presentation #7
Session:ASR IV
Location:Kallirhoe Hall
Session Time:Friday, December 21, 13:30 - 15:30
Presentation Time:Friday, December 21, 13:30 - 15:30
Presentation: Poster
Topic: Speech recognition and synthesis:
Paper Title: HIERARCHICAL MULTITASK LEARNING WITH CTC
Authors: Ramon Sanabria, Florian Metze, Carnegie Mellon University, United States
Abstract: In Automatic Speech Recognition, it is still challenging to learn useful intermediate representations when using high-level (or abstract) target units such as words. For that reason, when only a few hundreds of hours of training data are available, character or phoneme-based systems tend to outperform word-based systems. In this paper, we show how Hierarchical Multitask Learning can encourage the formation of useful intermediate representations. We achieve this by performing Connectionist Temporal Classification at different levels of the network with targets of different granularity. Our model thus performs predictions in multiple scales for the same input. On the standard 300h Switchboard training setup, our hierarchical multitask architecture demonstrates improvements over single-task architectures with the same number of parameters. Our model obtains 14.0% Word Error Rate on the Switchboard subset of the Eval2000 test set without any decoder or language model, outperforming the current state-of-the-art on non-autoregressive acoustic-to-word models.