Technical Program

Paper Detail

Presentation #10
Session: ASR IV
Location: Kallirhoe Hall
Session Time: Friday, December 21, 13:30 - 15:30
Presentation Time: Friday, December 21, 13:30 - 15:30
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: EFFICIENT IMPLEMENTATION OF RECURRENT NEURAL NETWORK TRANSDUCER IN TENSORFLOW
Authors: Tom Bagby, Kanishka Rao, Khe Chai Sim, Google, United States
Abstract: Recurrent neural network transducer (RNN-T) has been successfully applied to automatic speech recognition to jointly learn the acoustic and language model components. The RNN-T loss and its gradient with respect to the softmax outputs can be computed efficiently using a forward-backward algorithm. In this paper, we present an efficient implementation of the RNN-T forward-backward and Viterbi algorithms using standard matrix operations. This allows us to implement the algorithm easily in TensorFlow by making use of the existing hardware-accelerated implementations of these operations. This work is based on a similar technique used in our previous work for computing the connectionist temporal classification and lattice-free maximum mutual information losses, where the forward and backward recursions are viewed as a bi-directional RNN whose states represent the forward and backward probabilities. Our benchmark results on graphics processing units (GPU) and tensor processing units (TPU) show that our implementation can achieve better throughput by increasing the batch size to maximize parallel computation. Furthermore, our implementation is about twice as fast on TPU compared to GPU for batch
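To make the forward recursion mentioned in the abstract concrete, below is a minimal, unvectorized NumPy reference for the RNN-T forward algorithm over the T-by-(U+1) alignment lattice. This is only an illustrative sketch of the recurrence itself, not the paper's batched matrix-operation TensorFlow implementation; the function name, argument layout, and shapes are assumptions made for this example.

```python
import numpy as np

def rnnt_forward_loss(log_blank, log_label):
    """Negative log-likelihood of an RNN-T target sequence via the
    forward algorithm (naive reference, single utterance).

    log_blank[t, u]: log-prob of emitting blank at lattice node (t, u);
                     shape (T, U + 1).
    log_label[t, u]: log-prob of emitting the (u+1)-th target label at
                     node (t, u); shape (T, U).
    """
    T, U1 = log_blank.shape
    U = U1 - 1
    # alpha[t, u] = log-prob of all partial alignments reaching (t, u).
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive horizontally by consuming a frame with a blank ...
            from_blank = alpha[t - 1, u] + log_blank[t - 1, u] if t > 0 else -np.inf
            # ... or vertically by emitting the next target label.
            from_label = alpha[t, u - 1] + log_label[t, u - 1] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(from_blank, from_label)
    # Terminate with a final blank from the last lattice node.
    return -(alpha[T - 1, U] + log_blank[T - 1, U])
```

The inner dependence of alpha[t, u] on alpha[t - 1, u] and alpha[t, u - 1] is what the paper expresses with standard (hardware-accelerated) matrix operations instead of Python-level loops, treating the recursion as an RNN over the lattice.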