Technical Program

Paper Detail

Presentation #	1
Session:	ASR II
Location:	Kallirhoe Hall
Session Time:	Thursday, December 20, 13:30 - 15:30
Presentation Time:	Thursday, December 20, 13:30 - 15:30
Presentation:	Poster
Topic:	Speech recognition and synthesis:
Paper Title:	Occam's Adaptation: A Comparison of Interpolation of Bases Adaptation Methods for Multi-Dialect Acoustic Modeling with LSTMs
Authors:	Mikaela Grace, Meysam Bastani, Eugene Weinstein, Google, United States
Abstract:	Multidialectal languages can pose challenges for acoustic modeling. Past research has shown that with a large training corpus but without explicit modeling of inter-dialect variability, training individual per-dialect models outperforms a single model trained on the combined data \cite{li2018multi, diak}. Our goal was thus to create a single multidialect acoustic model that would rival the performance of the dialect-specific models. Working in the context of deep Long Short-Term Memory (LSTM) acoustic models trained on up to 40K hours of speech, we explored several methods for training and incorporating dialect-specific information into the model, including 12 variants of interpolation-of-bases techniques related to Cluster Adaptive Training (CAT) and Factorized Hidden Layer (FHL) techniques. We found that with our model topology and large training corpus, simply appending the dialect-specific information to the feature vector resulted in a more accurate model than any of the more complex interpolation-of-bases techniques, while requiring less model complexity and fewer parameters. This simple adaptation yielded a single unified model for all dialects that, in most cases, outperformed individual models trained per dialect.
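
To make the contrast in the abstract concrete, the following is a minimal sketch of the two families of approach, not the authors' implementation. It assumes the dialect-specific information is a one-hot dialect code (the abstract does not say how it is encoded), and it shows interpolation of bases in its generic form, where a layer's weight matrix is a weighted sum of basis matrices. The function names `one_hot`, `append_dialect`, and `interpolate_bases` are hypothetical.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def append_dialect(frames, dialect_index, num_dialects):
    """Append the same dialect code to every acoustic feature frame.

    Assumption: the dialect information is a one-hot vector; the
    abstract only says it is appended to the feature vector.
    """
    code = one_hot(dialect_index, num_dialects)
    return [list(frame) + code for frame in frames]


def interpolate_bases(bases, weights):
    """Combine same-shaped basis matrices into one weight matrix.

    Generic interpolation of bases: W = sum_k weights[k] * bases[k].
    In a dialect-adapted model the weights would depend on the dialect.
    """
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(w * b[r][c] for w, b in zip(weights, bases)) for c in range(cols)]
        for r in range(rows)
    ]


# Two 2-dimensional feature frames from an utterance in dialect 1 of 3.
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = append_dialect(frames, dialect_index=1, num_dialects=3)

# Two 2x2 basis matrices mixed with equal interpolation weights.
bases = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
mixed = interpolate_bases(bases, weights=[0.5, 0.5])
```

In this sketch the appended code only widens the input by `num_dialects` values, whereas interpolation of bases stores one full matrix per basis for each adapted layer, which is consistent with the abstract's remark about model complexity and parameter count.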