Technical Program

Paper Detail

Presentation #	8
Session:	ASR II
Location:	Kallirhoe Hall
Session Time:	Thursday, December 20, 13:30 - 15:30
Presentation Time:	Thursday, December 20, 13:30 - 15:30
Presentation:	Poster
Topic:	Speech recognition and synthesis:
Paper Title:	FIRST-PASS TECHNIQUES FOR VERY LARGE VOCABULARY SPEECH RECOGNITION OF MORPHOLOGICALLY RICH LANGUAGES
Authors:	Matti Varjokallio, Sami Virpioja, Mikko Kurimo, Aalto University, Finland
Abstract:	In speech recognition of morphologically rich languages, very large vocabularies are required to achieve good error rates. Traditional n-gram language models trained over word sequences are particularly affected by data sparsity. Language modelling can often be improved by segmenting words into sequences of more frequent subword units. Another solution is to cluster the words into classes and apply a class-based language model. We show that linearly interpolating n-gram models trained over words, subwords, and word classes improves first-pass recognition accuracy in very large vocabulary speech recognition tasks for two morphologically rich, agglutinative languages, Finnish and Estonian. To overcome performance issues, we also introduce a novel language model look-ahead method that uses a class bigram model. At the same recognition speed, the method improves the results over a unigram look-ahead model, and the gain grows as the real-time factor decreases. The improved model combination and look-ahead model are useful when real-time recognition is required or when better hypotheses help later recognition passes. For instance, because of their higher computational cost, neural network language models are mostly applied by rescoring the generated hypotheses.
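The two techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy probabilities, the Finnish example words, and the assumption that a prefix-tree node's look-ahead score is the maximum class-bigram probability over the words still reachable from it are all hypothetical. Linear interpolation combines the word, subword, and class n-gram probabilities of a word with weights that sum to one.

```python
import math
from typing import Dict, List, Tuple


def interpolate(probs: List[float], weights: List[float]) -> float:
    """Linear interpolation: P(w|h) = sum_i lambda_i * P_i(w|h)."""
    if len(probs) != len(weights):
        raise ValueError("need one weight per component model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(lam * p for lam, p in zip(weights, probs))


def subword_word_prob(subword_probs: List[float]) -> float:
    """Word probability under a subword n-gram model, taken here as the
    product of the conditional subword probabilities along one segmentation."""
    return math.prod(subword_probs)


def class_word_prob(p_class_given_hist: float, p_word_given_class: float) -> float:
    """Class n-gram model: P(w|h) = P(c(w) | class history) * P(w | c(w))."""
    return p_class_given_hist * p_word_given_class


def class_bigram_lookahead(
    node_words: List[str],
    prev_class: str,
    word_class: Dict[str, str],
    p_class_bigram: Dict[Tuple[str, str], float],
    p_word_given_class: Dict[str, float],
) -> float:
    """Hypothetical look-ahead score for a lexical prefix-tree node: the best
    class-bigram probability among the words still reachable from the node."""
    best = 0.0
    for w in node_words:
        c = word_class[w]
        # Unseen class bigrams get probability 0 here (no smoothing in this toy).
        p = p_class_bigram.get((prev_class, c), 0.0) * p_word_given_class[w]
        best = max(best, p)
    return best


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    p_word = 0.01
    p_sub = subword_word_prob([0.2, 0.1])
    p_cls = class_word_prob(0.05, 0.4)
    print(interpolate([p_word, p_sub, p_cls], [0.5, 0.3, 0.2]))

    la = class_bigram_lookahead(
        node_words=["talo", "talossa"],
        prev_class="C1",
        word_class={"talo": "C2", "talossa": "C3"},
        p_class_bigram={("C1", "C2"): 0.3, ("C1", "C3"): 0.1},
        p_word_given_class={"talo": 0.2, "talossa": 0.5},
    )
    print(la)
```

In practice the interpolation weights would be tuned on held-out data. A class bigram table grows with the square of the number of classes rather than the vocabulary size, which is one plausible reason such a look-ahead stays affordable for very large vocabularies.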