UWEE Tech Report Series

Class-dependent Interpolation for Estimating Language Models from Multiple Text Sources


UWEETR-2003-0003

Author(s):
Ivan Bulyko, Mari Ostendorf, Andreas Stolcke

Keywords:
Language modeling, speech recognition, web data, class-based mixtures

Abstract

Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show not only that training data can be supplemented with text from the web, filtered to match the style and/or topic of the target recognition task, but also that larger performance gains can be obtained from this data by using class-dependent interpolation of N-grams.
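
The abstract does not give the interpolation formula, so the following is only a minimal sketch under stated assumptions. It assumes the mixture weights depend on the class of the most recent history word, P(w | h) = sum_s lambda_s(c(h)) P_s(w | h), and that each class's weights are estimated with the standard EM procedure for interpolation weights on held-out data. Which word is classed, the class inventory, and how the weights were actually estimated in the report are not stated here. All names (`mix_prob`, `estimate_class_weights`, `history_class`, `bigram`, the `FILLER`/`FUNCTION` labels) and the toy bigram tables are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A component model maps (word, history) -> P(word | history).
Component = Callable[[str, Sequence[str]], float]
START = "<start>"  # hypothetical class label for an empty history


def history_class(history: Sequence[str], word_class: Callable[[str], str]) -> str:
    """Class of the most recent history word (conditioning on it is an assumption)."""
    return word_class(history[-1]) if history else START


def mix_prob(word: str, history: Sequence[str], components: List[Component],
             class_weights: Dict[str, List[float]],
             word_class: Callable[[str], str]) -> float:
    """P(word | history) = sum_s lambda_s(c) * P_s(word | history)."""
    k = len(components)
    c = history_class(history, word_class)
    # Fall back to uniform weights for classes unseen during estimation.
    lambdas = class_weights.get(c, [1.0 / k] * k)
    return sum(lam * p(word, history) for lam, p in zip(lambdas, components))


def estimate_class_weights(heldout: List[Tuple[Sequence[str], str]],
                           components: List[Component],
                           word_class: Callable[[str], str],
                           n_iter: int = 20) -> Dict[str, List[float]]:
    """Standard EM for mixture weights, run separately for each history class."""
    k = len(components)
    by_class: Dict[str, List[Tuple[Sequence[str], str]]] = defaultdict(list)
    for history, word in heldout:
        by_class[history_class(history, word_class)].append((history, word))
    weights = {c: [1.0 / k] * k for c in by_class}
    for _ in range(n_iter):
        for c, events in by_class.items():
            lambdas = weights[c]
            expected = [0.0] * k  # expected counts of "component s produced the word"
            used = 0
            for history, word in events:
                joint = [lam * p(word, history) for lam, p in zip(lambdas, components)]
                total = sum(joint)
                if total == 0.0:
                    continue  # no component covers this event, so it carries no evidence
                used += 1
                for s in range(k):
                    expected[s] += joint[s] / total
            if used:
                # M-step: new weights are normalized expected counts for this class.
                weights[c] = [e / used for e in expected]
    return weights


def bigram(table: Dict[str, Dict[str, float]]) -> Component:
    """Toy bigram component backed by a nested dict (stand-in for a real N-gram LM)."""
    def p(word: str, history: Sequence[str]) -> float:
        prev = history[-1] if history else "<s>"
        return table.get(prev, {}).get(word, 0.0)
    return p


if __name__ == "__main__":
    # Hypothetical in-domain (conversational) and web-derived components.
    conv = bigram({"uh": {"yeah": 0.6, "the": 0.4}, "of": {"the": 0.7, "yeah": 0.3}})
    web = bigram({"uh": {"yeah": 0.2, "the": 0.8}, "of": {"the": 0.9, "yeah": 0.1}})
    classes = {"uh": "FILLER", "of": "FUNCTION"}
    word_class = lambda w: classes.get(w, "OTHER")
    heldout = [(["uh"], "yeah")] * 3 + [(["of"], "the")] * 3
    w = estimate_class_weights(heldout, [conv, web], word_class)
    print(w)
    print(mix_prob("yeah", ["uh"], [conv, web], w, word_class))
```

Conditioning the weights on the history rather than on the predicted word is a deliberate choice in this sketch: because each class's weights sum to one and every component is normalized for a given history, the mixture remains a proper distribution without renormalization. In the toy run, held-out data after the filler "uh" pushes weight toward the conversational component, while data after "of" favors the web component, which is the kind of class-specific trade-off a single global weight cannot express.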
