UWEE Tech Report Series

Improving Out-of-Vocabulary Name Resolution


David Palmer, Mari Ostendorf

speech recognition, out-of-vocabulary words


This paper presents algorithms for generating targeted name lists for candidate out-of-vocabulary (OOV) words for applications in language processing, particularly speech recognition. Focusing on names, which are shown to be the dominant class of OOVs in news broadcasts, the approach involves offline generation of a large name list and online pruning of that list based on phonetic distance. The resulting list can be used in a rescoring pass in automatic speech recognition. We also show that a simple variation of the approach can be used to generate alternate name spellings, which may be useful for query expansion in information retrieval. By using a wide variety of sources, including automatic name phrase tagging of temporally relevant news text, OOV coverage can be improved by nearly a factor of two with only a 10% increase in the word list size. For one source, coverage increased from 13% to 94%. Phonetic pruning can be used to reduce the list size by an order of magnitude with only a small loss in coverage.
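The online pruning step can be illustrated with a minimal sketch. The abstract does not specify the phonetic distance used, so this example assumes a plain Levenshtein edit distance over phone sequences; the function names, the phone symbols, the toy name list, and the threshold are all hypothetical and chosen only for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, phone sequences)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def prune_names(oov_phones, name_list, max_distance):
    """Keep names whose pronunciation lies within max_distance of the
    phone sequence hypothesized for the OOV region.

    Assumption: name_list maps each spelling to a single phone tuple.
    """
    return [name for name, phones in name_list.items()
            if edit_distance(oov_phones, phones) <= max_distance]


# Toy name list (built offline in the paper's setting); pronunciations are made up.
NAMES = {
    "Milosevic": ("m", "ih", "l", "ow", "sh", "eh", "v", "ih", "ch"),
    "Milosovich": ("m", "ih", "l", "ow", "s", "ah", "v", "ih", "ch"),
    "Clinton": ("k", "l", "ih", "n", "t", "ah", "n"),
}

# Hypothetical phone string produced by the recognizer for an OOV region.
hypothesis = ("m", "ih", "l", "ow", "s", "eh", "v", "ih", "ch")

candidates = prune_names(hypothesis, NAMES, max_distance=2)
print(candidates)
```

With this toy data the two spelling variants of the same name both survive pruning while the unrelated name is dropped, which also hints at how a similar comparison could surface alternate spellings. A real system would likely weight substitutions by phone confusability rather than charging a uniform cost.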
