UWEE Tech Report Series

Domain Adaptation Through Phrase Generalization for Improved Statistical Machine Translation Quality


UWEETR-2008-0003

Author(s):
Chris Lim, Katrin Kirchhoff

Keywords:
statistical machine translation, string kernels, domain adaptation

Abstract

This paper presents a method for domain adaptation (incorporating out-of-domain data) through phrase generalization (learning and applying phrase templates), with the goal of improving Italian-English translation quality on the BTEC travel task. The phrase generalization process is described. Including it in the system yielded noticeable but only minor improvements; the gains were limited by alignment problems and a noisy lexicon. Several enhancements to the process are proposed that are expected to yield more significant gains.
