Simultaneous speech translation attempts to produce high-quality translations while minimizing the latency between the production of words in the source language and their translation into the target language.
A key prediction problem in simultaneous translation is deciding when to start translating the input stream. I will talk about two new algorithms that together provide a solution to this problem. The first algorithm learns to find effective places to break the input stream. To balance the often conflicting demands of low latency and high translation quality, the algorithm exploits the notion of Pareto optimality. The second algorithm is a stream decoder that incrementally processes the input stream from left to right and produces output translations for segments of the input. These segments are found by consulting classifiers trained on data created by the first algorithm.
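To make the notion of Pareto optimality concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk: the function name, the candidate labels, and the latency and quality numbers are all hypothetical, chosen only for illustration. It assumes each candidate segmentation has already been scored with a latency (lower is better) and a quality score (higher is better), and it keeps the candidates that no other candidate beats on both criteria.

```python
from typing import List, Tuple

# Hypothetical record: (label, latency, quality).
# Lower latency is better; higher quality is better.
Candidate = Tuple[str, float, float]


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that are not dominated by any other candidate.

    One candidate dominates another if it is no worse on both latency and
    quality and strictly better on at least one of them. Exact duplicates
    are collapsed to a single entry.
    """
    # Sort by latency ascending; break ties by quality descending.
    ordered = sorted(candidates, key=lambda c: (c[1], -c[2]))
    frontier: List[Candidate] = []
    best_quality = float("-inf")
    for cand in ordered:
        # Every candidate seen so far has latency <= this one, so this
        # candidate survives only if it strictly improves on quality.
        if cand[2] > best_quality:
            frontier.append(cand)
            best_quality = cand[2]
    return frontier


# Made-up scores for five hypothetical segmentation choices.
candidates = [
    ("A", 0.5, 18.0),
    ("B", 1.0, 22.0),
    ("C", 1.5, 21.0),  # dominated by B: slower and lower quality
    ("D", 2.0, 25.0),
    ("E", 0.5, 16.0),  # dominated by A: same latency, lower quality
]

frontier = pareto_frontier(candidates)
print([label for label, _, _ in frontier])
```

In this toy example the frontier consists of A, B, and D: each trades extra latency for extra quality, and none can be improved on one criterion without losing on the other.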
We compare our approach with previous work and report translation quality (BLEU scores) and the latency of generating translations (number of segments translated per second) on audio lecture data from the TED talks collection.
Anoop Sarkar is Professor of Computer Science at Simon Fraser University in British Columbia, Canada, where he co-directs the Natural Language Laboratory (http://natlang.cs.sfu.ca).
His research applies machine learning methods to natural language processing, specifically statistical machine translation between all human languages and the summarization of information in language using a combination of visualization and semantic parsing algorithms. He sometimes dreams about a computational decipherment of ancient scripts and mysterious manuscripts.
He received his Ph.D. from the Department of Computer and Information Sciences at the University of Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised statistical parsing using tree-adjoining grammars.