
Thesis Defense - Cer

Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study
Daniel Cer
Computer Science PhD Candidate

The translation of a sentence from one language to another by a statistical machine translation system is guided by knowledge sources that score competing candidate translations. These knowledge sources encode such factors as the fluency of the translation, the appropriateness of individually translated words and phrases, and the word-order differences between the two languages, as well as other factors such as a preference for longer or shorter translations. Obtaining good translations depends critically on properly weighting the scores provided by these knowledge sources. This weighting is typically performed using minimum error rate training (MERT). In this dissertation, I investigate the effectiveness of different optimization algorithms for MERT and the properties of systems trained to different learning criteria. The results show that the most common optimization approach to MERT tends to perform worse than the alternatives. The experiments also challenge long-standing assumptions about the relationship between the training criterion used and the actual quality of the translations the system produces. Specifically, they show no sizable differences in performance among systems trained to different popular surface-level criteria, such as systems trained to optimize BLEU, METEOR, or TERp scores. A novel method is presented for training to a more computationally intensive, semantically oriented training criterion. To perform the experiments, I developed a new state-of-the-art machine translation system known as Phrasal.
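To make the idea of "weighting the scores of knowledge sources" concrete, here is a minimal sketch, not the method used in the dissertation or in Phrasal. It assumes a linear model in which each candidate translation carries a vector of feature scores (one per knowledge source) and the system outputs the highest-scoring candidate in a fixed n-best list. The per-sentence "quality" numbers are hypothetical stand-ins for an evaluation metric; real MERT typically optimizes a corpus-level metric such as BLEU. The tuner here is a deliberately simple grid-based coordinate search. It is not Och's exact line search, and it skips the usual re-decoding loop that regenerates n-best lists. All names (`model_score`, `decode`, `corpus_quality`, `coordinate_search`) are illustrative.

```python
from typing import List, Sequence, Tuple

# Each candidate: (feature scores, hypothetical per-sentence quality in [0, 1]).
Candidate = Tuple[Tuple[float, ...], float]
NBestList = List[Candidate]


def model_score(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Linear model score: weighted sum of the knowledge-source scores."""
    return sum(w * f for w, f in zip(weights, feats))


def decode(weights: Sequence[float], nbest: NBestList) -> int:
    """Index of the highest-scoring candidate (ties go to the earlier one)."""
    return max(range(len(nbest)), key=lambda i: model_score(weights, nbest[i][0]))


def corpus_quality(weights: Sequence[float], corpus: List[NBestList]) -> float:
    """Total quality of the candidates the model would output for each sentence."""
    return sum(nbest[decode(weights, nbest)][1] for nbest in corpus)


def coordinate_search(corpus: List[NBestList], weights: Sequence[float],
                      grid: Sequence[float], rounds: int = 3) -> List[float]:
    """Tune one weight at a time over a fixed grid, keeping strict improvements."""
    weights = list(weights)
    for _ in range(rounds):
        for d in range(len(weights)):
            best_val, best_q = weights[d], corpus_quality(weights, corpus)
            for v in grid:
                trial = weights[:d] + [v] + weights[d + 1:]
                q = corpus_quality(trial, corpus)
                if q > best_q:
                    best_val, best_q = v, q
            weights[d] = best_val
    return weights


# Toy data: two sentences, two candidates each, two features
# (think of them as hypothetical "language model" and "translation model" log-scores).
corpus = [
    [((-2.0, -1.0), 0.2), ((-3.0, -0.5), 0.8)],
    [((-1.0, -2.0), 0.3), ((-1.5, -1.0), 0.7)],
]
tuned = coordinate_search(corpus, [1.0, 0.0], grid=[0.0, 0.5, 1.0])
print(tuned, corpus_quality(tuned, corpus))
```

Starting from weights that trust only the first feature, the search shifts weight onto the second feature, which changes which candidate wins in each sentence and raises total quality. The objective is piecewise constant in the weights, because only the identity of the top candidate matters. This is why optimizing it is nontrivial and why the choice of optimization algorithm for MERT matters.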

Committee: James Martin, Professor (Chair)
Wayne Ward, Research Professor
Richard Byrd, Professor
Daniel Jurafsky, Stanford University
Christopher Manning, Stanford University
Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:20)