Department of Computer Science University of Colorado Boulder

Thesis Defense - Cer

 
7/27/2010
1:00pm-3:00pm
CINC 102

Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study
Daniel M. Cer
Computer Science PhD Candidate

The translation of a sentence from one language to another by a statistical machine translation system is guided by knowledge sources that score competing candidate translations. These knowledge sources encode such factors as the fluency of the translation, the appropriateness of individually translated words and phrases, and the word-order differences between the two languages, as well as other factors such as preferences for longer or shorter translations. Obtaining good translations depends critically on the proper weighting of the scores provided by these knowledge sources. Such weighting is typically performed using minimum error rate training (MERT). In this dissertation, I investigate the effectiveness of different optimization algorithms for MERT and the properties of systems trained to different learning criteria. The results show that the most common optimization approach to MERT tends to perform worse than alternatives. The experiments also challenge long-standing assumptions about the relationship between the training criterion used and the actual quality of the translations produced by the system. Specifically, it is shown that there are no sizable differences in the performance of systems trained to different popular surface-level criteria, such as systems trained to optimize BLEU, METEOR, or TERp scores. A novel method is presented for training to a more computationally intensive and semantically oriented training criterion. To perform the experiments, I developed a new state-of-the-art machine translation system known as Phrasal.
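As a rough illustration of what "weighting the scores provided by knowledge sources" means, the sketch below scores each candidate translation as a weighted sum of its feature scores and then searches for the weights that minimize total error on a small set of n-best lists. Everything here is an assumption made for illustration: the toy data, the two features, the `error` field, and the function names are hypothetical, and the exhaustive grid search stands in for the optimization algorithms studied in the dissertation. It is not the method used in the thesis and is not the Phrasal API.

```python
from itertools import product


def score(features, weights):
    """Weighted sum of knowledge-source scores for one candidate."""
    return sum(w * f for w, f in zip(weights, features))


def best_candidate(candidates, weights):
    """Return the candidate the model prefers under the given weights."""
    return max(candidates, key=lambda c: score(c["features"], weights))


def corpus_error(nbest_lists, weights):
    """Total error of the top-scoring candidate over all n-best lists."""
    return sum(best_candidate(nbest, weights)["error"] for nbest in nbest_lists)


def grid_search(nbest_lists, grid, n_features):
    """Try every weight vector on the grid and keep the lowest-error one.

    This brute-force search is only a stand-in for a real MERT optimizer.
    """
    return min(
        product(grid, repeat=n_features),
        key=lambda w: corpus_error(nbest_lists, w),
    )


# Hypothetical toy data. Each candidate has two made-up feature scores
# (a fluency log-score, a phrase-translation log-score) and a made-up
# per-sentence error (0.0 = matches the reference, 1.0 = does not).
NBEST = [
    [
        {"text": "the house small is", "features": (-5.0, -0.5), "error": 1.0},
        {"text": "the house is small", "features": (-2.0, -1.0), "error": 0.0},
    ],
    [
        {"text": "he a book reads", "features": (-4.0, -0.5), "error": 1.0},
        {"text": "he reads a book", "features": (-1.5, -2.0), "error": 0.0},
    ],
]

if __name__ == "__main__":
    weights = grid_search(NBEST, grid=(0.0, 0.5, 1.0), n_features=2)
    print("chosen weights:", weights)
    print("corpus error:", corpus_error(NBEST, weights))
```

In this toy setup, relying only on the phrase-translation score selects the disfluent candidates, while giving weight to the fluency score selects the fluent ones. That is the sense in which translation quality depends on how the knowledge sources are weighted.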

Committee: James Martin, Professor (Chair)
Wayne Ward, Research Professor
Richard Byrd, Professor
Daniel Jurafsky, Stanford University
Christopher Manning, Stanford University

 
Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
Engineering Center Office Tower
ECOT 717
+1-303-492-7514
FAX +1-303-492-2844