Ruprecht-Karls-Universität Heidelberg
Institute for Computational Linguistics

Linguistic Knowledge for Statistical Machine Translation

Course Description

Degree program   Module code      Credits
BA-2010          AS-CL            8 LP
NBA              AS-CL            8 LP
Master           SS-CL, SS-TAC    8 LP
Magister         -                -

Instructor: Alexander Fraser
Course type: Hauptseminar (advanced seminar)
First session: 22.04.2014
Time and place: Tue, 11:15-12:45, INF 325 / SR 23 (SR)
Commitment period: 16.06.-13.07.2014

Prerequisites

"Statistical Machine Translation"

Assessment

  • Regular and active participation
  • Presentation
  • Term paper or project

Content

Phrase-based statistical machine translation (PBSMT) is the state of the art for machine translation of some language pairs. PBSMT is surprisingly free of explicit linguistic knowledge, yet it can be very effective. However, it is not always effective. For instance, when translating into a morphologically rich language, translation quality suffers, particularly when there is also significant syntactic divergence between the two languages. In this case PBSMT performs poorly because its translation model makes independence assumptions about morphology and syntax that do not reflect linguistic reality.

In this course we will read papers that try to address this problem by adding linguistic knowledge to the translation process in a wide variety of ways. We will start with an intensive focus on morphology and then move on to syntax, semantic roles and beyond. Participants will be encouraged to examine actual translation system output for problems, and we will connect these observations to the work we discuss.


To take part in this course, please fill out this questionnaire.

Course Overview

Seminar Schedule

2014-04-22
  Introduction to the Course and Research Area
  Presenter: Fraser

2014-04-29
  Empirical Methods for Compound Splitting. Philipp Koehn and Kevin Knight. EACL 2003
  Improving Statistical MT Through Morphological Analysis. Sharon Goldwater and David McClosky. EMNLP 2005
  Presenter: Kiem

2014-05-06
  Enriching morphologically poor languages for statistical machine translation. Eleftherios Avramidis and Philipp Koehn. ACL 2008
  Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish. R. Yeniterzi and K. Oflazer. ACL 2010
  Format: reading group

2014-05-13
  Arabic preprocessing schemes for statistical machine translation. Nizar Habash and Fatiha Sadat. NAACL 2006
  Unsupervised morphology rivals supervised morphology for Arabic MT. D. Stallard, J. Devlin, M. Kayser, and Y. K. Lee. ACL 2012
  Format: reading group

2014-05-20
  Dependency Treelet Translation: Syntactically Informed Phrasal SMT. Chris Quirk, Arul Menezes, and Colin Cherry. ACL 2005
  A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT. Andreas Zollmann, Ashish Venugopal, Franz Och, and Jay Ponte. COLING 2008
  Presenter: Bylinovich

2014-05-27
  Applying morphology generation models to machine translation. Kristina Toutanova, Hisami Suzuki, and Achim Ruopp. ACL 2008
  Combining morpheme-based machine translation with post-processing morpheme prediction. Ann Clifton and Anoop Sarkar. ACL 2011
  Presenter: Mayer (reading group)

2014-06-03
  Optimizing Chinese Word Segmentation for Machine Translation Performance. Pi-Chuan Chang, Michel Galley, and Christopher D. Manning. ACL WMT 2008
  Unsupervised Tokenization for Machine Translation. Tagyoung Chung and Daniel Gildea. EMNLP 2009
  Presenter: Placzek

2014-06-10
  Chinese Syntactic Reordering for Statistical Machine Translation. C. Wang, M. Collins, and P. Koehn. EMNLP-CoNLL 2007
  Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation. Dmitriy Genzel. COLING 2010
  Presenter: Li

2014-06-17
  What's in a translation rule? Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. NAACL 2004
  A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. Libin Shen, Jinxi Xu, and Ralph Weischedel. ACL 2008
  Presenter: Schneider

2014-07-01
  A Hierarchical Phrase-Based Model for Statistical Machine Translation. David Chiang. ACL 2005
  Tree-to-String Alignment Template for Statistical Machine Translation. Yang Liu, Qun Liu, and Shouxun Lin. ACL 2006
  Presenter: Claus

2014-07-08
  Unsupervised Multilingual Learning for Morphological Segmentation. Benjamin Snyder and Regina Barzilay. ACL 2008
  Unsupervised bilingual morpheme segmentation and alignment with context-rich hidden semi-Markov models. J. Naradowsky and K. Toutanova. ACL 2011
  Presenter: Nakryyko (reading group)

2014-07-15
  Semantic roles for SMT: a hybrid two-pass model. Dekai Wu and Pascale Fung. NAACL 2009
  Semantic role features for machine translation. Ding Liu and Daniel Gildea. COLING 2010
  Presenter: Haider

2014-07-22
  Bilingual Sentiment Consistency for Statistical Machine Translation. Chen and Zhu. EACL 2014
  Applying the semantics of negation to SMT through n-best list re-ranking. Fancellu and Webber. EACL 2014
  Presenter: Haas

Literature

  • Philipp Koehn's textbook "Statistical Machine Translation"

