HumanMT: Human Machine Translation Ratings

HumanMT is a collection of human ratings and corrections of machine translations. It consists of two parts: the first part contains five-point and pairwise sentence-level ratings, and the second part contains error markings and corrections. Details are given below.

I. Sentence-level ratings

This is a collection of five-point and pairwise ratings for 1000 German-English machine translations of TED talks (IWSLT 2014). The ratings were collected to assess the reliability and learnability of machine translation quality ratings, and to improve a neural machine translation model via reinforcement learning from human feedback (see publications).

II. Error markings and corrections

This is a collection of word-level error markings and post-edits/corrections for 3120 English-German machine-translated sentences from 30 selected TED talks (IWSLT 2017). Each sentence received either a correction or a marking of errors from human annotators. The data was collected to compare the two annotation modes with respect to annotation cost, quality, and potential for downstream machine translation improvements (see publications).

Terms of Use

HumanMT is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please cite (Kreutzer et al., 2018) for part I, or (Kreutzer et al., 2020) for part II, if you use the respective corpora in your work.

Data

For a detailed description of the translation and rating process and the annotation interfaces, please see the respective publications.

I. Sentence-level ratings

This collection contains 1000 translations, rated by 14 individuals via pairwise preference judgments and by 16 individuals on a five-point Likert scale. Raters were university students with fluent or native English and German skills. 200 of the translations were repeated in the rating process to measure intra-annotator reliability.

II. Error markings and corrections

This collection contains 3120 machine-translated sentences that were either corrected or had their errors marked by human annotators. Annotators were university students with fluent or native English and German skills.

Download

I. Sentence-level ratings

Files:

Format:
The .tsv files contain one rated item per line. The headers describe the contents of the columns: first a rating item id (ID), then the source sentence (ORIGINAL), followed by the rated translation (TRANSLATION) or translation pair (TRANSLATION 1, TRANSLATION 2), and the per-annotator ratings (RESPONDENT X). The rating id is composed as follows (a parsing sketch follows the list):

  • Five-point: The id consists of an “R” for repeated items or an “A” for unique items, a numerical id, and “IN” or “OUT” depending on whether the translation originates from an in-domain (IN) or out-of-domain (OUT) model.
  • Pairwise: The id consists of an “R” for repeated pairs or an “A” for unique pairs, and a numerical id.
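
For illustration, the following is a minimal Python sketch for reading one of the five-point .tsv files and decoding the rating ids. The file name is a placeholder, and the sketch assumes that the id components are simply concatenated (e.g. “R12IN”) and that the respondent columns are labeled “RESPONDENT 1”, “RESPONDENT 2”, etc.; check both assumptions against the downloaded files.

  import csv
  import re

  # Placeholder file name -- replace with the actual five-point ratings .tsv.
  RATINGS_TSV = "five_point_ratings.tsv"

  # Id scheme described above: "R" (repeated) or "A" (unique), a numerical id,
  # and "IN" or "OUT" for the in-/out-of-domain model (assumed to be concatenated).
  ID_PATTERN = re.compile(r"^(?P<kind>[RA])(?P<num>\d+)(?P<domain>IN|OUT)$")

  with open(RATINGS_TSV, encoding="utf-8", newline="") as handle:
      for row in csv.DictReader(handle, delimiter="\t"):
          match = ID_PATTERN.match(row["ID"])
          if match is None:
              continue  # id does not follow the documented scheme
          repeated = match.group("kind") == "R"
          in_domain = match.group("domain") == "IN"
          # Collect the individual annotator ratings from the RESPONDENT columns.
          ratings = [int(value) for column, value in row.items()
                     if column.startswith("RESPONDENT") and value.strip()]
          print(row["ID"], repeated, in_domain, ratings)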

II. Error markings and corrections

Files:

Format: The tar.gz archive contains three files: one with repeated annotations (“-agreement”), one with one-time annotations (“-annotation”), and a README.txt that describes the format.
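
To get an overview of the archive before working with the data, a minimal Python sketch along the following lines lists its contents and prints the bundled README.txt; the archive name is a placeholder for the actual download.

  import tarfile

  # Placeholder archive name -- replace with the actual tar.gz file from the download.
  ARCHIVE = "error_markings_and_corrections.tar.gz"

  with tarfile.open(ARCHIVE, mode="r:gz") as archive:
      for member in archive.getmembers():
          print(member.name)  # expect the "-agreement", "-annotation", and README.txt files
          if member.name.endswith("README.txt"):
              # Print the bundled format description before processing the data files.
              print(archive.extractfile(member).read().decode("utf-8"))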

Acknowledgments

The work was in part supported by the “Interactive Lecture Translation” project funded by the Deutsche Forschungsgemeinschaft (DFG) (Research Grant RI 2221/4-1).

Publications

  1. Julia Kreutzer, Joshua Uyheng and Stefan Riezler
    Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
    Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Melbourne, Australia, 2018
    @inproceedings{kreutzer2018b,
      author = {Kreutzer, Julia and Uyheng, Joshua and Riezler, Stefan},
      title = {Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning},
      booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)},
      year = {2018},
      address = {Melbourne, Australia},
      url = {http://www.cl.uni-heidelberg.de/~riezler/publications/papers/ACL2018.pdf}
    }
    
  2. Julia Kreutzer, Nathaniel Berger and Stefan Riezler
    Correct Me If You Can: Learning from Error Corrections and Markings
    Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT), Lisbon, Portugal, 2020
    @inproceedings{kreutzer2020a,
      author = {Kreutzer, Julia and Berger, Nathaniel and Riezler, Stefan},
      title = {Correct Me If You Can: Learning from Error Corrections and Markings},
      booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
      year = {2020},
      address = {Lisbon, Portugal},
      url = {https://arxiv.org/abs/2004.11222}
    }