
HumanMT: Human Machine Translation Ratings

HumanMT is a collection of five-point and pairwise ratings for 1000 German-English machine translations of TED talks (IWSLT 2014). The ratings were collected to assess the reliability and learnability of human machine translation quality judgments, with the goal of improving a neural machine translation model through human reinforcement learning (see publication below).

Terms of use

HumanMT is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please cite Kreutzer, Uyheng, and Riezler (2018) if you use the corpus in your work.


Data

This collection contains 1000 translations that were rated by 14 individuals via pairwise preference judgments and by 16 individuals on a five-point Likert scale. Raters were university students with fluent or native English and German skills. 200 of the translations were repeated during the rating process to measure intra-annotator reliability.

For a detailed description of the translation and rating process, please see the publication.
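
As a minimal illustration, the following Python sketch shows one way intra-annotator reliability on the 200 repeated items could be quantified, given two equal-length lists of five-point ratings (first and second pass) for a single annotator. How the two passes are paired up from the files is not specified here; the function and the toy ratings are purely illustrative, not the analysis from the publication.

from scipy.stats import spearmanr

def intra_annotator_reliability(first_pass, second_pass):
    """Quantify how consistently one annotator rates the same items twice.

    `first_pass` and `second_pass` are equal-length lists of five-point
    ratings for the repeated items; pairing the two occurrences of each
    item is left to the reader, since the exact file layout may differ.
    """
    assert len(first_pass) == len(second_pass)
    # Exact agreement: fraction of items given the identical rating twice.
    exact = sum(a == b for a, b in zip(first_pass, second_pass)) / len(first_pass)
    # Rank correlation: tolerant of small shifts on the Likert scale.
    rho, _ = spearmanr(first_pass, second_pass)
    return exact, rho

# Toy example with made-up ratings for one annotator:
exact, rho = intra_annotator_reliability([5, 3, 4, 2, 1], [5, 3, 3, 2, 1])
print(f"exact agreement: {exact:.2f}, Spearman's rho: {rho:.2f}")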

Download

Ratings:

Format: The .tsv files contain one rated item per line. The headers describe the contents of the columns: first a rating item id (ID), then the source sentence (ORIGINAL), followed by the rated translation (TRANSLATION) or translation pair (TRANSLATION 1, TRANSLATION 2), and the ratings given by each annotator (RESPONDENT X). The rating id is composed as follows (see the parsing sketch after this list):

  • Five-point: The id consists of an "R" for repeated items or an "A" for unique items, a numerical id, and "IN" or "OUT" depending on whether the translation originates from the in-domain (IN) or out-of-domain (OUT) model.
  • Pairwise: The id consists of an "R" for repeated pairs or an "A" for unique pairs, and a numerical id.
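
To make the format concrete, here is a minimal Python sketch that reads a five-point .tsv file and decodes the rating ids according to the scheme above. The filename is a placeholder (the actual download names are not listed here), and the assumption that the id components are concatenated without separators (e.g. "R17OUT") is ours; adjust the pattern if the real ids use delimiters.

import csv
import re

# Hypothetical filename; substitute the name of the downloaded file.
FIVE_POINT_FILE = "five_point_ratings.tsv"

# Decode a five-point rating id such as "R17OUT" or "A203IN" following the
# scheme described above: R/A prefix, numerical id, IN/OUT domain marker.
ID_PATTERN = re.compile(r"^(?P<kind>[RA])(?P<num>\d+)(?P<domain>IN|OUT)$")

def decode_id(item_id):
    m = ID_PATTERN.match(item_id)
    if m is None:
        raise ValueError(f"unexpected id format: {item_id}")
    return {
        "repeated": m.group("kind") == "R",   # R = repeated, A = unique
        "number": int(m.group("num")),
        "in_domain": m.group("domain") == "IN",
    }

in_domain_count = 0
with open(FIVE_POINT_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        info = decode_id(row["ID"])
        # RESPONDENT columns hold the individual annotators' ratings.
        ratings = {k: v for k, v in row.items() if k.startswith("RESPONDENT")}
        if info["in_domain"]:
            in_domain_count += 1
print(f"in-domain items: {in_domain_count}")

The pairwise files can be read the same way; only the id pattern (no IN/OUT suffix) and the TRANSLATION 1 / TRANSLATION 2 columns differ.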

Acknowledgments

This work was supported in part by the "Interactive Lecture Translation" project funded by the Deutsche Forschungsgemeinschaft (DFG) (research grant RI 2221/4-1).

Publication

Kreutzer, J., Uyheng, J., Riezler, S. (2018). Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL) 2018. Melbourne, Australia.
