DeCOCO: COCO English-German Parallel Captions

DeCOCO is a bilingual (English-German) corpus of image descriptions. The English part is extracted from the COCO dataset, and the German part consists of translations by a native German speaker.

Terms of Use

DeCOCO is licensed under a Creative Commons Attribution 4.0 License. License: CC BY-NC-SA 4.0

If you use the corpus in your work, please cite (Hitschler, Schamoni, & Riezler, 2016).

Data

For a detailed description of the corpus and its application, please see the above publication.

For detailed licensing information, please see the enclosed terms_of_use.txt.

Download

Parallel data: ms_coco_parallel.tar.bz2 (31kB, md5: 774133c32c1477fa04308c703ac99330)
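
The MD5 digest listed above can be used to check the archive after downloading. The following is a minimal sketch in Python, assuming the archive has already been saved locally under its published name. The function names `md5_of_file` and `verify_and_extract` and the target directory are illustrative choices, not part of the corpus distribution.

```python
import hashlib
import tarfile
from pathlib import Path

# MD5 digest as published next to the download link.
EXPECTED_MD5 = "774133c32c1477fa04308c703ac99330"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, target_dir, expected_md5=EXPECTED_MD5):
    """Extract `archive` into `target_dir` only if its MD5 digest matches.

    Hypothetical helper for illustration; raises ValueError on a mismatch.
    """
    actual = md5_of_file(archive)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # "r:bz2" opens a bzip2-compressed tar archive for reading.
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(target_dir)
    return actual
```

With the archive in the working directory, calling `verify_and_extract("ms_coco_parallel.tar.bz2", "decoco")` would unpack it into a `decoco` directory (an arbitrary name) if the digest matches, and raise an error otherwise. The layout of the extracted files is not described here, so inspect the directory contents before loading them.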

Publication

  1. Julian Hitschler, Shigehiko Schamoni and Stefan Riezler
    Multimodal Pivots for Image Caption Translation
    Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, 2016
    @inproceedings{hitschler2016a,
      author = {Hitschler, Julian and Schamoni, Shigehiko and Riezler, Stefan},
      title = {Multimodal Pivots for Image Caption Translation},
      booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL)},
      year = {2016},
      address = {Berlin, Germany},
      url = {http://www.cl.uni-heidelberg.de/~riezler/publications/papers/ACL2016.2.pdf}
    }