Software

Open-Source Software Projects hosted by our group:

Joey NMT

Minimalist NMT for educational purposes.

QUETCH

Quality estimation for machine translation.

cclir

A cross-language information retrieval (CLIR) toolbox based on the cdec decoder; the code package used in Bag-of-words Forced Decoding for Cross-Lingua...

rebol

A toolkit for grounded learning for statistical machine translation, as described in the ACL 2014 paper, Response-Based Learning for Grounded Machi...

dtrain

A tuning method implemented for the cdec decoder; see Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Tra...

otedama

Preordering for Machine Translation.

semparse

A semantic parser that treats the task as a monolingual SMT problem. The underlying SMT framework is the cdec decoder.

Contributions by our group to other Open-Source Software Projects:

nematus

A toolkit for neural machine translation.

Neural Monkey

An open-source tool for sequence learning in NLP.

Corpora

BoostCLIR

A Japanese-English corpus of patent abstracts for patent prior art search, consisting of 100K queries and relevance judgements for 1.4M documents.

DeCOCO

German translations for 1000 image captions from the COCO dataset.

HumanMT

Human pairwise and five-point ratings for 1000 translations from German to English.

LibriVoxDeEn

A corpus for German-to-English Speech Translation and Speech Recognition.

NFCorpus

A Full-Text Learning to Rank Dataset for Medical Information Retrieval, extracted from NutritionFacts.org.

NLmaps

A corpus for question-answering, consisting of 2,380 questions in English and German with corresponding Machine Readable Language (MRL) formulae, u...

PatTR

A parallel patent corpus for statistical machine translation featuring three language pairs, German-English (23M sentence pairs), English-French (...

WikiCaps

A large-scale multilingual data set of image-caption pairs for multimodal machine translation, extracted from Wikimedia Commons.

WikiCLIR

A large-scale German-English retrieval data set for Cross-Language Information Retrieval, extracted from Wikipedia.