Open-Source Software Projects hosted by our group:
Minimalist NMT for educational purposes.
Quality estimation for machine translation.
A cross-language information retrieval (CLIR) toolbox based on the cdec decoder, code package used in Bag-of-words Forced Decoding for Cross-Lingua...
A toolkit for grounded learning for statistical machine translation, as described in the ACL 2014 paper, Response-Based Learning for Grounded Machi...
A tuning method implemented for the cdec decoder; see Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Tra...
Preordering for Machine Translation.
A semantic parser that treats the task as a monolingual SMT problem. The underlying SMT framework is the cdec decoder.
Contributions by our group to other Open-Source Software Projects:
A toolkit for neural machine translation.
An open-source tool for sequence learning in NLP.
Data Sets:
A Japanese-English corpus of patent abstracts for patent prior art search, consisting of 100K queries and relevance judgements for 1.4M documents.
German translations for 1000 image captions from the COCO dataset.
Human pairwise and five-point ratings for 1000 translations from German to English.
A Full-Text Learning to Rank Dataset for Medical Information Retrieval, extracted from NutritionFacts.org.
A corpus for question-answering, consisting of 2,380 questions in English and German with corresponding Machine Readable Language (MRL) formulae, u...
A parallel patent corpus for statistical machine translation featuring three language pairs, German-English (23M sentence pairs), English-French (...
A large-scale multilingual data set of image-caption pairs for multimodal machine translation, extracted from Wikimedia Commons.
A large-scale German-English retrieval data set for Cross-Language Information Retrieval, extracted from Wikipedia.