NLmaps: A corpus for question answering over the OpenStreetMap database, consisting of 2,380 questions in English and German with corresponding Machine Readable Language (MRL) formulae.
ON5V: OntoNotes 5 Predicates Non-local Role Linking Data Set (Moor et al., 2013)
PatTR: A parallel patent corpus for statistical machine translation featuring three language pairs: German-English (23M sentence pairs), English-French (19M sentence pairs), and French-German (5M sentence pairs).
SR3de: Semantic Role Triple Dataset for German, a dataset with parallel PropBank-, VerbNet-, and FrameNet-style semantic role annotation on a portion of approx. 3,000 instances of the CoNLL 2009 shared task German data (Hartmann et al., 2017).
WikiCaps: A large-scale multilingual data set of image-caption pairs for multimodal machine translation, extracted from Wikimedia Commons.
WikiCLIR: A large-scale German-English retrieval data set for Cross-Language Information Retrieval, extracted from Wikipedia.
cclir: A cross-language information retrieval (CLIR) toolbox based on the cdec decoder. The code package was used in Bag-of-words Forced Decoding for Cross-Lingual Information Retrieval (Hieber and Riezler, ACL 2015), inter alia.
convert: A Python script that converts function-head style encodings in dependency treebanks into content-head style encodings (as used in the UD treebanks) and vice versa, for adpositions, copula, and coordination (Rehbein, Steen, Do & Frank 2017) [LiMo project]
dtrain: A tuning method implemented for the cdec decoder; see Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT (Simianer, Riezler and Dyer, ACL 2012).