Resources
-
CELEX2
-
CoNLL NER
-
Europarl
-
Heise-Newsticker Meldungen
-
NEGRA
-
Projekt Gutenberg
-
Reuters Corpus
-
SALSA
-
SMULTRON
SMULTRON (Stockholm MULtilingual TReebank) is a parallel treebank developed by the Computational Linguistics Group at the Department of Linguistics, Stockholm University. It contains around 1,000 sentences in English, German and Swedish. The sentences have been PoS-tagged and annotated with phrase structure trees, and the trees have been aligned at the sentence, phrase and word levels. In addition, the German and Swedish monolingual treebanks contain lemma information.
-
SemEval 2010 Task 1: Coreference Resolution in Multiple Languages
-
TIGER
The TIGER Treebank is a corpus of 40,000 syntactically annotated German newspaper sentences. Its annotation scheme is an extended and improved version of the NEGRA annotation scheme. The conll06-train+test directory contains the dependency-converted corpus used in the CoNLL 2006 Shared Task. We have also added a dependency version converted with pennconverter (default settings; directory dependency-converted), but you will probably want to use the CoNLL06 data.
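The CoNLL 2006 data uses the tab-separated CoNLL-X format: one token per line with ten columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line after each sentence. The following is a minimal reader sketch, not a tool shipped with the corpus. The function name `read_conllx` and the `Token` class are hypothetical, and the sample sentence is invented, so its tags and labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    """One CoNLL-X token line (the PHEAD/PDEPREL columns are dropped)."""
    index: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency relation to the head


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of Tokens from CoNLL-X formatted lines."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], cols[5], int(cols[6]), cols[7]))
    if sentence:  # the file may not end with a blank line
        yield sentence


# Invented toy sentence, for illustration only (not taken from TIGER).
sample = """1\tDer\tder\tART\tART\t_\t2\tNK\t_\t_
2\tHund\tHund\tNN\tNN\t_\t3\tSB\t_\t_
3\tbellt\tbellen\tVVFIN\tVVFIN\t_\t0\tROOT\t_\t_
4\t.\t.\t$.\t$.\t_\t3\tPUNC\t_\t_
"""
sentences = list(read_conllx(sample.splitlines()))
for tok in sentences[0]:
    print(tok.form, tok.head, tok.deprel)
```

With a real file, pass an open file handle (e.g. `open(path, encoding="utf-8")`) instead of `sample.splitlines()`.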
-
The Tübingen Treebank of Written German
-
VICO Social Media Forum-Korpus
-
Leipzig Corpora Collection / Wortschatz
The Leipzig Corpora Collection presents corpora in different languages using the same format and comparable sources. The sources are either newspaper texts or texts randomly collected from the web. The texts are split into sentences, and non-sentences and foreign-language material were removed.
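Because the texts are distributed as individual sentences, a reader only needs to iterate over lines. The sketch below assumes (this is not stated above) that each line of a sentences file holds a numeric sentence ID and the sentence text separated by a tab. Check this against your own download before relying on it. The function name `read_leipzig_sentences` is hypothetical, and the sample lines are invented.

```python
from typing import Iterable, Iterator, Tuple


def read_leipzig_sentences(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (sentence_id, sentence_text) pairs.

    Assumption: each non-blank line is a numeric ID, a tab, then the sentence.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sent_id, _, text = line.partition("\t")
        yield int(sent_id), text


# Invented sample lines, for illustration only.
sample = ["1\tDas ist ein Satz.\n", "2\tHier folgt ein zweiter Satz.\n"]
pairs = list(read_leipzig_sentences(sample))
```

Using `str.partition` splits only at the first tab, so any further tabs inside the sentence text are kept intact.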
