Ruprecht-Karls-Universität Heidelberg

SR3de - Semantic Role Triple Dataset for German

Triple Dataset

Annotation

The parallel portion of the CoNLL 2009 German data set for the shared task 'Syntactic and Semantic Dependencies in Multiple Languages', with parallel annotation in the three major semantic role labeling frameworks:
  • PropBank-style (PB)
  • VerbNet-style (VN)
  • FrameNet-style (FN)

Format

To produce the parallel SR3de corpus, you need:
  • the CoNLL 2009 ST corpus with its original PropBank annotation, available from LDC
  • the corresponding SALSA 2.0 FrameNet-style annotations, available from the homepage of the SALSA project
  • the VN-style annotations produced by the GNVN project, which we provide directly
We provide a script that takes your copies of the above resources as input and computes parallel files for the corresponding annotations in the SR3de corpus in CoNLL format, as an enrichment of the original LDC CoNLL representation.
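For orientation, the following is a minimal sketch of how predicate-argument structures can be read from a file in the tab-separated CoNLL 2009 column layout (ID, FORM, ..., FILLPRED, PRED, APRED1, ...). It is not the conversion script mentioned above. The function names and the toy sentence are invented for illustration, and the exact layout of the generated SR3de files is assumed to follow the standard CoNLL 2009 columns.

```python
# Column indices of the standard CoNLL 2009 layout (0-based).
FORM, FILLPRED, PRED, FIRST_APRED = 1, 12, 13, 14


def read_sentences(lines):
    """Yield sentences as lists of token rows; sentences are separated by blank lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        else:
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def predicate_argument_structures(sentence):
    """Return one (predicate sense, [(role, form), ...]) pair per predicate.

    The k-th predicate of a sentence (in token order) has its arguments
    in the k-th APRED column.
    """
    predicates = [row[PRED] for row in sentence if row[PRED] != "_"]
    structures = []
    for k, sense in enumerate(predicates):
        col = FIRST_APRED + k
        args = [
            (row[col], row[FORM])
            for row in sentence
            if len(row) > col and row[col] != "_"
        ]
        structures.append((sense, args))
    return structures


def toy_row(idx, form, pred="_", apred="_"):
    """Build a hypothetical 15-column row; unused columns are set to '_'."""
    fillpred = "Y" if pred != "_" else "_"
    return "\t".join([str(idx), form] + ["_"] * 10 + [fillpred, pred, apred])


# Invented toy sentence, not taken from the corpus.
SAMPLE = [
    toy_row(1, "Peter", apred="A0"),
    toy_row(2, "kauft", pred="kaufen.1"),
    toy_row(3, "Brot", apred="A1"),
    "",
]

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        print(predicate_argument_structures(sent))
```

To read a real file, pass an open file handle instead of `SAMPLE`, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` points to one of the generated files.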

Statistics

Dataset part   Predicate-argument structures   Predicate types (lemma)   Role types (PB / VN / FN)
train          2,196                           198                       10 / 30 / 278
dev            250                             121                       6 / 23 / 145
test           520                             152                       8 / 25 / 165

Information on the original corpora

Framework   Description of Annotation   Original Corpus
PropBank    CoNLL 2009 ST               LDC
VerbNet     GNVN project                GNVN data
FrameNet    SALSA project               SALSA 2.0 corpus
