X-SRL: Parallel Cross-lingual Semantic Role Labeling

The Heidelberg University NLP Group announces a new dataset for multilingual semantic role labeling (SRL): “X-SRL: Parallel Cross-lingual Semantic Role Labeling”. The dataset has the following properties:

It is based on the English CoNLL-09 dataset.
It is parallel across four languages (English, French, German and Spanish), obtained via high-quality machine translation with DeepL.
It uniformly applies the PropBank SRL labeling scheme, as developed for English, to all covered languages, using a novel, dense and precise label projection mechanism.
Its quality was checked automatically for the training sections and manually for the evaluation sections.

The corpus is available through the Linguistic Data Consortium (LDC) at: https://catalog.ldc.upenn.edu/LDC2021T09

The publication below describes the motivation, development and analysis of the dataset, as well as experiments on multilingual and cross-lingual SRL:

Daza, A. and Frank, A. (2020): X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp. 3904–3914.

Enjoy!