X-SRL: Parallel Cross-lingual Semantic Role Labeling
The Heidelberg University NLP Group announces a new dataset for multilingual semantic role labeling (SRL): “X-SRL: Parallel Cross-lingual Semantic Role Labeling”. The dataset has the following properties:
- it is based on the English CoNLL-09 dataset;
- it is parallel across four languages (English, French, German and Spanish), obtained via high-quality machine translation with DeepL;
- it uniformly applies the PropBank SRL labeling scheme, as developed for English, to all covered languages, using a novel, dense and precise label projection mechanism; and
- its training sections have been checked automatically, and its evaluation sections manually.
The corpus is available through the LDC at: https://catalog.ldc.upenn.edu/LDC2021T09
The motivation, development and analysis of this dataset, as well as experiments on multilingual and cross-lingual SRL, are described in the following publication:
Daza, A. and Frank, A. (2020): X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset. The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp. 3904-3914.
Enjoy!