Ruprecht-Karls-Universität Heidelberg

Investigating Multilingual Coreference Resolution by Universal Annotations

Abstract

Multilingual coreference resolution (MCR) is a long-standing and challenging task. Using the recently proposed multilingual coreference dataset CorefUD (Nedoluzhko et al., 2022), we conduct a comprehensive investigation of the task that leverages the dataset's harmonized universal morphosyntactic and coreference annotations. First, we study coreference by examining the ground truth data at three levels of granularity (mention, entity and document) to gain insights into the characteristics of coreference across multiple languages. Second, we use the universal annotations to perform a root cause analysis of the most challenging cases that the state-of-the-art (SotA) system in the CRAC 2022 shared task fails to resolve. Last, we integrate several features extracted from the universal morphosyntactic annotations into a baseline system to assess their potential benefits for the MCR task. Our best feature configuration improves over the baseline by 0.9 F1.
