Title: From measuring similarity between meaning representations to measuring similarities between generated texts
Speaker: Juri Opitz (ICL)
First, we revisit previous work on measuring similarity between meaning representations. In particular, we assess previously proposed metrics and outline their weaknesses as well as prospects for improvement. For instance, none of the previously proposed metrics takes into account the graded similarity between nodes or sub-structures. As a first step towards addressing these issues, we propose S^2match, a metric that takes into account the graded similarity of node concepts (e.g., enemy vs. foe).
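To make the idea of graded concept similarity concrete, here is a minimal Python sketch. It is not the S^2match implementation: the toy vectors, the threshold value, the fixed node alignment, and all function names are assumptions chosen for illustration. A real metric would use pretrained word embeddings and would search for the best alignment between the two graphs instead of being handed one.

```python
import math

# Hypothetical toy vectors (assumption); a real system would load pretrained embeddings.
TOY_VECTORS = {
    "enemy": (0.9, 0.1, 0.0),
    "foe": (0.8, 0.2, 0.1),
    "cat": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hard_match(a, b):
    """Binary concept match: 1.0 only for identical labels."""
    return 1.0 if a == b else 0.0


def graded_match(a, b, threshold=0.5):
    """Graded concept match: cosine similarity, cut off below an assumed threshold."""
    if a == b:
        return 1.0
    if a not in TOY_VECTORS or b not in TOY_VECTORS:
        return 0.0
    sim = cosine(TOY_VECTORS[a], TOY_VECTORS[b])
    return sim if sim >= threshold else 0.0


def concept_f1(graph_a, graph_b, alignment, match):
    """F1 over node concepts for a given alignment (variables of A -> variables of B)."""
    score = sum(match(graph_a[x], graph_b[y]) for x, y in alignment.items())
    precision = score / len(graph_a)
    recall = score / len(graph_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two tiny "graphs", reduced to their node concepts (variable -> concept).
graph_a = {"x": "enemy", "y": "cat"}
graph_b = {"a": "foe", "b": "cat"}
alignment = {"x": "a", "y": "b"}

hard_score = concept_f1(graph_a, graph_b, alignment, hard_match)
graded_score = concept_f1(graph_a, graph_b, alignment, graded_match)
print(hard_score, graded_score)
```

With binary matching, the enemy/foe pair contributes nothing to the score. With graded matching, it contributes its similarity value, so the near-synonymous graphs are rated as closer.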
Second, we turn to the problem of evaluating text generated from Abstract Meaning Representation (AMR) graphs (AMR-2-text). Here, we see that the weaknesses of conventional metrics such as BLEU are compounded. To alleviate this issue, we propose to evaluate the generated text in the abstract AMR domain, using a high-performance AMR parser to project the text into a graph and our previously developed S^2match metric to assess graph similarity. We show that this evaluation is safer and offers prospects beyond AMR-2-text, since it makes it possible to deliver an explainable evaluation of text generation systems, e.g., to assess their SRL or coreference competence.
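The evaluation scheme described above (parse both texts, then compare the resulting graphs) can be sketched as follows. This is only an outline under stated assumptions: `stub_parse` is a hypothetical lookup table standing in for a real AMR parser, graphs are simplified to sets of concept-level triples, and exact triple overlap stands in for S^2match.

```python
from typing import Callable, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def triple_f1(g1: Graph, g2: Graph) -> float:
    """Exact-overlap F1 between two triple sets (a stand-in for a graph metric)."""
    overlap = len(g1 & g2)
    if overlap == 0:
        return 0.0
    precision = overlap / len(g1)
    recall = overlap / len(g2)
    return 2 * precision * recall / (precision + recall)


def evaluate_in_graph_domain(
    generated: str,
    reference: str,
    parse: Callable[[str], Graph],
    similarity: Callable[[Graph, Graph], float] = triple_f1,
) -> float:
    """Project both texts into graphs, then score the graphs, not the surface strings."""
    return similarity(parse(generated), parse(reference))


# Hypothetical stub parser (assumption): a lookup table with hand-written parses.
STUB_PARSES = {
    "The boy wants to sleep.": frozenset({
        ("want-01", ":ARG0", "boy"),
        ("want-01", ":ARG1", "sleep-01"),
        ("sleep-01", ":ARG0", "boy"),
    }),
    "The boy desires sleep.": frozenset({
        ("desire-01", ":ARG0", "boy"),
        ("desire-01", ":ARG1", "sleep-01"),
        ("sleep-01", ":ARG0", "boy"),
    }),
}


def stub_parse(sentence: str) -> Graph:
    return STUB_PARSES[sentence]


same = evaluate_in_graph_domain(
    "The boy wants to sleep.", "The boy wants to sleep.", stub_parse
)
paraphrase = evaluate_in_graph_domain(
    "The boy desires sleep.", "The boy wants to sleep.", stub_parse
)
print(same, paraphrase)
```

Because the score is computed over graph triples, a low score can be traced back to the specific triples that differ (for example a wrong argument role), which is what makes this kind of evaluation explainable. A graded concept match would also give partial credit to pairs like want-01 and desire-01, which exact overlap treats as a mismatch.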