Ruprecht-Karls-Universität Heidelberg

CLARIN-D Curation Project "Semantic Annotation for Digital Humanities" (2015 - 2016)

Semantic Annotation for Digital Humanities

The curation project focuses on semantic annotation, particularly on Word Sense Disambiguation (WSD) and Semantic Role Labeling (SRL). Based on previous research, the aims of CP3 fall into the following areas:

Area A: Consolidation and further development of WebAnno for practical use in DH projects

Further development of the web-based annotation tool WebAnno to enable flexible SRL annotation.

Area B: Curation of resources for semantic annotation and further annotation of the NoSta-D corpus

Creation of a benchmark annotated corpus for German.

  • Annotations with VerbNet-style SRL are available on GNVN_semanno, including 3200 annotated predicate-argument structures from the SALSA corpus as well as 450 predicate-argument structures from the Dortmund Chat Corpus.
  • Additionally, parallel SRL annotations in PropBank-, FrameNet- and VerbNet-style frameworks are available on SR3de, including 3000 instances from the CoNLL 2009 shared task German data (also included in the SALSA corpus).

Area C: Supporting Shared Tasks for German for selected annotation types

The project aims to support the further development of tools and resources for German-language corpora by supporting shared tasks with suitable objectives. The following shared tasks could be supported:

  • GermEval 2014 "Named Entity Recognition Shared Task", organized by Prof. Biemann and Prof. Padó.
  • GermEval 2015 "LexSub: Shared Task for German-language Lexical Substitution", organized by Prof. Gurevych and Prof. Biemann.
  • Additionally, a joint WSD/SRL shared task for German is planned by Prof. Frank and Prof. Gurevych.

