
Discourse processing: co-reference resolution and discourse-level event processing
Recent advances in Natural Language Processing have enabled tasks that process textual input at the global, and more complex, level of whole documents. Discourse-level semantic analysis is crucial for advanced information access as well as for automatic natural language understanding and text generation technology. Among the various discourse-level phenomena, we focus our research on multilingual co-reference resolution and discourse-level event processing. In the area of co-reference resolution, we take previous efforts one step further by extending BART, a machine-learning-based toolkit, to perform co-reference resolution in languages other than English. Our SFB 619 project on ritual structure and variation investigates the extraction and analysis of event sequences, whereas the SiGHTSee project deals with the presentation of events in instructional way-finding tasks.
Co-reference resolution for multiple languages
Co-reference resolution is the task of identifying noun phrases that refer to the same extra-linguistic entity in a text, e.g. 'Prince', 'the Minneapolis genius' and 'AFKAP'. Building on experience gained as part of the Johns Hopkins Summer Workshop on Exploiting Lexical & Encyclopedic Resources for Entity Disambiguation, we are currently extending BART, a flexible toolkit for co-reference resolution, so that it can perform co-reference resolution in languages other than English, such as German and Italian.
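As an illustration of the task definition above, co-reference output can be viewed as a partition of the mentions in a text into chains, one per entity. The following is a minimal sketch of that view, assuming that pairwise co-reference decisions are already available. It is not BART's interface: the function name `cluster_mentions`, the use of plain strings as mentions, and the extra mention 'a car' are hypothetical simplifications.

```python
def cluster_mentions(mentions, coreferent_pairs):
    """Group mentions into co-reference chains from pairwise links.

    Hypothetical helper for illustration only. Mentions are plain strings
    here; a real system would use text spans with document offsets.
    """
    # Union-find: every mention starts as the representative of its own chain.
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the chain representative, shortening the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Each pairwise decision merges the two chains involved.
    for a, b in coreferent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the mentions of each chain, preserving input order.
    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


mentions = ["Prince", "the Minneapolis genius", "AFKAP", "a car"]
# Assumed pairwise decisions, e.g. as produced by some classifier.
links = [("Prince", "the Minneapolis genius"),
         ("the Minneapolis genius", "AFKAP")]
chains = cluster_mentions(mentions, links)
```

Here the three expressions from the example end up in one chain, while 'a car' remains a singleton. The hard part of the task, deciding which pairs are co-referent, is exactly what the sketch takes as given.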
Event and semantic role labeling for discourse processing
Syntactic relations like "subject" or "object" do not uniformly describe the role that a participant (expressed e.g. by a noun) plays in an event (expressed e.g. by a verb). If the sentence "Peter has bought a car." is passivized into "A car has been bought by Peter.", the role Peter plays in the buying event is still the same as in the active sentence, although Peter is no longer the subject. The task of automatically attaching semantic roles (such as "Buyer", "Recipient", or "Goods") to phrases in text is called semantic role labeling. In the SALSA project at the University of the Saarland, we participated in a large corpus annotation effort to create a German FrameNet lexical resource (see Burchardt et al. 2006, 2009). The project investigated the application of Frame Semantics in various natural language processing tasks. For example, we were the first to investigate the use of frame-semantic processing for the task of Recognizing Textual Entailment (Burchardt and Frank 2006, Burchardt et al. 2007). At Heidelberg University we are now studying the impact of implicit (i.e. non-verbalized) discourse-connected semantic roles, as originally described in Burchardt et al. (2005), within the SemEval 2010 task on Linking Roles in Discourse.
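The active/passive example above can be made concrete with a small sketch. It assumes a hand-written lookup table from a predicate and its voice to semantic roles. The names `Clause`, `ROLE_MAP` and `label_roles` are hypothetical and do not correspond to any SALSA or FrameNet tool, and a real semantic role labeler would learn such mappings from annotated corpora rather than list them by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    """A toy syntactic analysis of one clause (hypothetical structure)."""
    predicate: str
    voice: str                      # "active" or "passive"
    subject: str
    obj: Optional[str] = None       # direct object, if any
    by_agent: Optional[str] = None  # "by"-phrase of a passive, if any


# Assumed hand-written table: which syntactic slot fills which semantic role.
ROLE_MAP = {
    ("buy", "active"): {"subject": "Buyer", "obj": "Goods"},
    ("buy", "passive"): {"subject": "Goods", "by_agent": "Buyer"},
}


def label_roles(clause):
    """Map the filled syntactic slots of a clause to semantic roles."""
    mapping = ROLE_MAP[(clause.predicate, clause.voice)]
    roles = {}
    for slot, role in mapping.items():
        filler = getattr(clause, slot)
        if filler is not None:
            roles[role] = filler
    return roles


# "Peter has bought a car."
active = Clause("buy", "active", subject="Peter", obj="a car")
# "A car has been bought by Peter."
passive = Clause("buy", "passive", subject="a car", by_agent="Peter")

active_roles = label_roles(active)
passive_roles = label_roles(passive)
```

Both clauses receive the same role assignment, with Peter as "Buyer" and the car as "Goods", even though their syntactic subjects differ.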