Ruprecht-Karls-Universität Heidelberg

Thesis Topics

My research areas are discourse, semantics, pragmatics (particularly figurative language and sentiment analysis) and summarization. Please contact me if you are interested in writing a BA or MA thesis on any of these topics. Current ideas for thesis topics include, but are not limited to, the following:

  1. Integrating Advanced Date Selection into Submodular Algorithms for Timeline Summarization (BA/MA).

    Sebastian Martschat and I developed a submodular framework called TILSE for generating timeline overviews, such as timelines of long-running wars or other events. The date selection algorithm in this framework is rather basic: it determines a date's importance only by how frequently that date is mentioned. The proposed thesis would integrate more sophisticated date selection methods into TILSE, for example taking into account whether a date is referenced from other important dates, or causal relations between dates.
  2. Integrating Importance Functions into Submodular Algorithms for Timeline Summarization (BA/MA).

    Sebastian Martschat and I developed a submodular framework called TILSE for generating timeline overviews, such as timelines of long-running wars or other events. Currently, it measures the importance of sentences with only two main functions, both of which capture how central a sentence is to the underlying corpus. Following Maxime Peyrard's work on general summarization (ACL 2019), the thesis would improve this algorithm by integrating more general importance functions into the toolkit.
  3. Modelling the Influence of Information Retrieval on Timeline Summarization (MA).

    Timeline summarization generates dated timelines for long-running social events such as wars, disease outbreaks or financial crises. In contrast to standard single- or multi-document summarization, it summarizes hundreds of documents and therefore depends on good information retrieval (IR) before summarization. In previous work, students in Heidelberg under my supervision have shown that performance differences between summarization systems do not necessarily hold across different upstream IR models. The next question is how to optimize IR for subsequent timeline summarization systems without building a whole new IR system. Following a model from question answering, this thesis would adopt a simple strategy that selects the optimal retrieval depth and corpus size based on retrieval confidence, in order to optimize IR for timeline summarization.
  4. Domain- and task-adaptive pretraining for figurative language resolution (BA/MA)

    Most current models for metaphor recognition use contextualized word embeddings such as BERT or RoBERTa. Out-of-the-box embeddings, however, often underperform on texts from genres that are poorly represented in the corpus the embeddings were trained on, such as dialogue. Following Gururangan et al. (ACL 2020), who studied this for other tasks, this thesis would explore how domain- and task-adaptive pretraining might improve metaphor (and metonymy) resolution.
  5. Metaphor recognition for unconventional/novel metaphors (BA/MA)

    Current embedding-based models for metaphor recognition perform deceptively well. In a recent COLING 2020 paper, however, students of mine have shown that these models tend to perform very well on conventionalized metaphors, which are more akin to distinct word senses (such as using "dark" in a non-physical sense, as in "dark humour"), but fail on novel or poetic word uses. Several thesis topics could be explored here, varying in difficulty and in technical or linguistic focus. Possibilities include (i) developing datasets with more occurrences of novel metaphors via crowdsourcing and (ii) integrating knowledge into metaphor recognition approaches to improve performance on novel metaphors, among others.

If none of these topics appeals to you but you would still like to write a thesis in summarization, semantics/pragmatics or discourse, please contact me.