Ruprecht-Karls-Universität Heidelberg

Thesis Topics

My research areas are trustworthy NLP, discourse, semantics, pragmatics (particularly figurative language and sentiment analysis) and summarization. Please contact me if you are interested in a BA or MA thesis on any of these topics. Current ideas for thesis topics and areas include, but are not limited to, the following:

  1. Language Models for Timeline Summarization (BA/MA).

    Sebastian Martschat and I developed TILSE, a submodular framework for generating timeline overviews, such as timelines of long-running wars or other events. One question is whether language models can be integrated into that framework, for example by letting the algorithm operate on language-model-generated summaries of individual articles instead of on the full articles, which should also make it more efficient.
  2. Bias in Language Models: The case of Sinti and Roma (BA/MA).

    It is well known that language models reproduce and even amplify bias against minorities. Most research has concentrated on English and on gender bias, but also on racial bias against Black people and religious bias against Jews or Muslims. To the best of my knowledge, there is no work on language model bias against Sinti and Roma. This thesis would establish the first benchmark test suite for bias against Sinti and Roma and evaluate language models against this benchmark.
  3. Measuring Linguistic Capabilities of Large Language Models (MA).

    It is unclear how good the linguistic and metalinguistic capabilities of language models are: results on grammaticality judgements conflict, depending on whether one measures via prompting or directly via probability measurements. This thesis would continue prior work on this topic and extend it towards discourse and pragmatic NLP problems.
  4. Cryptic Crosswords (BA/MA).

    Whereas standard crosswords are an almost solved problem in NLP (at least for English), results for cryptic crosswords are extremely low: their clues require character-level and phonological knowledge and make use of word play and puns. This thesis would investigate the use of expert modules (such as anagram solvers) to improve the state of the art and/or the integration of fine-grained clue annotation to help LM solvers. If you are interested in algorithms for word games other than cryptic crosswords, this might also be possible.
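
To give a rough idea of what an expert module for the cryptic crosswords topic could look like, here is a minimal Python sketch of an anagram solver that indexes a word list by sorted letters. The function names and the toy vocabulary are assumptions made purely for illustration and are not taken from any existing solver; how such a module would be combined with an LM solver is exactly what the thesis would have to work out.

```python
from collections import defaultdict


def letter_signature(word: str) -> str:
    """Sort the letters of a word so that all anagrams share one key."""
    return "".join(sorted(ch for ch in word.lower() if ch.isalpha()))


def build_anagram_index(vocabulary):
    """Map each letter signature to the vocabulary words that have it."""
    index = defaultdict(set)
    for word in vocabulary:
        index[letter_signature(word)].add(word.lower())
    return index


def solve_anagram(fodder: str, index, length=None):
    """Return vocabulary words that are anagrams of the fodder letters.

    The fodder itself is excluded, and an optional answer length
    (as given in the clue's enumeration) filters the candidates.
    """
    candidates = index.get(letter_signature(fodder), set())
    return sorted(
        w for w in candidates
        if w != fodder.lower() and (length is None or len(w) == length)
    )


# Toy vocabulary (an assumption; a real module would load a large word list).
VOCABULARY = ["listen", "silent", "enlist", "tinsel", "stone", "notes", "onset"]
INDEX = build_anagram_index(VOCABULARY)

print(solve_anagram("listen", INDEX))
print(solve_anagram("tones", INDEX, length=5))
```

The candidates returned by such a module could then be passed to a language model, which only has to pick the one that matches the clue's definition part.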

If you do not find anything you like but would still like to work on something in the area of trustworthy NLP, summarization, semantics/pragmatics or discourse, please contact me.
