Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Developments in Natural Language Generation


Degree programme       Module code      Credits
BA-2010 [100%|75%]     CS-CL            6 LP
BA-2010 [50%]          BS-CL            6 LP
BA-2010 [25%]          BS-AC, BS-FL     4 LP
BA-2010                AS-CL            8 LP
Master                 SS-CL, SS-TAC    8 LP
Lecturer: Anette Frank
Course type: Proseminar / Hauptseminar
Language: English
First session: 17.04.2023
Time and place: Mondays, 13:15-14:45, INF 328 / SR 25
Commitment deadline: tbd.


Prerequisites:

  • Programming I and II
  • Mathematical Methods
  • Statistical Methods
  • Semantics


Intended audience:

  • BA Computational Linguistics (advanced students)
  • MA Computational Linguistics
  • MA Data and Computer Science
  • MA Scientific Computing


Assessment:

  • Active participation, including assignments
  • Presentation
  • Project, second presentation, or other agreed forms of homework


Content:

Natural Language Generation (NLG) is a key functionality for many NLP applications, which may be grounded in purely linguistic or in multimodal (e.g., visual-linguistic) contexts. Typically, language generation is conditioned on textual inputs, as in interactive question-answering or dialogue settings, or in text rewriting tasks (e.g., summarization, text simplification or style transfer). In restricted domains, NLG can also be data-driven: the generated language is then conditioned on structured inputs, such as database query results, domain knowledge graphs, or internal states of artificial agents that interact with their environment and with humans.
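To make the data-driven setting concrete, the following is a minimal sketch of data-to-text generation: knowledge-graph triples are linearized into a flat string and passed to a pretrained encoder-decoder model. It assumes the HuggingFace transformers library; the checkpoint (t5-small), the linearization markers and the task prefix are illustrative assumptions, and an off-the-shelf checkpoint would need fine-tuning (e.g., on WebNLG-style data) to verbalize such input fluently.

    # Data-to-text sketch: verbalize knowledge-graph triples with a pretrained
    # encoder-decoder model. Illustrative only; the markers and the task prefix
    # are assumptions, not a fixed interface.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    MODEL = "t5-small"  # a checkpoint fine-tuned on graph-to-text data works better
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

    # Structured input: (subject, relation, object) triples from a toy knowledge graph.
    triples = [("Heidelberg", "located_in", "Germany"),
               ("Heidelberg", "has_university", "Universität Heidelberg")]

    # Linearize the graph into a flat string so the text-to-text model can
    # condition on it like ordinary input text.
    linearized = " ".join(f"<S> {s} <R> {r.replace('_', ' ')} <O> {o}"
                          for s, r, o in triples)
    inputs = tokenizer("verbalize: " + linearized, return_tensors="pt")

    # Generate the verbalization autoregressively with beam search.
    output_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))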

NLG methods have been revolutionized through the use of autoregressive large pre-trained language models (LLMs). Yet, despite their stunning versatility in generating high-quality texts, important research questions remain open:

  • How can we guarantee the validity, faithfulness and consistency of LLM outputs, to make such systems trustworthy and safe?
  • How can we make these models interpretable, so that we can ground their predictions in a transparent way? Can we enhance their trustworthiness by making them interact with interpretable, symbolic knowledge sources, and can we use these to guide generation? Or can we use neural representations derived from such knowledge resources for this purpose?
  • Which capabilities do LLM-based NLG systems really acquire? Are they capable of solving higher-level reasoning problems, and how can we work towards this?
  • Can we deploy NLG methods for controlling language generation by means of self-rationalization, question generation & answering, or contextualized conclusion generation?

In the seminar we will review different types of language models, exemplified by specific NLP tasks, and discuss how to measure their performance. We will focus on methods that aim to control NLG systems so that their output is valid, faithful and consistent, in order to enhance their trustworthiness and usefulness. This includes interaction with symbolic knowledge resources (e.g., via search or joint encoding) and with inference methods. We will also investigate ways of using NLG methods themselves to control the validity of system predictions, so as to deal with complex tasks that go beyond learning statistical regularities of language.
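The kind of knowledge interaction mentioned above can be illustrated with a toy retrieval-and-injection loop: pick the best-matching fact from a small symbolic knowledge store and prepend it to the generation prompt, so that the model's output can be grounded in explicit evidence. This is a sketch under simplifying assumptions, not a system covered in the seminar; the helpers retrieve and build_prompt are hypothetical, and the word-overlap scoring merely stands in for real sparse or dense retrieval.

    # Toy knowledge injection: retrieve a fact and splice it into the prompt.
    KNOWLEDGE = [
        "The seminar meets on Mondays at 13:15 in INF 328.",
        "Contrastive search is a decoding method for autoregressive language models.",
        "Chain-of-thought prompting elicits intermediate reasoning steps.",
    ]

    def retrieve(query: str, facts: list[str]) -> str:
        """Return the stored fact with the highest word overlap with the query."""
        q = set(query.lower().split())
        return max(facts, key=lambda f: len(q & set(f.lower().split())))

    def build_prompt(question: str) -> str:
        # The injected fact constrains generation and makes the evidence explicit,
        # which is one (simplistic) route towards more faithful outputs.
        fact = retrieve(question, KNOWLEDGE)
        return f"Context: {fact}\nQuestion: {question}\nAnswer:"

    print(build_prompt("What kind of method is contrastive search?"))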

Topics will include:

  • neural architecture types for NLG (encoder-decoder; decoder-only; instruction-based models; text editing models)
  • decoding methods (autoregressive vs. non-autoregressive; contrastive search; iterative decoding; reranking), as illustrated in the sketch after this list
  • prompting for instruction tuning and decoding; in-context learning; chain-of-thought prompting
  • content control (plug-and-play methods; re-ranking of generated text; knowledge-guided pre-training; knowledge retrieval and injection)
  • applications and tasks: dialogue; question and answer generation; dataset generation
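The decoding sketch referenced in the list above compares three strategies on the same prompt, using GPT-2 and the generate() API of the HuggingFace transformers library. The contrastive-search flags (penalty_alpha, top_k) follow recent library versions; the prompt and hyperparameter values are arbitrary illustrations.

    # Compare decoding strategies for an autoregressive LM on one prompt.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tokenizer("Natural language generation is", return_tensors="pt")

    # Greedy decoding: deterministic, but prone to repetition loops.
    greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False)

    # Nucleus sampling: more diverse, but may drift off-topic.
    sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)

    # Contrastive search: penalty_alpha > 0 with top_k > 1 balances model
    # confidence against degeneration (similarity to already generated tokens).
    contrastive = model.generate(**inputs, max_new_tokens=30,
                                 penalty_alpha=0.6, top_k=4)

    for name, ids in [("greedy", greedy), ("sampling", sampled),
                      ("contrastive", contrastive)]:
        print(name, "->", tokenizer.decode(ids[0], skip_special_tokens=True))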

The early sessions of the seminar will be introductory lectures, to provide an overview and establish common ground. We will then read and discuss selected papers, with presentations and discussion rounds.



