Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik

Natural Language Generation

Module Description

Course                 Module Abbreviation   Credit Points
BA-2010 [100% | 75%]   CS-CL                 6 LP
BA-2010 [50%]          BS-CL                 6 LP
BA-2010 [25%]          BS-AC                 4 LP
BA-2010                AS-CL                 8 LP
Master                 SS-CL, SS-TAC         8 LP
Lecturer            Anette Frank
Module Type         Proseminar / Hauptseminar
Language            English
First Session       20.04.2021
Time and Place      Tuesday, 16:15-17:45, Online
Commitment Period   tbd.

Prerequisites for Participation

  • Statistical Methods
  • Foundational Knowledge in Neural Networks (e.g. Neural Networks Class)

Assessment

  • Active Participation
  • Presentation
  • Term paper or implementation project

Content

Natural Language Generation (NLG) is a key functionality in many NLP applications. Depending on the type of input for generation, we distinguish data-driven from text-driven language generation. Data-driven NLG aims to verbalize content captured in knowledge bases or structured linguistic representations, e.g. to communicate search results in database-driven Question Answering or to produce text from structured database records (advertisements, weather or financial reports, etc.). Text-driven NLG is found in text-to-text transduction tasks such as text simplification, summarization, or end-to-end dialogue systems.
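To make the data-driven setting concrete, here is a minimal Python sketch of template-based generation from a structured record; the record fields and the template are hypothetical examples, not taken from any particular system.

    # Minimal sketch of data-driven NLG: verbalizing a structured record
    # with a fixed template (record fields are hypothetical examples).
    def verbalize_weather(record: dict) -> str:
        """Render a structured weather record as a one-sentence report."""
        template = ("In {city}, expect {condition} with temperatures "
                    "around {temp_c} degrees Celsius.")
        return template.format(**record)

    record = {"city": "Heidelberg", "condition": "light rain", "temp_c": 14}
    print(verbalize_weather(record))
    # -> In Heidelberg, expect light rain with temperatures around 14 degrees Celsius.

Classical pipeline systems in the style of Reiter and Dale (1997, see Literature) replace such fixed templates with dedicated content selection, text planning and surface realization stages.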

Further types of input for NLG involve vision, e.g. when generating descriptions of images or answering questions about them, or when modeling situated language, for example in robotics, where intelligent systems need to interact with their environment and with humans.
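As a hedged illustration of such vision-driven generation, an off-the-shelf captioning model can be queried in a few lines via the Hugging Face transformers library; the model name and image path below are illustrative assumptions.

    # Sketch of vision-to-text NLG with a pretrained captioning model.
    from transformers import pipeline

    # Model name and image path are illustrative assumptions.
    captioner = pipeline("image-to-text",
                         model="nlpconnect/vit-gpt2-image-captioning")
    print(captioner("example_image.jpg"))
    # e.g. [{'generated_text': 'a view of a river and a bridge ...'}]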

With the advent of neural network methods, NLG has been revolutionized through the use of powerful autoregressive models and pretrained language models (PTLMs). An important research question, however, is how to reconcile the power of autoregressive PTLMs with control over the faithfulness of the generated text to the input. Novel directions also include non-autoregressive models that take advantage of the inherent parallelism of transformer-based architectures. Finally, in many applications it is important to make such systems interpretable and, ideally, self-explanatory, where again the NLG capabilities of a system come into play.
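The following sketch shows what autoregressive generation with a PTLM looks like in practice, assuming the Hugging Face transformers library; GPT-2 and the sampling parameters are illustrative choices, not course requirements.

    # Autoregressive generation with a pretrained language model: tokens
    # are produced one at a time, each conditioned on all previous ones.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    input_ids = tokenizer("The weather in Heidelberg today is",
                          return_tensors="pt").input_ids
    output_ids = model.generate(
        input_ids,
        max_length=40,
        do_sample=True,   # nucleus sampling trades fluency for diversity
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Note that nothing in this decoding loop constrains the output to stay faithful to the input, which is precisely the control problem raised above.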

In the seminar we will review the fundamentals of NLG and study those aspects that are particularly challenging, including:

  • how to construct NLG systems that can integrate various types of input representations: textual input, structured content or visual information
  • how to ensure faithfulness of the generated text to the input and how to measure this
  • how to structure content for longer texts (so-called text planning), which typically involves selecting appropriate portions of content and deciding how to order them
  • how to ensure coherence when generating longer texts by choosing appropriate linguistic forms (e.g. referring expressions for entities or discourse markers), so as to produce non-redundant, fluent texts that sound natural to humans
  • how to evaluate the quality of the generated texts, including novel metrics that aim to assess their quality, diversity and semantic coherence, as well as potential contradictions (see the sketch after this list)
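As a minimal illustration of the evaluation point above, reference-based n-gram overlap can be computed with the sacreBLEU library; the hypothesis and reference strings are toy examples.

    # Toy example of reference-based NLG evaluation with sacreBLEU.
    import sacrebleu

    hypotheses = ["It will rain in Heidelberg tomorrow."]
    # One reference stream, parallel to the list of hypotheses.
    references = [["Rain is expected in Heidelberg tomorrow."]]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")

Surface-overlap scores of this kind cannot detect contradictions or incoherence, which is what motivates the novel metrics mentioned above (see Sai et al. 2020 in the Literature section).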

Module Overview

Agenda

Date        Session        Materials

Literature

  • Shashi Narayan and Claire Gardent (2020): Deep Learning Approaches to Text Production. Synthesis Lectures on Human Language Technologies, Morgan & Claypool. Available in HEIDI.
  • Albert Gatt and Emiel Krahmer (2018): Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research 61, pp. 65-170.
  • Ananya B. Sai, Akash Kumar Mohankumar and Mitesh M. Khapra (2020): A Survey of Evaluation Metrics Used for NLG Systems. arXiv preprint.
  • Ehud Reiter and Robert Dale (1997): Building Applied Natural Language Generation Systems. Natural Language Engineering 3(1), pp. 57-87.

More literature will be provided by the beginning of the term.

