Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Deep Learning in Speech-to-Text Translation

Module Description

Course                Module Abbreviation   Credit Points
BA-2010[100%|75%]     CS-CL                 6 LP
BA-2010[50%]          BS-CL                 6 LP
BA-2010[25%]          BS-AC                 4 LP
BA-2010               AS-CL                 8 LP
Master                SS-CL, SS-TAC         8 LP
Lecturer              Tsz Kin Lam
Module Type           Proseminar / Hauptseminar
Language              English
First Session         10.11.2020
Time and Place        Tuesday, 16:15-17:45, Online (Heiconf)
Commitment Period     tbd

Prerequisite for Participation

Good knowledge of statistical machine learning (e.g., through successful completion of the courses "Statistical Methods for Computational Linguistics" and/or "Neural Networks: Architectures and Applications for NLP"), experience in experimental work (e.g., a software project or a seminar implementation project), and basic knowledge of sequence-to-sequence learning.


Assessment

  • Regular and active participation: reading research papers and asking questions in class
  • Oral presentation of a selected paper
  • Implementation project


Automatic speech translation (AST) is the task of translating acoustic speech signals into text in a foreign language. Such systems have a wide variety of applications: they are part of travel assistants, they provide simultaneous lecture translation, and they automatically generate subtitles for foreign movies or videos. In crisis response or development assistance, such systems can play a central role when foreign helpers need to communicate with locals. In the context of intercultural understanding, these systems allow us to have conversations with people we would otherwise never talk to because of the language barrier. AST is thus one of the biggest current challenges and greatest promises in artificial intelligence.

This seminar covers sequence-to-sequence learning for audio and text data, with a focus on neural speech-to-text translation. Participants will learn about the current state of AST, in particular the different aspects and challenges of conventional cascaded systems, i.e., Automatic Speech Recognition (ASR) combined with Machine Translation (MT), and of novel end-to-end speech translation systems that do not directly rely on separate ASR and MT components. We will discuss the rise of neural architectures in the field and take a closer look at recent advances in developing better data representations for the AST task.
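
The difference between the cascaded and end-to-end designs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the type aliases, `make_cascade`, and all `toy_*` functions are hypothetical names, and the "models" are lookup tables standing in for trained systems. The sketch shows that in a cascade the MT component sees only the transcript, so an ASR error propagates into the translation.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for this sketch (not from any library).
Audio = Sequence[float]          # a mono waveform as float samples
ASR = Callable[[Audio], str]     # audio -> source-language transcript
MT = Callable[[str], str]        # source-language text -> target-language text
AST = Callable[[Audio], str]     # audio -> target-language text


def make_cascade(asr: ASR, mt: MT) -> AST:
    """Build a cascaded AST system: transcribe first, then translate."""
    def translate_speech(audio: Audio) -> str:
        transcript = asr(audio)  # intermediate text; MT never sees the audio
        return mt(transcript)
    return translate_speech


# Toy stand-ins, NOT real models: the first sample acts as a fake utterance id.
_TRANSCRIPTS = {1: "guten morgen", 2: "gute nacht"}
_TRANSLATIONS = {"guten morgen": "good morning", "gute nacht": "good night"}
_DIRECT = {1: "good morning", 2: "good night"}


def toy_asr(audio: Audio) -> str:
    return _TRANSCRIPTS.get(int(audio[0]), "<unk>")


def noisy_asr(audio: Audio) -> str:
    # Simulates a recognition error: one wrong character in the transcript.
    return "guten morgan"


def toy_mt(text: str) -> str:
    return _TRANSLATIONS.get(text, "<unk>")


def toy_end_to_end(audio: Audio) -> str:
    # A single mapping from audio to target text, with no intermediate transcript.
    return _DIRECT.get(int(audio[0]), "<unk>")


utterance = [1.0, 0.2, -0.1]
cascade = make_cascade(toy_asr, toy_mt)
broken_cascade = make_cascade(noisy_asr, toy_mt)

print(cascade(utterance))         # good morning
print(broken_cascade(utterance))  # <unk>  (the ASR error reaches the MT step)
print(toy_end_to_end(utterance))  # good morning
```

Treating both designs as functions from audio to text keeps them interchangeable for evaluation; the only structural difference is whether a transcript exists as an intermediate result.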

Topics include (but are not limited to):

  • Speech Features: learning from raw waveforms (e.g., SincNet)
  • Speech Embeddings (unsupervised representation learning)
  • Simultaneous Speech Translation
  • Speech Enhancement
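
As a rough illustration of the "learning from raw waveforms" topic, the sketch below builds the kind of band-pass kernel that SincNet-style front ends convolve with the waveform: the difference of two low-pass sinc filters, g[n] = 2·f_high·sinc(2π·f_high·n) − 2·f_low·sinc(2π·f_low·n), multiplied by a Hamming window. In SincNet the two cutoff frequencies are learned parameters; here they are fixed constants, and the function names `sinc_bandpass` and `gain` are hypothetical helpers for this sketch only.

```python
import cmath
import math
from typing import List


def sinc(x: float) -> float:
    """Unnormalized sinc: sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f_low: float, f_high: float, kernel_size: int = 101) -> List[float]:
    """Hamming-windowed band-pass kernel.

    Cutoffs are in cycles per sample, with 0 < f_low < f_high < 0.5.
    kernel_size should be odd so the kernel is centred on n = 0.
    """
    half = (kernel_size - 1) / 2
    kernel = []
    for k in range(kernel_size):
        n = k - half  # centred time index
        ideal = (2 * f_high * sinc(2 * math.pi * f_high * n)
                 - 2 * f_low * sinc(2 * math.pi * f_low * n))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain(kernel: List[float], freq: float) -> float:
    """Magnitude of the kernel's frequency response at freq (cycles per sample)."""
    return abs(sum(g * cmath.exp(-2j * math.pi * freq * k)
                   for k, g in enumerate(kernel)))


kernel = sinc_bandpass(0.1, 0.2)
for f in (0.02, 0.15, 0.4):
    print(f, round(gain(kernel, f), 3))
```

The gain should be close to 1 inside the band (0.15) and close to 0 outside it (0.02 and 0.4). Because each filter is described by only two numbers, a learned version of this layer has far fewer parameters than a standard convolution with the same kernel size.
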
Module Overview

Notes and updates

(1) Present two papers or 1 paper + 1 project
(2) Heiconf link and Org. slides added

Date     Session              Materials
17.11    Org. + 1 tutorial    Org and slides by Stefan Riezler
