Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Speech Recognition and Speech Translation


Degree program       Module code      Credit points
BA-2010              AS-CL            8 LP
Master               SS-CL, SS-TAC    8 LP
Instructors          Stefan Riezler and Sariya Karimova
Course type          Hauptseminar/Übung (advanced seminar with exercise session)
First session        24.04.2018
Time and place       Tue, 11:15-12:45, INF 327 / SR 3 (SR)
                     Thu, 16:15-17:45, INF 326 / SR 28 (SR)
Commitment deadline  tbd.


Master: Foundations of probability theory and statistics
Bachelor: Successful completion of the courses "Formal Foundations of Computational Linguistics: Mathematical Foundations" and "Statistical Methods for Computational Linguistics"


- Regular attendance at the seminar and the exercise session

- Completion of the exercise assignments

- Presentation, including preparation of discussion questions

- Term paper and/or implementation project


Automatic speech recognition (ASR) and machine translation (MT) are among the hardest problems in NLP, yet they are also among the field's few success stories, thanks to the availability of large amounts of training data "in the wild". ASR and MT were also among the first applications in which deep learning methods were shown to be beneficial. This combination of large amounts of real-world data and sophisticated machine learning technology makes both topics interesting research problems.

The seminar will start with introductory lectures on both topics, with the goal of preparing for an even harder problem, the automatic translation of speech input, and its specific challenges. These include disfluencies in speech input, error propagation in pipelines that combine ASR with MT, and the challenge of translating speech directly from acoustic signals.

Possible topics of the seminar include:
- basics of phonetics and acoustic models
- basics of automatic speech recognition
- basics of neural machine translation
- in-depth readings of research papers on speech translation
- practical exercises for all discussed topics


Jurafsky & Martin (2008). Speech and Language Processing. Prentice Hall.

Holmes & Holmes (2001). Speech Synthesis and Recognition. Taylor & Francis.

Ladefoged (2006). Elements of Acoustic Phonetics. University of Chicago Press.

Rabiner & Schafer (2007). Introduction to Digital Speech Processing. Now Publishers.

Goldberg (2015). A Primer on Neural Network Models for Natural Language Processing.

Cho (2015). Natural Language Understanding with Distributed Representation.

Neubig (2017). Neural Machine Translation and Sequence-to-sequence Models: A Tutorial.

