Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Reinforcement Learning


Degree program   Module code      Credit points (LP)
BA-2010          AS-CL            8 LP
Master           SS-CL, SS-TAC    8 LP

Instructor: Stefan Riezler
Course type: Hauptseminar (advanced seminar)
First session: 24.10.2017
Time and place: Tue, 11:15-12:45, INF 325 / SR 24 (SR)
Commitment deadline: tbd.


Master: Foundations of probability theory, statistics, and linear algebra
Bachelor: Successful completion of the courses "Formal Foundations of Computational Linguistics: Mathematical Foundations" and "Statistical Methods for Computational Linguistics"


  • Active and regular participation
  • Presentation, including preparation of discussion questions
  • Implementation project or final paper


Reinforcement learning is a machine learning technique that sits between supervised and unsupervised learning. Instead of learning from explicit supervision by ground-truth examples, an input-output relation is learned through interaction of a system with its environment or user. Learning from implicit feedback, such as rewards that evaluate the quality of predicted outputs, is less costly than explicit supervision and makes it possible to learn in uncharted territory. The goal of this class is to introduce the central theoretical and algorithmic concepts of reinforcement learning, with a special focus on applications to structured prediction problems in natural language processing.
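As a rough illustration of learning from reward feedback rather than from ground-truth labels, the following minimal sketch runs an epsilon-greedy learner on a multi-armed bandit. The function name `run_bandit`, the Bernoulli reward probabilities, and all parameter values are assumptions chosen for illustration; they are not taken from the course materials.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning on a Bernoulli bandit (illustrative sketch).

    reward_probs: hypothetical success probability of each arm. The learner
    never sees these values; it only observes the sampled 0/1 rewards.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # number of times each arm was pulled
    values = [0.0] * n_arms  # running mean reward of each arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment returns only a scalar reward, not the correct answer.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.2, 0.5, 0.8])
    print("estimated values:", [round(v, 2) for v in values])
    print("pull counts:", counts)
```

The learner is never told which arm is best. It can only estimate this from the rewards it collects, which is why some exploration is needed alongside exploitation.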

Possible topics of the class are:

  • Markov Decision Processes vs. Multi-Armed Bandits
  • Exploration vs. Exploitation
  • Prediction vs. Control
  • Dynamic Programming vs. Monte Carlo vs. Temporal Difference Learning
  • Critic-Only vs. Actor-Only vs. Actor-Critic Algorithms
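To make the temporal difference and control topics above more concrete, here is a minimal sketch of tabular Q-learning on a small deterministic chain. The chain environment, the function name `q_learning_chain`, and all hyperparameters are hypothetical and serve only as an illustration.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP (illustrative sketch).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Deterministic transition; moving left from state 0 stays put.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Temporal difference update toward the bootstrapped target.
            target = reward if done else gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break

    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(q_learning_chain()):
        print(f"state {s}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

The update bootstraps from the current estimate of the next state's value instead of waiting for the full return of the episode, which is the main difference from a Monte Carlo estimate.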



Date    Topic / Presenter

24.10.  Organization
        Presenter: Riezler

7.11.   Introduction to Reinforcement Learning
        Presenter: Riezler

14.11.  Introduction to RL, contd.
        Presenter: Riezler

21.11.  Introduction to RL, contd.
        Presenter: Riezler

28.11.  Watkins & Dayan (1992). Q-Learning.
        Mnih et al. (2015). Human-level control through deep reinforcement learning.
        Presenter: Niels Bernlöhr

5.12.   Mannor et al. (2003). The Cross Entropy Method for Fast Policy Search.
        Salimans et al. (2017). Evolution Strategies as a Scalable Alternative to Reinforcement Learning.
        Presenter: Sassan Mokhtar

12.12.  Williams (1992). Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning.
        Sutton et al. (2000). Policy Gradient Methods for Reinforcement Learning with Function Approximation.
        Presenters: Simon Will & Max Lappé

19.12.  Konda & Tsitsiklis (2000). Actor-Critic Algorithms.
        Mnih et al. (2016). Asynchronous Methods for Deep Reinforcement Learning.
        Presenter: Lennard Kiehl

9.1.    Greensmith et al. (2004). Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning.
        Wang et al. (2013). Variance Reduction for Stochastic Gradient Optimization.
        Presenters: Tatjana Chernenko & Max Bacher

16.1.   Kakade (2002). A Natural Policy Gradient.
        Schulman et al. (2015). Trust Region Policy Optimization.
        Presenter: Shiyue Zhang

23.1.   Asadi et al. (2017). Mean Actor Critic.
        Ciosek & Whiteson (2017). Expected Policy Gradients.
        Presenter: Neha Pandey

30.1.   Kreutzer et al. (2017). Bandit Structured Prediction for Neural Sequence-to-Sequence Learning.
        Nguyen et al. (2017). Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback.
        Presenter: Enrique Fita Sanmartin

6.2.    Christiano et al. (2017). Deep Reinforcement Learning from Human Preferences.
        Judah et al. (2010). Reinforcement Learning via Practice and Critique Advice.
        Presenters: Nadja Heinzen & Arthur Neidlein


  • Ebook link for Sutton & Barto (2017). Reinforcement Learning. An Introduction. MIT Press.
  • Ebook link for Szepesvari (2010). Algorithms for Reinforcement Learning. Morgan & Claypool.

