Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Human Reinforcement Learning: Algorithms and Hands-on Practice

Module Description

Course      Module Abbreviation   Credit Points
BA-2010     AS-CL                 8 LP
Master      SS-CL, SS-TAC         8 LP

Lecturer            Stefan Riezler, Sariya Karimova
Module Type         Advanced seminar (Hauptseminar) / Exercise (Übung)
Language            English
First Session       23.04.2019
Time and Place      Tuesday, 11:15-12:45, INF 327 / SR 4
                    Thursday, 14:15-15:45, INF 327 / SR 4
                    Interactive sessions will take place in the IWR Pool 1 (Mathematikon, 3rd floor).
Commitment Period   tbd.

Prerequisite for Participation


Expertise comparable to course "Neural Networks: Architectures and Applications for NLP" (


Successful completion of courses "Programmieren I (Python)", "Formal Foundations of Computational Linguistics: Mathematical Foundations" and "Statistical Methods for Computational Linguistics"; ideally also "Neural Networks: Architectures and Applications for NLP"


  • Presentation of one paper from seminar reading list
  • Reading of rest of papers from seminar reading list
  • Successful completion of exercises
  • Regular and active attendance of seminar and exercises
  • Implementation project or written term paper


Reinforcement learning (RL) is a machine learning paradigm that is
situated between supervised and unsupervised learning. Instead of learning
from explicit supervision by ground-truth examples, an input-output
relation is learned through interaction of a system with its environment
or user. Learning from implicit feedback, such as rewards that evaluate
the quality of predicted outputs, is less costly than explicit
supervision and allows learning in uncharted territory. The greatest
success stories of RL have been achieved in areas where reward signals
are well-defined and abundant, e.g., in learning to play games at a
superhuman level. Applying RL to interactions with human users is a
harder problem because human rewards are inconsistent and sparse. This
requires solving the additional problem of learning good reward
estimators from human feedback.
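
To make the contrast with explicit supervision concrete, the following is a minimal sketch, not course material: a learner chooses among three candidate outputs and only ever observes a noisy 0/1 reward for the output it chose, never the correct answer. The function names, the three quality values, the learning rate, and the number of steps are all illustrative assumptions; the update is the standard REINFORCE rule for a softmax policy with a running-average baseline.

```python
import math
import random


def softmax(prefs):
    """Turn preference scores into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def learn_from_rewards(true_quality, steps=5000, lr=0.1, seed=0):
    """Learn which output to prefer from noisy scalar rewards only.

    true_quality[i] is the hidden probability that output i receives
    reward 1 (a hypothetical stand-in for user feedback). The learner
    never sees these values, only the sampled rewards.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(true_quality)
    baseline = 0.0  # running average reward, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < true_quality[action] else 0.0
        baseline += (reward - baseline) / t
        # REINFORCE update for a softmax policy:
        # d log pi(action) / d prefs[i] = 1[i == action] - probs[i]
        for i in range(len(prefs)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * (reward - baseline) * grad
    return softmax(prefs)


policy = learn_from_rewards([0.2, 0.5, 0.8])
print(policy)
```

The sketch assumes rewards are abundant (one per step). With human users, rewards would be far sparser and less consistent, which is why a learned reward estimator is needed on top of such an update.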

The goal of this class is to provide students with a thorough
understanding of the central theoretical and algorithmic concepts of
RL, including techniques to analyze and learn from human rewards. The
class is accompanied by implementation exercises for RL algorithms and
by practical human RL sessions for interactive neural machine translation.

The seminar starts with a short series of lectures that introduces the
central concepts of RL. These concepts include:
- Markov Decision Processes
- Policy Evaluation and Policy Optimization
- Dynamic Programming Methods and the Bellman Equation
- Monte Carlo and Temporal-Difference Methods: TD-Learning, Q-Learning
- Policy Gradient Methods: REINFORCE, Actor-Critic Methods
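
As an illustration of how several of these concepts fit together, here is a minimal sketch of tabular Q-learning on a toy Markov Decision Process. It is not taken from the course exercises: the five-state chain, the reward of 1 at the rightmost state, and all hyperparameters are assumptions chosen to keep the example small. The update line applies the Bellman optimality backup to a single sampled transition.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain with states 0..n_states-1.

    Actions: 0 = left, 1 = right. Reaching the rightmost state ends the
    episode with reward 1; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy behaviour policy, ties broken at random
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Bellman optimality backup; the terminal state has value 0
            target = reward
            if next_state != goal:
                target += gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy for the non-terminal states (0 = left, 1 = right)
greedy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy)
```

With these settings the greedy policy moves right in every non-terminal state, since moving left only delays the single reward and therefore discounts it further.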

In the sessions after the introductory lectures, students will present
and discuss recent research papers in RL. A seminar reading list will be
made available at the beginning of the course.

The exercise part of the course will focus on practical exercises
related to the RL algorithms introduced in the seminar, and on hands-on
sessions in which human feedback is given to a neural machine translation
system that learns from corrections and rewards.


Sutton & Barto (2017). Reinforcement Learning: An Introduction. MIT Press.
