Human Reinforcement Learning: Algorithms and Hands-on Practice

Module Description

Course	Module Abbreviation	Credit Points
BA-2010	AS-CL	8 LP
Master	SS-CL, SS-TAC	8 LP

Lecturer	Stefan Riezler, Sariya Karimova
Module Type	Advanced seminar (Hauptseminar) / Exercise session (Übung)
Language	English
First Session	23.04.2019
Time and Place	Tuesday, 11:15-12:45, INF 327 / SR 4; Thursday, 14:15-15:45, INF 327 / SR 4. Interactive sessions will take place in the IWR Pool 1 (Mathematikon, 3rd floor).
Commitment Period	tbd.

Prerequisite for Participation

Master:

Expertise comparable to course "Neural Networks: Architectures and Applications for NLP" (https://www.cl.uni-heidelberg.de/courses/ws18/neuralnetworks/)

Bachelor:

Successful completion of courses "Programmieren I (Python)", "Formal Foundations of Computational Linguistics: Mathematical Foundations" and "Statistical Methods for Computational Linguistics"; ideally also "Neural Networks: Architectures and Applications for NLP"

Assessment

Presentation of one paper from the seminar reading list
Reading of the remaining papers on the seminar reading list
Successful completion of the exercises
Regular and active attendance of the seminar and the exercises
Implementation project or written term paper

Content

Reinforcement learning (RL) is a machine learning technique that sits
between supervised and unsupervised learning. Instead of learning
from explicit supervision by ground-truth examples, the system learns an
input-output relation through interaction with an environment
or a user. Learning from implicit feedback, such as rewards that evaluate
the quality of predicted outputs, is less costly than explicit
supervision and makes it possible to learn in uncharted territory. The
greatest success stories of RL have been achieved in areas where reward
signals are well-defined and abundant, e.g., in learning to play games at
a superhuman level. Applying RL to interactions with human
users is a harder problem because human rewards are inconsistent and
sparse. This requires solving the additional problem of learning good
reward estimators from human feedback.
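
As a rough illustration of learning from noisy rewards instead of ground-truth labels, the following minimal sketch trains a softmax policy with a REINFORCE-style update on simulated thumbs-up/thumbs-down feedback. Everything in it is an assumption made for illustration and is not taken from the course material: the three candidate outputs, their hidden quality values in TRUE_QUALITY, the simulated user, and the function names. It does not cover the harder problem of learning a reward estimator.

```python
import math
import random

# Hypothetical setup: three candidate outputs with hidden quality levels.
# A simulated user gives a thumbs-up (1.0) with probability equal to the
# quality, so feedback on the same output is inconsistent across trials.
TRUE_QUALITY = [0.2, 0.5, 0.8]


def softmax(scores):
    """Turn unnormalized scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the distribution probs."""
    threshold = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if threshold < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def train(steps=5000, learning_rate=0.1, seed=0):
    """Return the policy's output probabilities after training."""
    rng = random.Random(seed)
    scores = [0.0] * len(TRUE_QUALITY)
    baseline = 0.0  # running mean of observed rewards, reduces variance
    for t in range(1, steps + 1):
        probs = softmax(scores)
        action = sample(probs, rng)
        # Noisy binary feedback from the simulated user.
        reward = 1.0 if rng.random() < TRUE_QUALITY[action] else 0.0
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. score k is (1[k == action] - probs[k]).
        for k in range(len(scores)):
            indicator = 1.0 if k == action else 0.0
            scores[k] += learning_rate * advantage * (indicator - probs[k])
        baseline += (reward - baseline) / t
    return softmax(scores)


final_probs = train()
best_output = max(range(len(final_probs)), key=final_probs.__getitem__)
```

The policy never sees which output is correct; it only observes one noisy reward per sampled output, which is the setting the paragraph above contrasts with explicit supervision.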

The goal of this class is to provide students with a thorough
understanding of the central theoretical and algorithmic concepts of
RL, including techniques to analyze and learn from human rewards. The
class is accompanied by implementation exercises for RL algorithms and
by practical human RL sessions for interactive neural machine translation.

The seminar starts with a short series of lectures that introduces the
central concepts of RL. These concepts include:
- Markov Decision Processes
- Policy Evaluation and Policy Optimization
- Dynamic Programming Methods and the Bellman Equation
- Monte Carlo and Temporal-Difference Methods: TD-Learning, Q-Learning
- Policy Gradient Methods: REINFORCE, Actor-Critic Methods
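
To make one of these concepts concrete, here is a minimal sketch of value iteration, a dynamic programming method that repeatedly applies the Bellman optimality backup V(s) = max_a [r(s, a) + gamma * V(s')]. The toy Markov decision process (four states on a line, deterministic moves, a reward of 1 for reaching the last state) and all names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy MDP: states 0..3 on a line, state 3 is terminal.
# Entering the terminal state yields reward 1, every other move yields 0.
GAMMA = 0.9        # discount factor
N_STATES = 4
TERMINAL = 3
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition model: return (next_state, reward)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(TERMINAL, state + 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def backup(values, state, action):
    """One-step lookahead: immediate reward plus discounted next value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tolerance=1e-10):
    """Apply the Bellman optimality backup until values stop changing."""
    values = [0.0] * N_STATES
    while True:
        largest_change = 0.0
        for state in range(N_STATES):
            if state == TERMINAL:
                continue  # the terminal state keeps value 0
            best = max(backup(values, state, a) for a in ACTIONS)
            largest_change = max(largest_change, abs(best - values[state]))
            values[state] = best
        if largest_change < tolerance:
            return values


def greedy_policy(values):
    """Pick the action with the highest one-step lookahead in each state."""
    return [
        None if s == TERMINAL else max(ACTIONS, key=lambda a: backup(values, s, a))
        for s in range(N_STATES)
    ]


optimal_values = value_iteration()
optimal_policy = greedy_policy(optimal_values)
```

In this toy problem the values decay by a factor of gamma with each step of distance from the goal, and the greedy policy moves right in every non-terminal state. Unlike Q-learning or policy gradient methods, value iteration assumes the transition model (here the step function) is known.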

In the sessions after the introductory lectures, students will present
and discuss recent research papers in RL. A seminar reading list will be
made available at the beginning of the course.

The exercise part of the course will focus on practical exercises
related to the RL algorithms introduced in the seminar, and on hands-on
sessions where human feedback is given to a neural machine translation
system that learns from corrections and rewards.

Literature