
Reinforcement Learning

Course Description

Degree program	Module code	Credits
BA-2010	AS-CL	8 credit points
Master	SS-CL, SS-TAC	8 credit points

Lecturer	Stefan Riezler
Course type	Advanced seminar (Hauptseminar)
First session	24.10.2017
Time and place	Tue, 11:15–12:45, INF 325 / SR 24 (SR)
Commitment deadline	tbd.

Prerequisites

Master: Foundations of probability theory, statistics, and linear algebra
Bachelor: Successful completion of the courses "Formal Foundations of Computational Linguistics: Mathematical Foundations" and "Statistical Methods for Computational Linguistics"

Assessment

Active and regular participation
Presentation, including preparation of discussion questions
Implementation project or final paper

Contents

Reinforcement learning is a machine learning technique positioned between supervised and unsupervised learning. Instead of learning from explicit supervision by ground-truth examples, an input-output relation is learned through interaction of a system with the environment or user. Learning from implicit feedback, such as rewards that evaluate the quality of predicted outputs, is less costly than explicit supervision and makes it possible to learn in uncharted territory. The goal of this class is to introduce the central theoretical and algorithmic concepts of reinforcement learning, with a special focus on applications to structured prediction problems in natural language processing.

Possible topics of the class are:

Markov Decision Processes vs. Multi-Armed Bandits
Exploration vs. Exploitation
Prediction vs. Control
Dynamic Programming vs. Monte Carlo vs. Temporal Difference Learning
Critic-Only vs. Actor-Only vs. Actor-Critic Algorithms
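
To make several of these topics concrete (Markov decision processes, exploration vs. exploitation, control, temporal difference learning), the following is a minimal sketch of tabular Q-learning. It is not part of the course material. The chain environment, the function names `q_learning_chain` and `greedy_policy`, and all hyperparameter values are hypothetical choices made only for illustration.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP (illustration only).

    States are 0..n_states-1, actions are 0 (left) and 1 (right).
    Entering the rightmost state ends the episode with reward 1;
    every other transition yields reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != terminal:
            # Exploration vs. exploitation: act randomly with probability
            # epsilon (and on ties), otherwise take the greedy action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Deterministic transition; moving left from state 0 stays put.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == terminal else 0.0
            # Temporal difference update toward the bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    q_table = q_learning_chain()
    print(greedy_policy(q_table))
```

In this sketch the agent never sees the correct action for a state. It only observes a reward when it reaches the end of the chain, and the value estimates propagate backwards from there through the bootstrapped targets.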

Course Overview

Seminar Schedule

Date	Topic	Presenter
24.10.	Organizational matters	Riezler
7.11.	Introduction to Reinforcement Learning	Riezler
14.11.	Introduction to RL, contd.	Riezler
21.11.	Introduction to RL, contd.	Riezler
28.11.	Watkins & Dayan (1992). Q-Learning. Mnih et al. (2015). Human-level control through deep reinforcement learning.	Niels Bernlöhr
5.12.	Mannor et al. (2003). The Cross Entropy Method for Fast Policy Search. Salimans et al. (2017). Evolution Strategies as a Scalable Alternative to Reinforcement Learning.	Sassan Mokhtar
12.12.	Williams (1992). Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Sutton et al. (2000). Policy Gradient Methods for Reinforcement Learning with Function Approximation.	Simon Will & Max Lappé
19.12.	Konda & Tsitsiklis (2000). Actor-Critic Algorithms. Mnih et al. (2016). Asynchronous Methods for Deep Reinforcement Learning.	Lennard Kiehl
9.1.	Greensmith et al. (2004). Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning. Wang et al. (2013). Variance Reduction for Stochastic Gradient Optimization.	Tatjana Chernenko & Max Bacher
16.1.	Kakade (2002). A Natural Policy Gradient. Schulman et al. (2015). Trust Region Policy Optimization.	Shiyue Zhang
23.1.	Asadi et al. (2017). Mean Actor Critic. Ciosek & Whiteson (2017). Expected Policy Gradients.	Neha Pandey
30.1.	Kreutzer et al. (2017). Bandit Structured Prediction for Neural Sequence-to-Sequence Learning. Nguyen et al. (2017). Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback.	Enrique Fita Sanmartin
6.2.	Christiano et al. (2017). Deep Reinforcement Learning from Human Preferences. Judah et al. (2010). Reinforcement Learning via Practice and Critique Advice.	Nadja Heinzen & Arthur Neidlein

Literature

Sutton & Barto (2017). Reinforcement Learning: An Introduction. MIT Press. (ebook)
Szepesvari (2010). Algorithms for Reinforcement Learning. Morgan & Claypool. (ebook)