Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Stochastic Learning

Course Description

Degree programme      Module code      Credits
BA-2010               AS-CL            8 LP
Master                SS-CL, SS-TAC    8 LP

Lecturers             Artem Sokolov, Stefan Riezler
Course type           Advanced seminar (Hauptseminar; by arrangement also as a research module)
First session         25.10.2016
Time and place        Tue, 11:15-12:45, INF 327 / SR 4 (SR)
Commitment deadline   tbd.

Prerequisites

  • Master: Foundations of probability theory, statistics, and linear algebra
  • Bachelor: Successful completion of the courses "Formal Foundations of Computational Linguistics: Mathematical Foundations" and "Statistical Methods for Computational Linguistics"

Assessment

  • Active and regular participation
  • Presentation, including preparation of discussion questions
  • Implementation project or final paper

Content

Stochastic learning algorithms are the optimization techniques of choice for machine learning problems with high-dimensional parameter spaces and large numbers of training examples. In each iteration, they require computing the gradient of only a single example instead of an average over all n examples as in batch training. Because of this n-fold computational advantage, they are widely used to train neural network models and latent variable models common in natural language processing. One disadvantage of stochastic learning is that the randomness introduces variance, which can lead to slower convergence. Furthermore, theoretical guarantees on convergence rates are mostly restricted to convex objectives, while most practical applications are based on non-linear, non-convex models.
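
The following minimal sketch, assuming a synthetic least-squares objective (not taken from the seminar materials), illustrates the single-example update and why one such step is roughly n times cheaper than a full-batch gradient step; data, learning rate, and iteration counts are purely illustrative.

    # Minimal sketch: stochastic vs. batch gradient steps for least-squares regression.
    # All data and hyperparameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 20
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    def grad_single(w, i):
        """Gradient of the squared loss on example i alone (cost O(d))."""
        return (X[i] @ w - y[i]) * X[i]

    def grad_batch(w):
        """Average gradient over all n examples (cost O(n*d))."""
        return X.T @ (X @ w - y) / n

    w = np.zeros(d)
    for t in range(5 * n):                    # roughly five passes over the data
        i = rng.integers(n)                   # draw one example uniformly at random
        eta = 0.01 / (1.0 + 0.001 * t)        # decaying learning rate
        w -= eta * grad_single(w, i)          # stochastic gradient step

    # For comparison, a single batch step touches all n examples:
    # w -= eta * grad_batch(w)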

In this seminar, we will focus on stochastic optimization for non-convex objectives, with the aim of achieving an understanding that goes beyond recipes for hyperparameter tuning.

Possible topics are:

  • basic first-order stochastic optimization
  • accelerated gradient techniques
  • momentum techniques
  • variance reduction techniques (see the sketch after this list)
  • dual coordinate ascent methods
  • constant/adaptive learning rates
  • application to non-convex optimization
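
The following sketch, again on a synthetic least-squares objective, illustrates the variance reduction item above in the spirit of SVRG (Johnson & Zhang, 2013): each stochastic gradient is corrected by a control variate built from a full gradient that is recomputed only once per epoch. Step size, epoch count, and data are illustrative assumptions, not recommendations from the seminar.

    # Minimal sketch of SVRG-style variance reduction on least squares.
    # All hyperparameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 20
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d)

    def grad_i(w, i):
        return (X[i] @ w - y[i]) * X[i]       # single-example gradient

    def full_grad(w):
        return X.T @ (X @ w - y) / n          # full gradient over all n examples

    w_snapshot = np.zeros(d)
    eta = 0.01
    for epoch in range(20):
        mu = full_grad(w_snapshot)            # one full gradient per epoch
        w = w_snapshot.copy()
        for _ in range(n):                    # n cheap inner updates
            i = rng.integers(n)
            # corrected gradient: the correction term has zero mean and reduces variance
            g = grad_i(w, i) - grad_i(w_snapshot, i) + mu
            w -= eta * g
        w_snapshot = w                        # new snapshot for the next epoch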

Plan

Date    Presenter           Topic
25.10.  Stefan Riezler      Introduction & Overview
8.11.   Reading group       Bottou (2004). Stochastic Learning.
15.11.  Reading group       Nemirovski et al. (2009). Robust Stochastic Approximation Approach to Stochastic Programming.
22.11.  Maximillian Lappé   Ghadimi & Lan (2012). Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming.
29.11.  Mayumi Ohta         Le Roux et al. (2012). A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets.
                            Defazio et al. (2014). SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives.
13.12.  Juri Opitz          Johnson & Zhang (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction.
                            Reddi et al. (2016). Stochastic Variance Reduction for Nonconvex Optimization.
20.12.  Julian Hitschler    Wang et al. (2013). Variance Reduction for Stochastic Gradient Optimization.
                            Kocsis & Szepesvari (2006). Universal Parameter Optimisation in Games Based on SPSA.
10.1.   Julius Steen        Shalev-Shwartz & Zhang (2013). Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization.
                            Shalev-Shwartz (2015). SDCA without Duality.
17.1.   Daniel Kienitz      Byrd et al. (2012). Sample Size Selection in Optimization Methods for Machine Learning.
                            Keskar et al. (2016). On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.
24.1.   Stoyan Dimitrov     Sutskever et al. (2013). On the Importance of Initialization and Momentum in Deep Learning.
                            Polyak (1964). Some Methods of Speeding Up the Convergence of Iteration Methods.
31.1.   Sebastian Burst     Nesterov (1983). A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k^2).
                            Ghadimi & Lan (2015). Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming.
7.2.    Reading group recap Bottou et al. (2016). Optimization Methods for Large-Scale Machine Learning.

» further course materials
