Ruprecht-Karls-Universität Heidelberg
Bilder vom Neuenheimer Feld, Heidelberg und der Universität Heidelberg

Stochastic Learning


Studiengang Modulkürzel Leistungs-
BA-2010 AS-CL 8 LP
Master SS-CL, SS-TAC 8 LP
Dozenten/-innen Artem Sokolov, Stefan Riezler
Veranstaltungsart Hauptseminar (nach Rücksprache auch als Forschungsmodul)
Erster Termin 25.10.2016
Zeit und Ort Di, 11:1512:45, INF 327 / SR 4 (SR)
Commitment-Frist tbd.


  • Master : Grundlagen der Wahrscheinlichkeitstheorie, Statistik und Linearen Algebra
  • Bachelor : Erfolgreicher Abschluss der Kurse "Formal Foundations of Computational Linguistics: Mathematical Foundations " und "Statistical Methods for Computational Linguistics"


  • Aktive und regelmässige Teilnahme
  • Referat inklusive Vorbereitung von Diskussionsfragen
  • Implementierungsprojekt oder Abschlussarbeit


Stochastic learning algorithms are the optimization techniques of choice for machine learning problems in high-dimensional parameter spaces over large numbers of training examples. In each iteration, they require only computing the gradient of a single example instead of an average over all n examples as in batch training. Because of this n-fold computational advantage, they are used widely to train neural network models or latent variable models common in natural language processing. One disadvantage of stochastic learning is that the randomness introduces variance that can lead to slower convergence. Furthermore, theoretical guarantees on convergence rates are mostly restricted to convex objectives, while most practical applications are based on non-linear non-convex models.

In this seminar, we will focus on stochastic optimization for non-convex objectives, with the aim to achieve an understanding that goes beyond recipes for hyperparameter tuning.

Possible topics are:

  • basic first-order stochastic optimization
  • accelerated gradient techniques
  • momentum techniques
  • variance reduction techniques
  • dual coordinate ascent methods
  • constant/adaptive learning rates
  • application to non-convex optimization


date topic presenter
25.10. Introduction & Overview Stefan Riezler
8.11. Bottou (2004). Stochastic Learning. Reading group
15.11. Nemirovski et al. (2009). Robust Stochastic Approximation Approach to Stochastic Programming. Reading group
22.11. Ghadimi & Lan (2012). Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming. Maximillian Lappé
29.11. Le Roux et al. (2012). A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
Defazio et al. (2014). SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives.
Mayumi Ohta
13.12. Johnson & Zhang (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Reddi et al. (2016). Stochastic Variance Reduction for Nonconvex Optimization.
Juri Opitz
20.12. Wang et al. (2013). Variance Reduction for Stochastic Gradient Optimization.
Kocsis & Szepesvari (2006). Universal Parameter Optimisation in Games Based on SPSA.
Julian Hitschler
10.1. Shalev-Shwartz & Zhang (2013). Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization.
Shalev-Shwartz (2015). SDCA without Duality.
Julius Steen
17.1. Byrd et al. (2012). Sample Size Selection in Optimization Methods for Machine Learning.
Keskar et al. (2016). On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.
Daniel Kienitz
24.1. Sutskever et al. (2013). On the Importance of Initialization and Momentum in Deep Learning.
Polyak (1964). Some Methods of Speeding Up the Convergence of Iteration Methods.
Stoyan Dimitrov
31.1. Nesterov (1983). A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k^2).
Ghadimi & Lan (2015). Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming.
Sebastian Burst
7.2. Bottou et al. (2016). Optimization Methods for Large-Scale Machine Learning. Reading group recap

» weitere Kursmaterialien

zum Seitenanfang