
Stochastic Learning
Course Description
Degree program | Module code | Credits
---|---|---
BA-2010 | AS-CL | 8 LP
Master | SS-CL, SS-TAC | 8 LP
Lecturers | Artem Sokolov, Stefan Riezler
Course type | Advanced seminar (Hauptseminar; by arrangement also creditable as a research module)
First session | 25.10.2016
Time and place | Tue, 11:15–12:45, INF 327 / SR 4 (SR)
Commitment deadline | tbd.
Prerequisites
- Master: basic probability theory, statistics, and linear algebra
- Bachelor: successful completion of the courses "Formal Foundations of Computational Linguistics: Mathematical Foundations" and "Statistical Methods for Computational Linguistics"
Assessment
- Active and regular participation
- Presentation, including preparation of discussion questions
- Implementation project or term paper
Contents
Stochastic learning algorithms are the optimization techniques of choice for machine learning problems with high-dimensional parameter spaces and large numbers of training examples. In each iteration they require computing the gradient of only a single example, instead of an average over all n examples as in batch training. Because of this n-fold computational advantage, they are widely used to train the neural network models and latent variable models common in natural language processing. One disadvantage of stochastic learning is that the randomness introduces variance, which can slow convergence. Furthermore, theoretical guarantees on convergence rates are mostly restricted to convex objectives, while most practical applications are based on non-linear, non-convex models.
In this seminar, we will focus on stochastic optimization for non-convex objectives, with the aim of achieving an understanding that goes beyond recipes for hyperparameter tuning.
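The single-example update described above can be sketched in a few lines. The following is a minimal illustration on a synthetic least-squares problem; all names, values, and the problem itself are illustrative assumptions, not material from the seminar:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize (1/n) * sum_i (x_i . w - y_i)^2.
# (Illustrative setup; n, d, eta, etc. are ad-hoc choices.)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)
eta = 0.01  # constant learning rate, chosen ad hoc
for t in range(5 * n):
    i = rng.integers(n)                  # draw one example per iteration
    grad = 2 * (X[i] @ w - y[i]) * X[i]  # O(d) single-example gradient ...
    w -= eta * grad                      # ... instead of an average over all n

residual = np.linalg.norm(w - w_true)
```

Each iteration touches only one row of X, which is the n-fold cost advantage over a full-batch gradient; the price is the gradient noise that the seminar topics below address.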
Possible topics are:
- basic first-order stochastic optimization
- accelerated gradient techniques
- momentum techniques
- variance reduction techniques
- dual coordinate ascent methods
- constant/adaptive learning rates
- applications to non-convex optimization
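As a taste of one item from this list, classical (heavy-ball) momentum keeps an exponentially decaying average of past stochastic gradients. A minimal sketch, reusing the same illustrative least-squares setup (all names and constants are assumptions for demonstration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative least-squares problem, as in the earlier sketch.
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

# Heavy-ball momentum: v_{t+1} = mu * v_t - eta * g_t,  w_{t+1} = w_t + v_{t+1}
w = np.zeros(d)
v = np.zeros(d)
eta, mu = 0.005, 0.9  # step size and momentum coefficient, illustrative values
for t in range(5 * n):
    i = rng.integers(n)
    g = 2 * (X[i] @ w - y[i]) * X[i]  # stochastic gradient of a single example
    v = mu * v - eta * g              # decaying average of past gradient steps
    w = w + v                         # the averaging damps the gradient noise

residual = np.linalg.norm(w - w_true)
```

Because consecutive stochastic gradients partly cancel in the running average v, momentum smooths the noisy updates while retaining the per-iteration cost of plain stochastic gradient descent.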