SAGA with Perturbations

Owing to their implementational simplicity and excellent performance, variance-reduction techniques for Stochastic Gradient Descent (SGD) have gained popularity across a wide range of optimization problems. Yet theoretical guarantees for non-convex objectives are still lacking. Since most state-of-the-art methods demand non-convex optimization, this missing foundation makes such techniques less applicable in the machine learning mainstream. Another difficulty in convergence analysis arises from noisy estimation: despite its practical success, the theoretical underpinning of noise-induced tricks remains unclear. In this thesis, we analyze the convergence behavior of ρ-SAGA, a perturbed iterative first-order optimization algorithm, under convex and non-convex assumptions separately. Our proofs show that variance-reduced perturbed objectives converge to a stationary point at a linear rate under the strong convexity assumption and at a sublinear rate without strong convexity. In the non-convex setting, ρ-SAGA attains O(1/ε) iteration complexity, compared with O(1/ε²) for SGD. In addition to the theoretical analysis, we also present an empirical evaluation on standard multiclass classification benchmarks. We design our tasks in a bandit learning framework, which entails a non-convex expected-loss minimization objective with perturbations on its output space. The experimental results show that ρ-SAGA outperforms the ρ-SGD baseline in the bandit scenario.
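The abstract does not spell out the ρ-SAGA update rule. As a rough illustration only, the following sketch combines the standard SAGA variance-reduced update (a table of stored per-component gradients plus their running average) with additive Gaussian perturbations scaled by ρ; the thesis applies perturbations on the output space, so this gradient-noise model, the function name `rho_saga`, and all parameter defaults are assumptions for illustration, not the thesis's actual algorithm.

```python
import numpy as np

def rho_saga(grad_i, x0, n, lr=0.05, rho=0.0, iters=1000, seed=0):
    """Sketch of a SAGA-style update with additive gradient perturbations.

    grad_i(x, i) returns the gradient of the i-th component function;
    rho scales hypothetical Gaussian noise injected into each stochastic
    gradient (a stand-in for the thesis's output-space perturbations).
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    table = np.array([grad_i(x, i) for i in range(n)])  # stored gradients
    avg = table.mean(axis=0)                            # their running average
    for _ in range(iters):
        j = rng.integers(n)
        g_new = grad_i(x, j) + rho * rng.standard_normal(x.shape)
        # SAGA direction: fresh gradient minus stored gradient plus average
        x -= lr * (g_new - table[j] + avg)
        avg += (g_new - table[j]) / n  # keep the average consistent
        table[j] = g_new
    return x
```

For instance, on a least-squares problem with components f_i(x) = ½(aᵢᵀx − bᵢ)², passing `grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])` and ρ = 0 recovers plain SAGA and converges linearly, consistent with the strongly convex rate claimed above.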