We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting, in which the variance of the noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attains optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines, stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE), which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, the corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of SGE to statistical estimation problems under heavy-tailed noise and discontinuous score functions. We also discuss the application of SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate the numerical performance of our proposed algorithms in high-dimensional settings.