We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observations. In contrast to the classical setting, in which the noise variance is assumed to be uniformly bounded, we assume that the variance of the stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems arise naturally in a variety of applications, in particular in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for this class of problems attains optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines, stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE), which are linked by a particular duality relationship. We show that, under appropriate conditions, both SAGD and SGE achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, the assumptions required by SGE are more general; for instance, they allow SGE to be applied efficiently to statistical estimation problems with heavy-tailed noise and discontinuous score functions. We also discuss the application of SGE to problems satisfying a quadratic growth condition, and show how it can be used to recover sparse solutions. Finally, we report simulation experiments illustrating the numerical performance of the proposed algorithms in high-dimensional settings.
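To make the noise model concrete, consider least-squares linear regression as an assumed special case of generalized linear regression. Assume, for illustration only, regressors $\phi \sim N(0, I_d)$ and responses $\eta = \phi^\top x^* + \xi$ with $\xi \sim N(0, \sigma^2)$ independent of $\phi$. The objective is $f(x) = \tfrac12 \mathbb{E}(\phi^\top x - \eta)^2$. Its stochastic gradient $G(x) = \phi(\phi^\top x - \eta)$ is unbiased for $\nabla f(x) = x - x^*$, and a Gaussian moment computation gives $\mathbb{E}\|G(x) - \nabla f(x)\|_2^2 = (d+1)\|x - x^*\|_2^2 + d\sigma^2 = 2(d+1)[f(x) - f(x^*)] + d\sigma^2$. The variance is therefore not uniformly bounded. Instead it is controlled by a bound of the form $\sigma_0^2 + \kappa[f(x) - f(x^*)]$. This form is one common way to state such a condition; it is an assumption here and not necessarily the exact statement used in the paper. The sketch below checks the identity by Monte Carlo. The function names are hypothetical, and the code illustrates the noise model only. It is not an implementation of SAGD or SGE.

```python
import random


def stochastic_grad(x, x_star, sigma, rng):
    """One stochastic gradient of f(x) = 0.5 * E[(phi^T x - eta)^2].

    Assumed data model (illustration only): phi ~ N(0, I_d),
    eta = phi^T x_star + xi, with xi ~ N(0, sigma^2) independent of phi.
    """
    phi = [rng.gauss(0.0, 1.0) for _ in x]
    xi = rng.gauss(0.0, sigma)
    eta = sum(p * s for p, s in zip(phi, x_star)) + xi
    residual = sum(p * xj for p, xj in zip(phi, x)) - eta
    return [p * residual for p in phi]


def noise_variance(x, x_star, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of E||G(x) - grad f(x)||^2, where grad f(x) = x - x_star."""
    rng = random.Random(seed)
    true_grad = [xj - sj for xj, sj in zip(x, x_star)]
    total = 0.0
    for _ in range(n_samples):
        g = stochastic_grad(x, x_star, sigma, rng)
        total += sum((gj - tj) ** 2 for gj, tj in zip(g, true_grad))
    return total / n_samples


def predicted_variance(x, x_star, sigma):
    """Closed form (d + 1) * ||x - x_star||^2 + d * sigma^2 under the assumed model."""
    d = len(x)
    dist_sq = sum((xj - sj) ** 2 for xj, sj in zip(x, x_star))
    return (d + 1) * dist_sq + d * sigma ** 2


if __name__ == "__main__":
    x_star = [1.0, -1.0, 0.5]
    sigma = 1.0
    # Move away from x_star along the first coordinate and compare
    # the empirical noise variance with the closed-form prediction.
    for shift in (0.0, 1.0, 2.0):
        x = [x_star[0] + shift, x_star[1], x_star[2]]
        print(shift, noise_variance(x, x_star, sigma), predicted_variance(x, x_star, sigma))
```

Running the script prints the Monte Carlo estimate next to the closed-form value for each shift. Both grow with the distance to $x^*$, which is exactly the situation in which a uniform variance bound fails. The variance also reduces to $d\sigma^2$ at the optimum.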