We consider the optimization of a smooth and strongly convex objective by constant step-size stochastic gradient descent (SGD) and study its properties through the lens of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iterates converge to an invariant distribution in total variation distance. We also establish this convergence in Wasserstein-2 distance, in a more general setting than previous work. Owing to the invariance of the limit distribution, our analysis shows that it inherits sub-Gaussian or sub-exponential concentration properties whenever these hold for the gradient. This yields high-confidence bounds for the final estimate. Finally, under such conditions in the linear case, we obtain a dimension-free deviation bound for the Polyak-Ruppert average of a tail sequence. All our results are non-asymptotic, and their consequences are discussed through a few applications.
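As an illustration only, here is a minimal sketch of the objects discussed above, assuming a hypothetical linear least-squares objective F(x) = 0.5 E[(a·x - b)^2] with Gaussian design a ~ N(0, I) and b = a·x_star + noise. It is not the authors' code or experimental setup. `constant_step_sgd` runs SGD with a fixed step and returns both the last iterate (a draw from, roughly, the invariant distribution) and a Polyak-Ruppert average of the tail iterates after a burn-in. `coupled_distance` runs two chains from different starting points on the same samples (a synchronous coupling), one standard way to see that the chain forgets its initialization; the paper's own proofs may proceed differently. All names, the step size, the noise level, and the burn-in length are illustrative assumptions.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def dist(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def sample(rng, x_star, noise_std):
    """Hypothetical data model: a ~ N(0, I), b = a.x_star + Gaussian noise."""
    a = [rng.gauss(0.0, 1.0) for _ in x_star]
    b = dot(a, x_star) + rng.gauss(0.0, noise_std)
    return a, b


def stoch_grad(x, a, b):
    """Unbiased one-sample gradient of F(x) = 0.5 * E[(a.x - b)^2]."""
    residual = dot(a, x) - b
    return [ai * residual for ai in a]


def constant_step_sgd(x0, x_star, noise_std, step, n_iter, burn_in, seed=0):
    """Iterate x_{t+1} = x_t - step * g_t with a fixed step size.

    Returns the last iterate and the Polyak-Ruppert average of the tail
    iterates x_{burn_in+1}, ..., x_{n_iter}.
    """
    rng = random.Random(seed)
    x = list(x0)
    tail_avg = [0.0] * len(x)
    count = 0
    for t in range(n_iter):
        a, b = sample(rng, x_star, noise_std)
        x = [xi - step * gi for xi, gi in zip(x, stoch_grad(x, a, b))]
        if t >= burn_in:
            count += 1
            # Running mean, so the tail sequence is never stored.
            tail_avg = [m + (xi - m) / count for m, xi in zip(tail_avg, x)]
    return x, tail_avg


def coupled_distance(x0, y0, x_star, noise_std, step, n_iter, seed=0):
    """Run two chains from different starts on the SAME samples
    (synchronous coupling) and return their final distance."""
    rng = random.Random(seed)
    x, y = list(x0), list(y0)
    for _ in range(n_iter):
        a, b = sample(rng, x_star, noise_std)
        x = [xi - step * gi for xi, gi in zip(x, stoch_grad(x, a, b))]
        y = [yi - step * gi for yi, gi in zip(y, stoch_grad(y, a, b))]
    return dist(x, y)


x_star = [1.0, -2.0, 0.5]
last, avg = constant_step_sgd([0.0, 0.0, 0.0], x_star, noise_std=0.5,
                              step=0.05, n_iter=22000, burn_in=2000)
gap = coupled_distance([10.0, 10.0, 10.0], [-10.0, 0.0, 5.0],
                       x_star, 0.5, step=0.05, n_iter=2000)
print("last iterate error:", dist(last, x_star))
print("tail average error:", dist(avg, x_star))
print("coupled gap:", gap)
```

In this quadratic model the two coupled chains differ by (I - step·a aᵀ) applied to their gap at each step, so the gap shrinks geometrically for small steps. Because the objective is quadratic, the mean of the stationary distribution equals x_star, which is why the tail average typically lands much closer to x_star than the last iterate, whose fluctuations are set by the step size.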