In finite-sum minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as are their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms because they are more stable with respect to the choice of the step size, but a proper variance-reduced version has been missing. In this work, we propose the first study of variance reduction techniques for stochastic proximal point algorithms. We introduce stochastic proximal versions of SVRG, SAGA, and some of their variants for smooth and convex functions. We provide several convergence results for the iterates and the objective function values. In addition, under the Polyak-Łojasiewicz (PL) condition, we obtain linear convergence rates for the iterates and the function values. Our numerical experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts, especially in terms of stability with respect to the choice of the step size.
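To make the step-size stability claim concrete, the following is a minimal sketch that compares a plain stochastic gradient step with a plain stochastic proximal point step on a synthetic least-squares problem. It is not the variance-reduced algorithms proposed in this work. The problem setup, the function names (`make_problem`, `sgd_step`, `prox_step`, `run`), and the step size are assumptions chosen for illustration. For f_i(x) = 0.5 (a_i^T x - b_i)^2 the proximal step has a closed form, which is what the sketch uses.

```python
import numpy as np

def make_problem(n=200, d=10, seed=0):
    # Hypothetical consistent least-squares problem: b = A @ x_star,
    # so x_star minimizes every component f_i(x) = 0.5 * (a_i @ x - b_i)**2.
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    x_star = rng.normal(size=d)
    return A, A @ x_star, x_star

def sgd_step(x, a, b_i, gamma):
    # Explicit step: x - gamma * grad f_i(x).
    return x - gamma * (a @ x - b_i) * a

def prox_step(x, a, b_i, gamma):
    # Implicit step: argmin_y f_i(y) + ||y - x||^2 / (2 * gamma),
    # available in closed form for a single squared residual.
    return x - gamma * (a @ x - b_i) / (1.0 + gamma * (a @ a)) * a

def run(step, A, b, gamma, iters=2000, seed=1):
    # Sample one component uniformly at random per iteration.
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.integers(A.shape[0])
        x = step(x, A[i], b[i], gamma)
    return x

A, b, x_star = make_problem()
gamma = 1.0  # deliberately large compared with 1 / ||a_i||^2

with np.errstate(all="ignore"):  # the explicit iteration may overflow
    sgd_err = np.linalg.norm(run(sgd_step, A, b, gamma) - x_star)
prox_err = np.linalg.norm(run(prox_step, A, b, gamma) - x_star)

print("SGD error:           ", sgd_err)
print("proximal point error:", prox_err)
```

In this setup the explicit step multiplies the error along a_i by 1 - gamma ||a_i||^2, whose magnitude exceeds 1 for large gamma, while the proximal step multiplies it by 1 / (1 + gamma ||a_i||^2), which stays below 1 for every positive gamma. The sketch only illustrates that mechanism on one example and says nothing about the rates proved in this work.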