The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in machine learning. However, it is well known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e., when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove that this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of Polyak momentum in the composite optimization setting, and we show that the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results.
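For orientation, the following is a minimal sketch of what a stochastic proximal gradient iteration with Polyak momentum might look like. The abstract does not state the update rule, so everything here is an assumption: the momentum is written as an exponential moving average of stochastic gradients, the regularizer is taken to be an l1 penalty (whose proximal operator is soft-thresholding), and the names `prox_sgd_momentum`, `soft_threshold`, `step`, and `beta` are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (an illustrative regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_sgd_momentum(grad_fn, prox_fn, x0, step, beta, n_iters, rng):
    """Sketch of proximal SGD with a moving-average (Polyak-style) momentum.

    grad_fn(x, rng) returns a stochastic gradient of the smooth part.
    prox_fn(v, step) returns the proximal map of step * (non-smooth part).
    The exact form of the update is an assumption, not taken from the paper.
    """
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)  # momentum buffer (running gradient estimate)
    for _ in range(n_iters):
        g = grad_fn(x, rng)
        # Averaging gradients over iterations damps the stochastic noise.
        m = (1.0 - beta) * m + beta * g
        # Proximal step taken along the averaged direction.
        x = prox_fn(x - step * m, step)
    return x


# Usage: l1-regularized least squares with single-sample gradients (batch size 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
y = A @ np.array([1.0, 0.0, -2.0, 0.0, 0.5])
lam = 0.1  # hypothetical regularization weight


def sample_grad(x, rng):
    i = rng.integers(A.shape[0])  # draw one data point
    return A[i] * (A[i] @ x - y[i])


x_hat = prox_sgd_momentum(
    sample_grad,
    lambda v, s: soft_threshold(v, s * lam),
    x0=np.zeros(5), step=0.01, beta=0.1, n_iters=2000, rng=rng,
)
```

Setting `beta = 1.0` discards the history and recovers the plain stochastic proximal gradient step, which makes the two methods easy to compare in the small-batch regime the abstract describes.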